Craigslist Search and Notify Script

What is it?

A friend of mine asked me recently if I knew of a script that would perform a search on craigslist.org and send him an email when there were new results to his search. He's always looking for oddball stuff and by the time he finds it on craigslist and emails/calls the person, the item is already gone.

Since this sounded like something that would be quite useful for me (and others), I decided to hammer something out.

The Script and Config File

Setting this up is a piece of cake. All you have to do is get the script and the example config file, edit the config and run the script. This has been tested to work under both Windows and Linux with Python 2.6, but it should work just fine under pretty much any platform. If you run into some kind of error, run the script in debugging mode and send the output to admin@splitstreams.com so I can fix it.

Getting the Files

Subversion

You can check out the latest script and example config from Subversion with:

$ svn co https://svn.splitstreams.com:444/scripts/trunk/python/clwatch clwatch

If you don't have Subversion installed, you can just browse to the URL above, right click on each file, select “Save Link As” and save a copy of the script and the example config file. On the Subversion note, if you are on Windows, I highly suggest using TortoiseSVN for actual SVN access.

Download the Package

The package can be downloaded directly by clicking here.

The Example Config

This is what the example config file looks like:

NOTE: This may not be exactly what you have. Read the SVN version for the latest instructions!

# This is an example config for the clwatch script
#
# The [DEFAULT] section defines defaults that will be used in all
# subsequent search configs.  These are all the possible options
# that can be assigned to default (and any specific configs thereafter):
#
#   DEFAULT options:
#       search = this is the search performed on craigslist
#       clBase = This is the base domain to use, it should include the
#                city.  ex.: minneapolis.craigslist.org
#       clCat = The craigslist category to search under.  ex.: jobs (See
#               the next section for all options)
#       dbLoc = The location of the db file to store previous lookups.  This
#               should only be defined in [DEFAULT] so you use one database
#               file for all lookups
#       alertTo = The email address to send alerts to.
#       smtpServer = The hostname or ip address of the smtp server to use
#                    to send the email.
#                    Note:  If you don't configure an smtp server, all
#                           messages will be printed on standard out instead
#       smtpPort = The port number of the smtp server (probably 25)
#       smtpUser = The username to use for the sending of the email, if
#                  necessary
#       smtpPass = The password to use for the sending of the email, if
#                  necessary
#       smtpSSL = This should only be set to "true" if SSL is required from
#                 the start of the connection (usually on port 465)
#                 NOTE!!!!  This is BROKEN in python 2.6 and 3.0! Do not use
#                 this with either of those versions:
#                     http://bugs.python.org/issue4470
#       smtpUseTLS = If the smtp server supports TLS, you can set this to
#                    true
#
#   clCat options:
#       These are all the categories and their subcategories available
#       on craigslist.  You only have to use one of the overall
#       category or subcategory, they do not need to be combined.  For
#       example, if you wish to search under the "artists" subcategory of
#       "community", you simply use "clCat = artists".  If you wish to
#       search under the entire "community" section, you only need to
#       specify "clCat = community":
#           community
#               activity partners
#               artists
#               childcare
#               community-general
#               groups
#               local news and views
#               lost & found
#               musicians
#               pets
#               politics
#               rideshare
#               volunteers
#           events
#               classes
#               events-events
#           gigs
#               adult gigs
#               computer gigs
#               creative gigs
#               crew gigs
#               domestic gigs
#               event gigs
#               labor gigs
#               talent gigs
#               writing gigs
#           housing
#               all housing wanted
#               apts wanted
#               apts/housing for rent
#               housing swap
#               office & commercial
#               parking & storage
#               real estate - all
#               real estate - by broker
#               real estate - by owner
#               real estate wanted
#               rooms & shares
#               rooms wanted
#               sublet/temp wanted
#               sublets & temporary
#               vacation rentals
#           jobs
#               admin/office jobs
#               business jobs
#               customer service jobs
#               education jobs
#               engineering jobs
#               etcetera jobs
#               finance jobs
#               food/bev/hosp jobs
#               general labor jobs
#               government jobs
#               healthcare jobs
#               human resource jobs
#               internet engineering jobs
#               legal jobs
#               manufacturing jobs
#               marketing jobs
#               media jobs
#               nonprofit jobs
#               real estate jobs
#               retail/food/hospitality jobs
#               retail/wholesale jobs
#               sales jobs
#               salon/spa/fitness
#               science jobs
#               security jobs
#               skilled trades jobs
#               software jobs
#               systems/networking jobs
#               tech support jobs
#               transport jobs
#               tv video radio jobs
#               web design jobs
#               writing jobs
#           personals
#               casual encounters
#               men seeking men
#               men seeking women
#               misc romance
#               missed connections
#               rants & raves
#               strictly platonic
#               women seeking men
#               women seeking women
#           resumes
#           for sale
#               art & crafts
#               auto parts
#               baby & kid stuff
#               barter
#               bicycles
#               boats
#               books
#               business
#               cars & trucks - all
#               cars & trucks - by dealer
#               cars & trucks - by owner
#               cds / dvds / vhs
#               clothing
#               collectibles
#               computers & tech
#               electronics
#               farm & garden
#               free stuff
#               furniture - all
#               furniture - by dealer
#               furniture - by owner
#               games & toys
#               garage sales
#               fs-general
#               household
#               items wanted
#               jewelry
#               materials
#               motorcycles/scooters
#               musical instruments
#               photo/video
#               recreational vehicles
#               sporting goods
#               tickets
#               tools
#           services
#               adult services
#               automotive services
#               beauty services
#               computer services
#               creative services
#               event services
#               financial services
#               household services
#               labor & moving
#               legal services
#               lessons & tutoring
#               real estate services
#               skilled trade services
#               small biz ads
#               therapeutic services
#               travel/vacation
#               write/edit/trans
#
# Your config should look something like the example below.  After the
# [DEFAULT] section, each section should have a description in the []
# so you know what this search is about
#
#[DEFAULT]
#clBase = minneapolis.craigslist.org
#clCat = for sale
#dbLoc = /home/myuser/.clwatch/store.db
#alertTo = someone@example.com
#smtpServer = mail.example.com
#smtpPort = 25
#smtpUser = myuser
#smtpPass = awesomePassword
#smtpSSL = false
#smtpUseTLS = true
#
#[I want a cat]
#search = fuzzy cat
#
#[Time for a new house]
#search = houses that don't suck
#clCat = real estate - all
#
#[Find a new girlfriend and send to alternate email]
#search = hot women
#clCat = men seeking women
#alertTo = mysecretemail@somewhere.com

Usage

As you can see in the above config file example, the configuration is actually pretty simple. You should set up a [DEFAULT] section containing just about everything except the search option. This limits how much you have to repeat yourself in your actual search configs. Note that anything you set in a search config overrides the default for that search. For example, you could use separate dbLoc directives and keep a different db file for each search, if you so choose. You could also override clBase and put in sfbay.craigslist.org if you were looking to move to San Francisco and wanted to scope out houses there.
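The override behavior described above matches how Python's standard config parser treats a [DEFAULT] section. Here is a minimal sketch of that, not the actual clcheck.py code. It assumes Python 3's configparser (under Python 2.6 the module is named ConfigParser), and load_searches is a made-up helper name. The sample values come from the example config.

```python
import configparser

SAMPLE = """
[DEFAULT]
clBase = minneapolis.craigslist.org
clCat = for sale
alertTo = someone@example.com

[I want a cat]
search = fuzzy cat

[Time for a new house]
search = houses that don't suck
clCat = real estate - all
clBase = sfbay.craigslist.org
"""


def load_searches(text):
    """Return {section name: dict of effective options} for every search.

    Hypothetical helper for illustration only.
    """
    # interpolation=None so a literal "%" in a search string is left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # sections() never lists DEFAULT; its values are merged into each section
    return {name: dict(parser[name]) for name in parser.sections()}


searches = load_searches(SAMPLE)
for name, opts in searches.items():
    # configparser lowercases option names by default (clBase -> clbase)
    print(name, "->", opts["clbase"], "/", opts["clcat"])
```

The first search picks up clBase and clCat from [DEFAULT], while the second one replaces both and still inherits alertTo.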

clcheck.py searches for config files in the following order of precedence, with 1 being the highest:

  1. -c command-line option
  2. ~/.clwatch.cfg
  3. /usr/local/etc/clwatch.cfg
  4. /etc/clwatch.cfg
  5. ./clwatch.cfg ← config in your current directory
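The lookup order above can be sketched as a small function. This is only an illustration of the precedence list, not the script's real code: find_config and its parameters are hypothetical names, and how the real script reacts to a -c path that does not exist is an assumption.

```python
import os

# Candidate locations from the list above, highest priority first.
SEARCH_PATHS = [
    "~/.clwatch.cfg",
    "/usr/local/etc/clwatch.cfg",
    "/etc/clwatch.cfg",
    "./clwatch.cfg",
]


def find_config(cli_path=None, search_paths=SEARCH_PATHS, exists=os.path.isfile):
    """Return the config path to use, or None if nothing is found.

    cli_path stands in for the value of the -c option. This sketch returns
    it unchecked (an assumption about the real script's behavior).
    """
    if cli_path:
        return os.path.expanduser(cli_path)
    for candidate in search_paths:
        path = os.path.expanduser(candidate)
        if exists(path):
            return path
    return None


print(find_config())
```

The exists parameter is only there so the lookup can be tried out without touching the real filesystem.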

After the [DEFAULT] section, the individual searches are defined by a name in brackets, []. The name can be anything you wish, but should probably be short since, if you are using the email option, it will be part of the subject.

As noted in the example config, if you do NOT define an smtpServer, the output will be printed to STDOUT. Since this script is meant to be run as a cron job (or “scheduled task” in Windows), you can just use your cron daemon to send the email to a configured address (in your crontab) rather than having the clcheck.py script make a connection to an external mail server.
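The fallback between email and STDOUT could look roughly like the sketch below. It is written against Python 3's smtplib and is not taken from clcheck.py. The function name deliver, the subject format, and the use of alertTo as the sender address are all assumptions made for illustration.

```python
import smtplib
import sys
from email.mime.text import MIMEText


def deliver(name, body, opts, out=sys.stdout):
    """Email the results of one search, or print them if no smtpServer is set.

    opts is a plain dict holding the options of one search section.
    Returns "stdout" or "smtp" to say which path was taken.
    """
    subject = "clwatch: %s" % name  # assumed subject format
    server = opts.get("smtpServer")
    if not server:
        # No SMTP server configured: print and let cron mail the output
        out.write("%s\n%s\n" % (subject, body))
        return "stdout"

    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = opts["alertTo"]  # assumed sender address
    msg["To"] = opts["alertTo"]

    port = int(opts.get("smtpPort", 25))
    use_ssl = opts.get("smtpSSL", "false").lower() == "true"
    # SMTP_SSL encrypts from the start of the connection (usually port 465)
    smtp = smtplib.SMTP_SSL(server, port) if use_ssl else smtplib.SMTP(server, port)
    try:
        if opts.get("smtpUseTLS", "false").lower() == "true":
            smtp.starttls()
        if opts.get("smtpUser"):
            smtp.login(opts["smtpUser"], opts.get("smtpPass", ""))
        smtp.sendmail(msg["From"], [msg["To"]], msg.as_string())
    finally:
        smtp.quit()
    return "smtp"


deliver("spam and eggs", "1 new item found", {"alertTo": "someone@example.com"})
```

With no smtpServer in the options, the call at the bottom only prints, which is the mode you would pair with a cron job.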

Here is a more “real world” example of a config file named clwatch.cfg, with comments describing what is going on:

[DEFAULT]
# I live in minneapolis so I want to search the 
# minneapolis craigslist by default
clBase = minneapolis.craigslist.org
# I'm usually looking for things that are in the 
# general "for sale" category so I'll just set 
# that as the default
clCat = for sale
# I just want to use one db file (sqlite3 storage) 
# so I'll set it to live in my home directory
dbLoc = /home/myuser/.clwatch/store.db
# I want emails to go to me by default
alertTo = admin@splitstreams.com
# I'll use my local outbound mail server
smtpServer = mail.splitstreams.com
# I'm just using the standard port 25
smtpPort = 25
# My SMTP username (this is obviously optional and 
# dependent upon your setup)
smtpUser = myuser
# My SMTP user password (again, obviously optional)
smtpPass = awesomePassword
# I'm going to make sure everything is encrypted
smtpUseTLS = true

# My first search is for spam and eggs.  I'm just going to 
# let this run as a standard "for sale" search
[spam and eggs]
search = spam eggs

# This second search is for a new house.  I'm looking to move 
# to San Francisco so I'm going to search that location
[Time for a new house]
search = houses that don't suck
clCat = real estate - all
# I only want results in the bay area
clBase = sfbay.craigslist.org

# I'm a sleazy bastard looking for a new girlfriend, but I haven't 
# got the gonads to break up with my current girlfriend.  I'm also 
# just going to search the general "personals"
[Find a new girlfriend and send to alternate email]
search = hot women
clCat = personals
alertTo = mysecretemail@somewhere.com

Now that my clwatch.cfg file is set up, I just have to run the script. If you run ./clcheck.py --help, you will see the following output:

$ ./clcheck.py --help
Usage: Usage clcheck.py [options]

Options:
  -h, --help            show this help message and exit
  -c FILE, --config=FILE
                        The path to the config file [default: ./clwatch.cfg]
  -d, --debug           Turn on debugging output [default: False]

By default, the script looks for your config file in the same directory that you run the script from. I'm going to move my config to /home/myuser/.clwatch.cfg and then run the check:

$ ./clcheck.py -c ~/.clwatch.cfg

The first time you run this (and there is no db file), you will get no output since the script assumes you have already checked craigslist for the things you are searching for. Every run after this, however, will send you one email (or output everything to STDOUT) for each configured search if there are new items posted on craigslist.
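The first-run behavior can be pictured with a small sqlite3 sketch. The table layout and the new_items function are hypothetical, since the script's real schema is not shown here, and this version tracks the "first run" per search rather than per db file.

```python
import sqlite3


def new_items(conn, search_name, current_urls):
    """Return URLs not seen before for this search and remember all of them.

    If the search has no stored rows yet, this counts as its first run:
    everything is stored and nothing is reported.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "search TEXT, url TEXT, PRIMARY KEY (search, url))"
    )
    known = {
        row[0]
        for row in conn.execute(
            "SELECT url FROM seen WHERE search = ?", (search_name,)
        )
    }
    fresh = [url for url in current_urls if url not in known]
    conn.executemany(
        "INSERT OR IGNORE INTO seen (search, url) VALUES (?, ?)",
        [(search_name, url) for url in fresh],
    )
    conn.commit()
    # An empty "known" set means first run: stay silent
    return fresh if known else []


conn = sqlite3.connect(":memory:")  # the real script would open dbLoc instead
print(new_items(conn, "spam and eggs", ["/a.html", "/b.html"]))
print(new_items(conn, "spam and eggs", ["/a.html", "/b.html", "/c.html"]))
```

One limitation of this sketch: if the very first run returns no results at all, the next run is still treated as a first run.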

Any Further Questions

If you have any further questions or suggestions, email me at admin@splitstreams.com.

If you run into any bugs, you can either email me or open a bug report at https://bugzilla.splitstreams.com

programming/python/clcheck.1277826740.txt.gz · Last modified: 2010/06/29 15:52 by jay