Craigslist Search and Notify Script
NO LONGER MAINTAINED
Sorry to anyone who has arrived here, but I'm no longer maintaining this script. I remembered, when initially writing it, what a pain in the ass it was to try and maintain a screen scraping app. Scraping Craigslist, I've found, is especially masochistic since the site seems to sometimes vary slightly between cities on top of the fact that once they change the underlying HTML (which is pretty awful to start with), it breaks everything.
It's not the most complicated of scripts, so anyone with some Python chops should be able to still take the base and do some tweaking (mainly of the regular expressions) and get this to work. I just don't have the large amount of time to sink into supporting something that is broken by the whims of others. If, in the future, Craigslist publishes a real API of some sort, I will be happy as a clam to rewrite this app using that. Bottom line here: screen scraping sucks for anyone doing it.
What is it?
A friend of mine asked me recently if I knew of a script that would perform a search on craigslist.org and send him an email when there were new results to his search. He's always looking for oddball stuff and by the time he finds it on craigslist and emails/calls the person, the item is already gone.
Since this sounded like something that would actually be quite useful for myself (and others), I decided to hammer something out.
The Script and Config File
Setting this up is a piece of cake. All you have to do is get the script and the example config file, edit the config and run the script. This has been tested to work under both Windows and Linux with Python 2.6, but it should work just fine under pretty much any platform. If you run into some kind of error, run the script in debugging mode and send the output to admin@splitstreams.com so I can fix it.
Getting the Files
Subversion
You can check out the latest script and example config from Subversion with:
$ svn co https://svn.splitstreams.com:444/scripts/trunk/python/clwatch clwatch
If you don't have Subversion installed, you can just browse to the URL above, right-click each file, select “Save Link As” and save a copy of the script and the example config file. On the Subversion note, if you are on Windows, I highly suggest using TortoiseSVN for actual SVN access.
Download the Package
The package can be downloaded directly by clicking here.
The Example Config
This is what the example config file looks like:
NOTE: This may not be exactly what you have. Read the SVN version for the latest instructions!
# This is an example config for the clwatch script
#
# The [DEFAULT] section defines defaults that will be used in all
# subsequent search configs. These are all the possible options
# that can be assigned to default (and any specific configs thereafter):
#
# DEFAULT options:
#   search = this is the search performed on craigslist
#   clBase = This is the base domain to use, it should include the
#            city. ex.: minneapolis.craigslist.com
#   clCat = The craigslist category to search under. ex.: jobs (See
#           the next section for all options
#   dbLoc = The location of the db file to store previous lookups. This
#           should only be defined in DEFAULTS so you use one database
#           file for all lookups
#   alertTo = The email address to send alerts to.
#   smtpServer = The hostname or ip address of the smtp server to use
#                to send the email.
#                Note: If you don't configure an smtp server, all
#                messages will be printed on standard out instead
#   smtpPort = The port number of the smtp server (probably 25)
#   smtpUser = The username to use for the sending of the email, if
#              necessary
#   smtpPass = The password to use for the sending of the email, if
#              necessary
#   smtpSSL = This should only be set to "true" if SSL is required from
#             the start of the connection (usually on port 465)
#             NOTE!!!! This is BROKEN in python 2.6 and 3.0! Do not use
#             this with either of those versions:
#             http://bugs.python.org/issue4470
#   smtpUseTLS = If the smtp server supports TLS, you can set this to
#                true
#
# clCat options:
#   These are all the categories and their subcategories available
#   on craigslist. You only have to use only one of the overall
#   category or subcategory, they do not need to be combined. For
#   example, if you wish search under the "artists" subcategory of
#   "community", you simply use "clCat = artists". If you wish to
#   search under the entire "community" section, you only need to
#   specify "clCat = community".:
#     community
#       activity partners
#       artists
#       childcare
#       community-general
#       groups
#       local news and views
#       lost & found
#       musicians
#       pets
#       politics
#       rideshare
#       volunteers
#     events
#       classes
#       events-events
#     gigs
#       adult gigs
#       computer gigs
#       creative gigs
#       crew gigs
#       domestic gigs
#       event gigs
#       labor gigs
#       talent gigs
#       writing gigs
#     housing
#       all housing wanted
#       apts wanted
#       apts/housing for rent
#       housing swap
#       office & commercial
#       parking & storage
#       real estate - all
#       real estate - by broker
#       real estate - by owner
#       real estate wanted
#       rooms & shares
#       rooms wanted
#       sublet/temp wanted
#       sublets & temporary
#       vacation rentals
#     jobs
#       admin/office jobs
#       business jobs
#       customer service jobs
#       education jobs
#       engineering jobs
#       etcetera jobs
#       finance jobs
#       food/bev/hosp jobs
#       general labor jobs
#       government jobs
#       healthcare jobs
#       human resource jobs
#       internet engineering jobs
#       legal jobs
#       manufacturing jobs
#       marketing jobs
#       media jobs
#       nonprofit jobs
#       real estate jobs
#       retail/food/hospitality jobs
#       retail/wholesale jobs
#       sales jobs
#       salon/spa/fitness
#       science jobs
#       security jobs
#       skilled trades jobs
#       software jobs
#       systems/networking jobs
#       tech support jobs
#       transport jobs
#       tv video radio jobs
#       web design jobs
#       writing jobs
#     personals
#       casual encounters
#       men seeking men
#       men seeking women
#       misc romance
#       missed connections
#       rants & raves
#       strictly platonic
#       women seeking men
#       women seeking women
#     resumes
#     for sale
#       art & crafts
#       auto parts
#       baby & kid stuff
#       barter
#       bicycles
#       boats
#       books
#       business
#       cars & trucks - all
#       cars & trucks - by dealer
#       cars & trucks - by owner
#       cds / dvds / vhs
#       clothing
#       collectibles
#       computers & tech
#       electronics
#       farm & garden
#       free stuff
#       furniture - all
#       furniture - by dealer
#       furniture - by owner
#       games & toys
#       garage sales
#       fs-general
#       household
#       items wanted
#       jewelry
#       materials
#       motorcycles/scooters
#       musical instruments
#       photo/video
#       recreational vehicles
#       sporting goods
#       tickets
#       tools
#     services
#       adult services
#       automotive services
#       beauty services
#       computer services
#       creative services
#       event services
#       financial services
#       household services
#       labor & moving
#       legal services
#       lessons & tutoring
#       real estate services
#       skilled trade services
#       small biz ads
#       therapeutic services
#       travel/vacation
#       write/edit/trans
#
# Your config should look something like the example below. After the
# [DEFAULT] section, each section should have a description in the []
# so you know what this search is about
#
#[DEFAULT]
#clBase = minneapolis.craigslist.org
#clCat = for sale
#dbLoc = /home/myuser/.clwatch/store.db
#alertTo = someone@example.com
#smtpServer = mail.example.com
#smtpPort = 25
#smtpUser = myuser
#smtpPass = awesomePassword
#smtpSSL = false
#smtpUseTLS = true
#
#[I want a cat]
#search = fuzzy cat
#
#[Time for a new house]
#search = houses that don't suck
#clCat = real estate - all
#
#[Find a new girlfriend and send to alternate email]
#search = hot women
#clCat = men seeking women
#alertTo = mysecretemail@somewhere.com
Usage
As you can see in the above config file example, the configuration is actually pretty simple. You should set up a [DEFAULT] section with just about everything in it, with the exception of the search option. This will limit the amount you have to repeat yourself in your actual search configs. Note that anything you set in a search config overrides the default for that search. For example, you could use separate dbLoc directives and use a different db file for each search, if you so choose. You could also override clBase and put in sfbay.craigslist.org if you were looking to move to San Francisco and wanted to scope out houses there.
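To make the default/override behaviour concrete, here is a minimal sketch using Python 3's standard configparser module. This is an illustration only, not the actual clcheck.py code (which was tested with Python 2.6); the helper name effective_settings is made up for this example.

```python
import configparser

# A trimmed-down config in the same shape as the example above.
EXAMPLE = """
[DEFAULT]
clBase = minneapolis.craigslist.org
clCat = for sale
alertTo = someone@example.com

[spam and eggs]
search = spam eggs

[Time for a new house]
search = houses that don't suck
clCat = real estate - all
clBase = sfbay.craigslist.org
"""

parser = configparser.ConfigParser(interpolation=None)
# Keep option names case-sensitive (clBase rather than clbase).
parser.optionxform = str
parser.read_string(EXAMPLE)


def effective_settings(parser):
    """Hypothetical helper: map each search section to its final options.

    configparser fills in anything a section does not set from [DEFAULT],
    and a value set in the section itself wins over the default.
    """
    return {name: dict(parser[name]) for name in parser.sections()}


settings = effective_settings(parser)
for name, opts in settings.items():
    print(name, "->", opts["clBase"], "/", opts["clCat"])
```

The first search inherits clBase and clCat from [DEFAULT], while the second one uses its own values for both and still inherits alertTo.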
Listed in order of importance, 1. being the most important, here is how clcheck.py will search for config files:
1. -c command-line option
2. ~/.clwatch.cfg
3. /usr/local/etc/clwatch.cfg
4. /etc/clwatch.cfg
5. ./clwatch.cfg ← config in your current directory
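The lookup order above could be sketched roughly like this in Python 3. The function name find_config and the injectable exists argument are assumptions made for illustration; I am also assuming that a -c path which does not exist simply falls through to the next location, which may not match what clcheck.py really does.

```python
import os

# Candidate locations in the priority order listed above (highest first).
SEARCH_PATHS = [
    "~/.clwatch.cfg",
    "/usr/local/etc/clwatch.cfg",
    "/etc/clwatch.cfg",
    "./clwatch.cfg",
]


def find_config(cli_path=None, exists=os.path.isfile):
    """Return the first config path that exists, or None if none do.

    cli_path stands in for the value given to the -c option.
    exists is injectable only so the lookup can be tried without real files.
    """
    candidates = ([cli_path] if cli_path else []) + SEARCH_PATHS
    for candidate in candidates:
        path = os.path.expanduser(candidate)  # turns ~ into the home directory
        if exists(path):
            return path
    return None


print(find_config())
```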
After the [DEFAULT] section, the individual searches are defined by a name in brackets, []. The name can be anything you wish, but should probably be short since, if you are using the email option, it will be part of the subject.
As noted in the example config, if you do NOT define an smtpServer, the output will be printed to STDOUT. Since this script is meant to be run as a cron job (or “scheduled task” in Windows), you can just use your cron daemon to send the email to a configured address (in your crontab) rather than having the clcheck.py script make a connection to an external mail server.
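The "email if smtpServer is set, otherwise print" behaviour might look something like the following Python 3 sketch. The function name deliver, its return values, and the use of alertTo as the From address are all assumptions for this example. It covers only the smtpUseTLS case and leaves smtpSSL out.

```python
import smtplib
import sys
from email.message import EmailMessage


def deliver(subject, body, opts, out=sys.stdout):
    """Hypothetical helper: send one alert by email or write it to out.

    opts is a dict of config options for one search (alertTo, smtpServer, ...).
    """
    server = opts.get("smtpServer")
    if not server:
        # No SMTP server configured: fall back to standard out so that
        # cron (or whatever runs the script) can handle the output.
        out.write("%s\n%s\n" % (subject, body))
        return "stdout"

    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = opts["alertTo"]  # assumption: reuse alertTo as the sender
    msg["To"] = opts["alertTo"]
    msg.set_content(body)

    with smtplib.SMTP(server, int(opts.get("smtpPort", 25))) as smtp:
        if opts.get("smtpUseTLS", "false").lower() == "true":
            smtp.starttls()  # upgrade the connection before logging in
        if opts.get("smtpUser"):
            smtp.login(opts["smtpUser"], opts.get("smtpPass", ""))
        smtp.send_message(msg)
    return "smtp"


# With no smtpServer set, the alert just goes to standard out.
deliver("spam and eggs", "1 new item", {"alertTo": "someone@example.com"})
```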
Here is a more “real world” example of a config file named clwatch.cfg, with comments describing what is going on:
[DEFAULT]
# I live in minneapolis so I want to search the
# minneapolis craigslist by default
clBase = minneapolis.craigslist.org

# I'm usually looking for things that are in the
# general "for sale" category so I'll just set
# that as the default
clCat = for sale

# I just want to use one db file (sqlite3 storage)
# so I'll set it to live in my home directory
dbLoc = /home/myuser/.clwatch/store.db

# I want emails to go to me by default
alertTo = admin@splitstreams.com

# I'll use my local outbound mail server
smtpServer = mail.splitstreams.com

# I'm just using the standard port 25
smtpPort = 25

# My SMTP username (this is obviously optional and
# dependent upon your setup)
smtpUser = myuser

# My SMTP user password (again, obviously optional)
smtpPass = awesomePassword

# I'm going to make sure everything is encrypted
smtpUseTLS = true

# My first search is for spam and eggs. I'm just going to
# let this run as a standard "for sale" search
[spam and eggs]
search = spam eggs

# This second search is for a new house. I'm looking to move
# to San Francisco so I'm going to search that location
[Time for a new house]
search = houses that don't suck
clCat = real estate - all
# I only want results in the bay area
clBase = sfbay.craigslist.org

# I'm a sleazy bastard looking for a new girlfriend, but I haven't
# got the gonads to break up my current girlfriend. I'm also just
# going to search the general "personals"
[Find a new girlfriend and send to alternate email]
search = hot women
clCat = personals
alertTo = mysecretemail@somewhere.com
Now that my clwatch.cfg file is set up, I just have to run the script. If you run ./clcheck.py --help, you will see the following output:
$ ./clcheck.py --help
Usage: Usage clcheck.py [options]

Options:
  -h, --help            show this help message and exit
  -c FILE, --config=FILE
                        The path to the config file [default: ./clwatch.cfg]
  -d, --debug           Turn on debugging output [default: False]
By default, the script looks for your config file in the same directory that you run the script from. I'm going to move my config file to /home/myuser/.clwatch.cfg and then run the check:
$ ./clcheck.py -c ~/.clwatch.cfg
The first time you run this (and there is no db file), you will get no output since the script assumes you have already checked craigslist for the things you are searching for. Every run after this, however, will send you one email (or output everything to STDOUT) for each configured search if there are new items posted on craigslist.
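Here is a rough Python 3 sketch of how that "stay quiet on the first run, report only new items afterwards" logic could work on top of sqlite3. The table layout and the new_items function are invented for illustration and are not taken from clcheck.py. As a simplification, "first run" here means "nothing stored yet for this search" rather than "no db file".

```python
import sqlite3


def new_items(conn, search_name, found_urls):
    """Hypothetical helper: remember found_urls and return the ones to report.

    The first time a search is seen, everything is stored silently and an
    empty list is returned. Later calls return only URLs not stored before.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "search TEXT, url TEXT, PRIMARY KEY (search, url))"
    )
    known = {
        row[0]
        for row in conn.execute(
            "SELECT url FROM seen WHERE search = ?", (search_name,)
        )
    }
    first_run = not known
    fresh = [url for url in found_urls if url not in known]
    conn.executemany(
        "INSERT OR IGNORE INTO seen (search, url) VALUES (?, ?)",
        [(search_name, url) for url in fresh],
    )
    conn.commit()
    return [] if first_run else fresh


# An in-memory database stands in for the file that dbLoc would point to.
conn = sqlite3.connect(":memory:")
print(new_items(conn, "spam and eggs", ["post-1", "post-2"]))
print(new_items(conn, "spam and eggs", ["post-1", "post-2", "post-3"]))
```

The first call seeds the database and reports nothing. The second call reports only post-3.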
Any Further Questions
If you have any further questions or suggestions, email me at admin@splitstreams.com.
If you run into any bugs, you can either email me or open a bug report at https://bugzilla.splitstreams.com