====== Craigslist Search and Notify Script ======
//Created 2009/06/15 18:11 by crustymonkey; last modified 2011/03/29 20:15 by jay.//
===== NO LONGER MAINTAINED =====
Sorry to anyone who has arrived here, but I'm no longer maintaining this script.  When I first wrote it, I remembered what a pain in the ass it is to maintain a screen scraping app.  Scraping Craigslist, I've found, is especially masochistic: the site seems to vary slightly between cities, and on top of that, whenever they change the underlying HTML (which is pretty awful to start with), everything breaks.

It's not the most complicated of scripts, so anyone with some Python chops should still be able to take the base, do some tweaking (mainly of the regular expressions) and get it working.  I just don't have the large amount of time it takes to support something that gets broken by the whims of others.  If, in the future, Craigslist publishes a real API of some sort, I will be happy as a clam to rewrite this app using that.  Bottom line: screen scraping sucks for anyone doing it.

===== What is it? =====
A friend of mine asked me recently if I knew of a script that would perform a search on [[http://craigslist.org|craigslist.org]] and send him an email when there were new results to his search.  He's always looking for oddball stuff and by the time //he// finds it on craigslist and emails/calls the person, the item is already gone.
  
===== The Script and Config File =====
Setting this up is a piece of cake.  All you have to do is get the script and the example config file, edit the config and run the script.  This has been tested to work under both Windows and Linux with Python 2.6, but it should work just fine on pretty much any platform.  If you run into some kind of error, run the script in debugging mode (''-d'') and send the output to <admin@splitstreams.com> so I can fix it.
==== Getting the Files ====
=== Subversion ===
You can check out the latest script and example config from [[http://subversion.tigris.org/|Subversion]] with:
  
  
If you don't have [[http://subversion.tigris.org/|Subversion]] installed, you can just browse to the URL above, right-click on each file, select "Save Link As" and save a copy of the script and the example config file.  On the subject of [[http://subversion.tigris.org/|Subversion]], if you are on Windows, I highly suggest using [[http://tortoisesvn.tigris.org/|TortoiseSVN]] for actual SVN access.
=== Download the Package ===
The package can be downloaded directly by {{:programming:python:clwatch-0.3.5.tar.gz|clicking here}}.
  
==== The Example Config ====
===== Usage =====
As you can see in the above config file example, the configuration is actually pretty simple.  You should set up a ''DEFAULT'' section containing just about everything except the ''search'' option.  This limits how much you have to repeat yourself in your actual search configs.  Note that you can override any of these options in a search config, and the overriding value will be used instead.  For example, you could use separate ''dbLoc'' directives to keep a different db file for each search, if you so choose.  You could also override ''clBase'' with ''sfbay.craigslist.org'' if you were looking to move to San Francisco and wanted to scope out houses there.
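
For the curious, here is a minimal Python 3 sketch of how that inheritance works.  It assumes the script reads its config with Python's standard ''configparser'' module (''ConfigParser'' in Python 2.6), which is where the special ''[DEFAULT]'' section behavior comes from; ''load_searches'' is a hypothetical helper, not something from ''clcheck.py''.

<code python>
import configparser

# A cut-down config in the same format as the examples on this page.
EXAMPLE = """
[DEFAULT]
clBase = minneapolis.craigslist.org
clCat = for sale

[spam and eggs]
search = spam eggs

[Time for a new house]
search = houses that don't suck
clCat = real estate - all
clBase = sfbay.craigslist.org
"""


def load_searches(text):
    """Return {search name: {option: value}} with DEFAULT values merged in."""
    cfg = configparser.ConfigParser()
    # Keep option names as written (clBase) instead of lowercasing them.
    cfg.optionxform = str
    cfg.read_string(text)
    # sections() never lists DEFAULT; items() on a section includes every
    # DEFAULT option that the section did not override itself.
    return {name: dict(cfg.items(name)) for name in cfg.sections()}


searches = load_searches(EXAMPLE)
print(searches["spam and eggs"]["clBase"])         # minneapolis.craigslist.org
print(searches["Time for a new house"]["clBase"])  # sfbay.craigslist.org
</code>

Setting ''optionxform = str'' only matters because the options end up in a plain dict here; by default ''configparser'' lowercases option names.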

Here is the order in which ''clcheck.py'' searches for config files, from highest to lowest priority:

  - ''-c'' command-line option
  - ''~/.clwatch.cfg''
  - ''/usr/local/etc/clwatch.cfg''
  - ''/etc/clwatch.cfg''
  - ''./clwatch.cfg'' (the config in your current directory)
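
A rough Python 3 sketch of that lookup follows.  The list above doesn't say whether later files are merged into earlier ones, so this assumes the first file found wins; ''find_config'' and ''DEFAULT_PATHS'' are hypothetical names, not part of ''clcheck.py''.

<code python>
import os

# Default lookup order from the list above; the -c option is checked first.
DEFAULT_PATHS = (
    "~/.clwatch.cfg",
    "/usr/local/etc/clwatch.cfg",
    "/etc/clwatch.cfg",
    "./clwatch.cfg",
)


def find_config(cli_path=None, search_paths=DEFAULT_PATHS):
    """Return the config file to use, or None if no candidate exists.

    Assumes the first existing file wins (the real script may differ).
    """
    if cli_path:
        # An explicit -c path is used as-is, even if it is missing.
        return os.path.expanduser(cli_path)
    for path in search_paths:
        path = os.path.expanduser(path)
        if os.path.isfile(path):
            return path
    return None
</code>

An explicit ''-c'' path is returned even if the file doesn't exist, so a typo surfaces when the file is opened instead of silently falling back to some other config.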
  
After the ''DEFAULT'' section, the individual searches are each defined by a name in brackets, ''[]''.  The name can be anything you wish, but should probably be short since, if you are using the email option, it will be part of the subject.
As noted in the example config, if you do **NOT** define an ''smtpServer'', the output will be printed to STDOUT.  Since this script is meant to be run as a cron job (or a "scheduled task" in Windows), you can just let your cron daemon send the output to an address configured in your crontab, rather than having ''clcheck.py'' make a connection to an external mail server.
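
Here's a minimal Python 3 sketch of that "email or STDOUT" decision using the standard ''smtplib'' and ''email.message'' modules.  The ''build_alert''/''deliver'' helpers, the ''alert_from'' address and the subject format are all hypothetical; the only parts taken from this page are the STDOUT fallback and the search name ending up in the subject.

<code python>
import smtplib
import sys
from email.message import EmailMessage


def build_alert(search_name, new_items, alert_from, alert_to):
    """Build one alert message; the subject/body format is hypothetical."""
    msg = EmailMessage()
    # The search's [section name] becomes part of the subject line.
    msg["Subject"] = "clwatch: %s (%d new)" % (search_name, len(new_items))
    msg["From"] = alert_from
    msg["To"] = alert_to
    msg.set_content("\n".join(new_items))
    return msg


def deliver(msg, smtp_server=None, smtp_port=25, out=None):
    """Email msg if an SMTP server is configured, otherwise print it."""
    if not smtp_server:
        # No smtpServer: write to STDOUT and let cron mail the output.
        if out is None:
            out = sys.stdout
        print(msg["Subject"], file=out)
        print(msg.get_content(), file=out)
        return "stdout"
    with smtplib.SMTP(smtp_server, smtp_port) as conn:
        conn.send_message(msg)
    return "smtp"
</code>

With no ''smtpServer'', ''deliver(msg)'' just prints, and a ''MAILTO=you@example.com'' line in your crontab takes care of getting it to your inbox.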
  
Here is a more "real world" example of a config file named ''clwatch.cfg'', with comments describing what is going on:
  
<code>
[DEFAULT]
# I live in minneapolis so I want to search the
# minneapolis craigslist by default
clBase = minneapolis.craigslist.org
# I'm usually looking for things that are in the
# general "for sale" category so I'll just set
# that as the default
clCat = for sale
# I just want to use one db file (sqlite3 storage)
# so I'll set it to live in my home directory
dbLoc = /home/myuser/.clwatch/store.db
# I want emails to go to me by default
# I'll use my local outbound mail server
smtpServer = mail.splitstreams.com
# I'm just using the standard port 25
smtpPort = 25
# My SMTP username (this is obviously optional and
# dependent upon your setup)
smtpUser = myuser
# My SMTP user password (again, obviously optional)
smtpPass = awesomePassword
# I'm going to make sure everything is encrypted
smtpUseTLS = true

# My first search is for spam and eggs.  I'm just going to
# let this run as a standard "for sale" search
[spam and eggs]
search = spam eggs

# This second search is for a new house.  I'm looking to move
# to San Francisco so I'm going to search that location
[Time for a new house]
search = houses that don't suck
clCat = real estate - all
# I only want results in the bay area
clBase = sfbay.craigslist.org

# I'm a sleazy bastard looking for a new girlfriend, but I haven't
# got the gonads to break up with my current girlfriend.  I'm also just
# going to search the general "personals"
[Find a new girlfriend and send to alternate email]
search = hot women
clCat = personals
alertTo = mysecretemail@somewhere.com
</code>
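
The ''smtp*'' options in this config map fairly directly onto Python's ''smtplib''.  This is a minimal Python 3 sketch, not the script's actual code: ''smtp_send'' is a hypothetical helper, it assumes a ''configparser'' section (so ''[DEFAULT]'' values are inherited), and ''smtp_factory'' is only there so the function can be tried against a fake server.

<code python>
import configparser
import smtplib


def smtp_send(msg, section, smtp_factory=smtplib.SMTP):
    """Send msg using the smtp* options of one configparser section."""
    server = section.get("smtpServer")
    port = section.getint("smtpPort", fallback=25)
    conn = smtp_factory(server, port)
    try:
        if section.getboolean("smtpUseTLS", fallback=False):
            # Upgrade to an encrypted connection before sending credentials.
            conn.starttls()
        user = section.get("smtpUser")
        if user:
            conn.login(user, section.get("smtpPass", fallback=""))
        conn.send_message(msg)
    finally:
        conn.quit()
</code>

For example, ''smtp_send(msg, cfg["spam and eggs"])'' would pick up the mail settings from ''[DEFAULT]''.  ''starttls()'' runs before ''login()'' so the password never crosses the wire unencrypted.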

Now that my ''clwatch.cfg'' file is set up, I just have to run the script.  If you run ''./clcheck.py --help'', you will see the following output:

<code>
$ ./clcheck.py --help
Usage: Usage clcheck.py [options]

Options:
  -h, --help            show this help message and exit
  -c FILE, --config=FILE
                        The path to the config file [default: ./clwatch.cfg]
  -d, --debug           Turn on debugging output [default: False]
</code>
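
That ''-c FILE, --config=FILE'' help layout is what Python's ''optparse'' module produces, so the options were probably defined along these lines.  Treat this as a reconstruction for illustration (''parse_args'' is a hypothetical name), not the actual ''clcheck.py'' code.

<code python>
from optparse import OptionParser


def parse_args(argv=None):
    """Parse clcheck.py-style options; argv=None means sys.argv[1:]."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-c", "--config", dest="config", metavar="FILE",
                      default="./clwatch.cfg",
                      help="The path to the config file [default: %default]")
    parser.add_option("-d", "--debug", dest="debug", action="store_true",
                      default=False,
                      help="Turn on debugging output [default: %default]")
    options, _args = parser.parse_args(argv)
    return options
</code>

''optparse'' adds the ''Usage: '' prefix itself, so a usage string of ''%prog [options]'' avoids the doubled "Usage: Usage" seen in the output above.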

By default, the script looks for your config file in the same directory that you run the script from.  I'm going to move my config file to ''/home/myuser/.clwatch.cfg'' and then run the check:

<code>
$ ./clcheck.py -c ~/.clwatch.cfg
</code>

The first time you run this (when there is no db file yet), you will get no output, since the script assumes you have already checked craigslist for the things you are searching for.  On every run after that, however, the script will send you one email (or print to STDOUT) for each configured search that has new items posted on craigslist.
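
That "first run stays quiet" behavior boils down to remembering which postings each search has already seen.  Here's a minimal sketch using Python's built-in ''sqlite3'' module (the config mentions sqlite3 storage); the ''seen'' table layout and the ''check_new'' helper are my assumptions, not the script's actual schema, and the item IDs stand in for whatever identifies a posting (its URL, for example).

<code python>
import sqlite3


def check_new(conn, search_name, item_ids):
    """Record item_ids for a search and return the ones not seen before.

    The first time a search is checked, everything is recorded but nothing
    is returned, so existing postings don't trigger an alert.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        " search TEXT NOT NULL, item TEXT NOT NULL,"
        " PRIMARY KEY (search, item))"
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM seen WHERE search = ?", (search_name,)
    ).fetchone()
    first_run = count == 0
    new = [
        item for item in dict.fromkeys(item_ids)  # drop duplicates, keep order
        if conn.execute(
            "SELECT 1 FROM seen WHERE search = ? AND item = ?",
            (search_name, item),
        ).fetchone() is None
    ]
    conn.executemany(
        "INSERT INTO seen (search, item) VALUES (?, ?)",
        [(search_name, item) for item in new],
    )
    conn.commit()
    return [] if first_run else new


conn = sqlite3.connect(":memory:")  # in practice, the dbLoc path
check_new(conn, "spam and eggs", ["post-1", "post-2"])  # first run: []
check_new(conn, "spam and eggs", ["post-1", "post-3"])  # ["post-3"]
</code>

Tracking the first run per search (rather than per db file) means a search you add later doesn't dump every existing posting into your inbox on its first check.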

===== Any Further Questions =====
If you have any further questions or suggestions, email me at <admin@splitstreams.com>.

If you run into any bugs, you can either email me or open a bug report at https://bugzilla.splitstreams.com