Follow the instructions in the Python Quick Start Guide to install Homebrew, Git, PostGIS, Python 3.3+ and virtualenv.
mkvirtualenv scrapers-ca --python=`which python3`
git clone https://github.com/opencivicdata/scrapers-ca.git
cd scrapers-ca
pip install -r requirements.txt
Initialize the database:
createdb pupa
psql pupa -c "CREATE EXTENSION postgis;"
pupa dbinit ca
Run a scraper:
pupa update ca_ab_edmonton
To run only the scraping step and skip the import step, add the --scrape switch:
pupa update --scrape ca_ab_edmonton
For documentation on the pupa command:
pupa -h
For documentation on the update subcommand:
pupa update -h
Find division identifiers using the Open Civic Data Division Identifier (OCD-ID) Viewer or by browsing the list of identifiers. In most cases, a municipality will have a division identifier with a type ID of csd. Then, create a scraper with:
pupa init ca_on_toronto
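To make the identifier structure concrete, here is a minimal Python sketch that splits a division identifier into its type/ID pairs. The helper name parse_division_id is hypothetical (it is not part of pupa), and the Toronto identifier is used only as an example.

```python
def parse_division_id(division_id):
    """Split an OCD division identifier into (type, id) pairs."""
    prefix, _, rest = division_id.partition('/')
    if prefix != 'ocd-division' or not rest:
        raise ValueError('not an OCD division identifier: %r' % division_id)
    # Each path segment has the form "type:id", e.g. "csd:3520005".
    return [tuple(segment.split(':', 1)) for segment in rest.split('/')]


pairs = parse_division_id('ocd-division/country:ca/csd:3520005')
print(pairs)  # [('country', 'ca'), ('csd', '3520005')]
print(pairs[-1][0] == 'csd')  # True: the most specific type is csd
```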
Read the Pupa documentation or an existing scraper's code.
Avoid using the XPath string() function unless the expression is known to have no matches on some pages. Otherwise, scrapers may continue to run without error despite failing to find a match. A comment like # can be empty or # allow string() should accompany each use of string().
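The difference between a silent and a loud failure can be sketched as follows. This is an illustration under assumptions: the standard library's xml.etree.ElementTree stands in for the parser the scrapers use, findtext with a default mimics how string() yields an empty string when nothing matches, and required_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring('<div><h1>Mayor</h1></div>')


def required_text(node, path):
    """Return the text of the first match, or fail loudly if there is none."""
    match = node.find(path)
    if match is None:
        raise ValueError('no match for %r' % path)
    return (match.text or '').strip()


# Lenient lookup: a missing element silently becomes an empty string,
# which is the same failure mode as XPath string().
silent = page.findtext('.//h2', default='')  # can be empty

# Strict lookup: the scraper stops as soon as the expected element is gone.
name = required_text(page, './/h1')
```

The strict form is the safer default; the lenient form is reasonable only where an empty result is expected on some pages, which is what the accompanying comment documents.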
Use the get_email and get_phone helpers as much as possible.
Check module names, class names, classification, division_name, name and url in __init__.py files:
invoke tidy
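For orientation, the sketch below shows the kind of attributes being checked and a toy check over them. It is illustrative only: the plain class stands in for a real jurisdiction class, the attribute values are examples, and missing_attributes is a hypothetical helper, not what invoke tidy actually runs.

```python
REQUIRED = ('classification', 'division_name', 'name', 'url')


class Toronto:
    # Example values only; this plain class stands in for a jurisdiction class.
    classification = 'legislature'
    division_name = 'Toronto'
    name = 'Toronto City Council'
    url = 'http://www.toronto.ca'


def missing_attributes(cls):
    """Return the required attribute names that are unset or empty."""
    return [attr for attr in REQUIRED if not getattr(cls, attr, None)]


print(missing_attributes(Toronto))  # []
```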
Check sources are credited:
invoke sources
Check jurisdiction URLs:
invoke urls
Check PEP 8 conformance:
flake8 .
Update the OCD-IDs:
curl -O https://raw.githubusercontent.com/opencivicdata/ocd-division-ids/master/identifiers/country-ca.csv
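Once the file is downloaded, the csd identifiers can be picked out with a few lines of Python. This is a sketch under assumptions: the sample rows are made up to keep the example self-contained, the "id" and "name" column names are assumed rather than taken from the real file, and csd_divisions is a hypothetical helper.

```python
import csv
import io

# Sample rows only; the real data is the file downloaded above, and the
# "id" and "name" column names are assumed here.
SAMPLE = """id,name
ocd-division/country:ca,Canada
ocd-division/country:ca/csd:3520005,Toronto
ocd-division/country:ca/csd:4811061,Edmonton
"""


def csd_divisions(handle):
    """Map census subdivision identifiers to names."""
    return {
        row['id']: row['name']
        for row in csv.DictReader(handle)
        # Keep rows whose most specific segment has the csd type.
        if row['id'].rsplit('/', 1)[-1].startswith('csd:')
    }


divisions = csd_divisions(io.StringIO(SAMPLE))
print(len(divisions))  # 2
```

To run it against the downloaded file, pass an open file handle for country-ca.csv in place of the in-memory sample.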
Scraper code rarely undergoes code review. The focus is on the quality of the data.
This repository is on GitHub at https://github.com/opencivicdata/scrapers-ca, where contributions, forks, bug reports, feature requests, and feedback are welcome.
Copyright (c) 2013 Open North Inc., released under the MIT license