SEIA Data

A set of tools for crawling, storing, and analyzing data from www.e-seia.cl, a Chilean government site.

Requirements

A Python interpreter and a Unix-like system with wget. The YAML conversion step described under Usage also needs Ruby.

Usage

To download all the result pages, run:

get_seia_project_list.py <pages>

Here, pages is the number of result pages to fetch. You must look this number up manually in the site's search results. Downloading a large number of pages can take a while.

Downloaded pages are stored in the seia_project_list/ folder.

After downloading the list, you can generate a YAML version with:
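
As a rough illustration of the download step, the sketch below shows one way a script could call wget once per result page and save the output under seia_project_list/. It is not the code of get_seia_project_list.py. URL_TEMPLATE, the page_NNNN.html file names, and the function names build_wget_command and download_pages are all hypothetical, since this README does not give the real search URL or file layout.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholder: the real search URL used by
# get_seia_project_list.py is not given in this README.
URL_TEMPLATE = "http://www.e-seia.cl/search?page={page}"
OUTPUT_DIR = Path("seia_project_list")


def build_wget_command(page, output_dir=OUTPUT_DIR, url_template=URL_TEMPLATE):
    """Return the wget argument list that saves one result page to output_dir."""
    # Assumed file naming scheme: page_0001.html, page_0002.html, ...
    target = Path(output_dir) / "page_{:04d}.html".format(page)
    return ["wget", "--quiet", "-O", str(target), url_template.format(page=page)]


def download_pages(pages, output_dir=OUTPUT_DIR, runner=subprocess.run):
    """Download result pages 1..pages, one wget call per page."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for page in range(1, pages + 1):
        # check=True makes a failed wget call raise instead of passing silently.
        runner(build_wget_command(page, output_dir), check=True)
```

Calling download_pages(3) would then issue three wget calls in sequence. The runner parameter exists only so the loop can be exercised without network access.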

ruby project_list_parser.rb > project_list.yaml

To-do

  • Add a time range to the get_seia_project_list.py command arguments.

  • Get the number of pages automatically for a time range.

  • Parse list pages to get the project IDs.

  • Download the project data.

  • Store the project data.

  • Write some queries.

License

This is free software released under the GPL v3.