
wikitabletocsv

Convert tables on Wikipedia pages to a CSV file.

Other solutions

I don't find these solutions satisfactory for my needs:

Therefore, I'm writing my own!

Requirements

  • Python 2.7 (probably works with other versions too)
  • BeautifulSoup4 (used for HTML traversal and text extraction; a rough sketch of the approach follows this list)
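To give an idea of what that traversal looks like, here is a minimal sketch, not the actual tabletocsv.py code: it fetches a page, finds every table with the wikitable class, and writes the cell text to a CSV file. It assumes the requests package and uses a placeholder URL.

# Minimal sketch (not the project's code): dump every <table class="wikitable">
# on a page to CSV. Assumes the `requests` package; the URL is a placeholder.
import csv
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Example"   # placeholder page
soup = BeautifulSoup(requests.get(url).text, "html.parser")

with open("out.csv", "wb") as f:                # Python 2.7; on Python 3 use "w", newline=""
    writer = csv.writer(f)
    for table in soup.find_all("table", class_="wikitable"):
        for row in table.find_all("tr"):
            cells = row.find_all(["th", "td"])
            writer.writerow([c.get_text(strip=True).encode("utf-8") for c in cells])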

Installation

Clone or download.

$ pip install -r requirements.txt

Usage

$ ./tabletocsv.py --help
usage: tabletocsv.py [-h] [--class-name CLASS_NAME] url [dest]

positional arguments:
  url         Source of data. Expects to contain one or more <table> elements
              with specified class name.
  dest        The file to write output to. Omit or use '-' to write to stdout.

optional arguments:
  -h, --help  show this help message and exit
  --class-name CLASS_NAME
              CSS class name for the tables to search for. (Without dot, just
              name.) (default: wikitable)
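For example, to pull the wikitable-class tables from a page into a file (the URL below is only a placeholder; substitute the page you actually want to scrape):

$ ./tabletocsv.py --class-name wikitable https://en.wikipedia.org/wiki/Example tables.csv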

Caveats

This is only guaranteed to work on this Wiki page; I haven't tried any others.

There's no real input validation either. Use with care.
