MaastrichtU-BISS/celex-articles-extractor

celex-articles-extractor

Scrapes eur-lex.europa.eu and parses the documents to segment them into articles.
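To illustrate what segmenting a document into articles can look like, here is a minimal sketch. It is not taken from celex_query.py: the function name `split_into_articles` and the heading pattern are assumptions, and it presumes the document text is already available as plain text in which every article starts with a line of the form "Article N".

```python
import re

# Assumption: each article starts with a line such as "Article 12".
# The real parser in celex_query.py may detect articles differently.
ARTICLE_HEADING = re.compile(r"^[ \t]*Article[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def split_into_articles(text):
    """Return a dict mapping article number -> article text (hypothetical helper)."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = {}
    for i, match in enumerate(matches):
        number = int(match.group(1))
        # An article runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        # Keep the first occurrence if a number appears more than once.
        articles.setdefault(number, body)
    return articles
```

Text before the first heading (title, recitals) is ignored in this sketch.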

Input

In the input folder, place a CSV file with at least one column named "celex".
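As a sketch of how such a file could be prepared and checked, the helpers below write a CSV with a "celex" column and read it back. The function names and the file name `celex_list.csv` are hypothetical; the README only requires that the file sits in the input folder and has a "celex" column.

```python
import csv
from pathlib import Path


def write_celex_csv(path, celex_numbers):
    """Write a CSV with a single "celex" column, one identifier per row."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["celex"])  # the required header
        for number in celex_numbers:
            writer.writerow([number])
    return path


def read_celex_column(path):
    """Return the non-empty values of the "celex" column."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if "celex" not in (reader.fieldnames or []):
            raise ValueError('the CSV needs a column called "celex"')
        values = ((row["celex"] or "").strip() for row in reader)
        return [value for value in values if value]
```

For example, `write_celex_csv("input/celex_list.csv", ["32016R0679", "32019L0790"])` would create an input file listing two CELEX numbers.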

Execute

Run celex_query.py with Python 3.

  • Optional: in the code of celex_query.py, change the document limit or the article limit (default: 90 articles).

Output

A folder called "output" will be created, with subfolders numbered from 1 to k, where k is the highest article number processed.

  • Example: a folder called 12 containing 3 files means that each of those 3 files is Article 12 of a different document.

Each subfolder contains multiple .txt files, each named after its corresponding CELEX number and containing the article's text.
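The layout described above can be explored with a short script. This is a sketch under assumptions: the helper name `index_output` is hypothetical, and it presumes each file is named `<celex>.txt` inside a numerically named subfolder.

```python
from pathlib import Path


def index_output(output_dir="output"):
    """Map each article number to the sorted CELEX numbers found in its subfolder."""
    index = {}
    for sub in Path(output_dir).iterdir():
        # Only subfolders with numeric names (1, 2, ..., k) count as articles.
        if sub.is_dir() and sub.name.isdigit():
            # Assumption: the file stem is the CELEX number of the document.
            index[int(sub.name)] = sorted(p.stem for p in sub.glob("*.txt"))
    return dict(sorted(index.items()))
```

With this index, `index_output()[12]` would list the documents for which an Article 12 was extracted.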
