celex-articles-extractor

Scrapes eur-lex.europa.eu and parses the retrieved documents to segment them into articles.
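As a rough illustration of what segmenting a document into articles can look like, here is a minimal sketch. It assumes each article starts with a line of the form "Article N"; the pattern and the `split_into_articles` helper are hypothetical and are not taken from celex_query.py, which may work differently.

```python
import re
from typing import Dict

# Assumption: an article heading is a line containing only "Article <number>".
ARTICLE_HEADING = re.compile(r"^\s*Article\s+(\d+)\s*$", re.MULTILINE)


def split_into_articles(text: str) -> Dict[int, str]:
    """Map each article number to the text between its heading and the next one."""
    matches = list(ARTICLE_HEADING.finditer(text))
    articles = {}
    for i, match in enumerate(matches):
        start = match.end()
        # The article body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        articles[int(match.group(1))] = text[start:end].strip()
    return articles


sample = "Preamble\nArticle 1\nSubject matter.\nArticle 2\nDefinitions.\n"
articles = split_into_articles(sample)
```

Text before the first heading (the preamble in the sample) is dropped in this sketch.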

Input

In the input folder, place a CSV file with at least one column called "celex".
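The sketch below shows one way such a file could be read. Only the "celex" column name comes from this README; the `read_celex_ids` helper and the extra "note" column are hypothetical, and the script itself may load the file differently.

```python
import csv
import io


def read_celex_ids(csv_file):
    """Return the non-empty values of the "celex" column, in file order."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or "celex" not in reader.fieldnames:
        raise ValueError('input CSV needs a column called "celex"')
    return [row["celex"].strip() for row in reader
            if row["celex"] and row["celex"].strip()]


# In-memory stand-in for a file in the input folder; extra columns are ignored.
sample = io.StringIO("celex,note\n32016R0679,first\n32019L0790,second\n")
celex_ids = read_celex_ids(sample)
```

With a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`.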

Execute

Run celex_query.py with Python 3.

Optional: in the code of celex_query.py, change the document limit or the article limit (default: 90 articles).

Output

A folder called "output" will be created, with subfolders named 1 to k, where k is the highest article number among the processed articles.

Example: a folder called 12 with 3 files inside means that each of those 3 files contains Article 12 of a different document.

Each subfolder contains one or more .txt files, each named after the CELEX number of its document and containing the article's text.
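The layout described above can be sketched as follows. The `write_articles` helper is hypothetical and only mirrors the folder structure from this README (output/<article number>/<celex>.txt); it is not the code used by celex_query.py.

```python
import tempfile
from pathlib import Path


def write_articles(output_dir, celex, articles):
    """Write each article to <output_dir>/<article number>/<celex>.txt."""
    for number, text in articles.items():
        folder = Path(output_dir) / str(number)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{celex}.txt").write_text(text, encoding="utf-8")


# Demonstration in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "output"
    write_articles(root, "32016R0679", {1: "Text of Article 1", 12: "Text of Article 12"})
    write_articles(root, "32019L0790", {12: "Text of Article 12"})
    # Relative paths of every file written, e.g. "12/32016R0679.txt".
    written = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
```

Here folder 12 ends up with two files, one per document that has an Article 12.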
