Scrapes and parses the documents to segment them into articles.
In the input folder, place a CSV file with at least one column called "celex".
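As an illustration of the expected input, the sketch below reads the "celex" column from such a file. It is not the project's own loader: the function name `read_celex_ids` and the sample file contents are assumptions made for this example, and only the column name "celex" comes from this README.

```python
import csv
from pathlib import Path


def read_celex_ids(csv_path):
    """Return the non-empty values of the "celex" column, in file order.

    Hypothetical helper: the real script may read the file differently.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the required column is missing.
        if reader.fieldnames is None or "celex" not in reader.fieldnames:
            raise ValueError(f'{csv_path} has no "celex" column')
        ids = []
        for row in reader:
            value = (row.get("celex") or "").strip()
            if value:  # skip rows where the celex cell is empty
                ids.append(value)
        return ids
```

Any other columns in the file are ignored, so the CSV can carry extra metadata alongside "celex".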
Run using Python 3.
- Optional: in the code, change the document limit or the article limit (default: 90 articles).
A folder called "output" will be created, with subfolders named 1 to k (k = the highest article number processed in any document).
- Example: a folder called 12 with 3 files inside means that each of those 3 files is Article 12 of a different document.
Each subfolder contains multiple .txt files, each named after its document's celex and containing that article's text.
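The layout described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `write_articles`, its parameters, and the way the limit is applied (as the highest article number kept) are assumptions made for this example.

```python
from pathlib import Path


def write_articles(celex, articles, output_dir="output", article_limit=90):
    """Write one document's articles as <output_dir>/<article number>/<celex>.txt.

    `articles` maps an article number (int) to that article's text.
    Hypothetical helper: articles numbered above `article_limit` are skipped.
    """
    written = []
    for number, text in sorted(articles.items()):
        if number > article_limit:
            continue
        # One subfolder per article number, shared by all documents.
        folder = Path(output_dir) / str(number)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{celex}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Calling this once per document means that, for example, folder 12 ends up holding one file per document that has an Article 12.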