Contact person: Erik Tjong Kim Sang
Notebooks for scraping websites with medical guidelines and performing text analysis
1. Run `scrape_website.ipynb` to retrieve the HTML files. They will be stored in the directory `../data/richtlijnendatabase.nl`.
2. Run `get_paragraphs.ipynb` to extract the text paragraphs from the downloaded files. They will be stored in the file `csv/paragraphs_20210712.csv`.
3. Run steps 1 and 4 of `text_ranking.ipynb` to find the paragraphs that contain relevant medical terms related to eHealth. This information will be stored in the files `paragraphs.json` and `index.html`.
4. Run `json_diff.ipynb` to compare the JSON file from step 3 with a previous version and to classify the HTML pages according to treatment steps. The results will be stored in the file `index.html`.
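Step 2 above turns the downloaded HTML pages into paragraph text. The sketch below shows one way to do this with Python's standard-library `html.parser`, assuming that paragraphs correspond to `<p>` elements. It is not the code in `get_paragraphs.ipynb`: the class and function names and the CSV column names (`file`, `paragraph_id`, `text`) are hypothetical.

```python
import csv
from html.parser import HTMLParser


class ParagraphParser(HTMLParser):
    """Collect the whitespace-normalised text of every <p> element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)  # includes text inside <b>, <a>, etc.

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed

    def _flush(self):
        if self._in_p:
            text = " ".join("".join(self._buffer).split())
            if text:  # skip empty <p></p> elements
                self.paragraphs.append(text)
        self._in_p = False
        self._buffer = []


def extract_paragraphs(html: str) -> list[str]:
    """Return the non-empty <p> texts of an HTML string, in document order."""
    parser = ParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def write_paragraphs_csv(rows, path: str) -> None:
    """Write (file, paragraph_id, text) rows; the column layout is hypothetical."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "paragraph_id", "text"])
        writer.writerows(rows)
```

For example, `extract_paragraphs("<p>First <b>para</b>.</p>")` returns `["First para."]`. Text outside `<p>` elements (menus, headings, tables) is ignored, which may or may not match what the notebook keeps.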
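Step 4 above compares the JSON output of step 3 with a previous version. The sketch below assumes, hypothetically, that `paragraphs.json` maps each HTML page name to a list of paragraph strings; the structure actually written by `text_ranking.ipynb` may differ, and the classification by treatment step is not covered here. The function names are illustrative only.

```python
import json


def load_json(path: str):
    """Read a UTF-8 JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def diff_versions(old: dict, new: dict) -> dict:
    """Compare two {page: [paragraph, ...]} mappings (assumed structure).

    A page counts as "changed" when its set of paragraphs differs, so a
    pure reordering of the same paragraphs is not reported.
    """
    old_pages, new_pages = set(old), set(new)
    changed = {
        page
        for page in old_pages & new_pages
        if set(old[page]) != set(new[page])
    }
    return {
        "added": sorted(new_pages - old_pages),
        "removed": sorted(old_pages - new_pages),
        "changed": sorted(changed),
    }
```

Usage would look like `diff_versions(load_json("paragraphs_previous.json"), load_json("paragraphs.json"))`, where `paragraphs_previous.json` is a placeholder name for the earlier version.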