Contact person: Erik Tjong Kim Sang [email protected]
Notebooks for scraping websites with medical guidelines and performing text analysis
- Run scrape_website.ipynb to retrieve the HTML files. They will be stored in the directory ../data/richtlijnendatabase.nl
- Run get_paragraphs.ipynb to extract the text paragraphs from the downloaded files. They will be stored in the file csv/paragraphs_20210712.csv
- Run steps 1 and 4 of text_ranking.ipynb to find the paragraphs that contain relevant medical terms related to eHealth. This information will be stored in the files paragraphs.json and index.html
- Run json_diff.ipynb to compare the JSON file produced by text_ranking.ipynb with a previous version and to classify the HTML pages according to treatment steps. The results will be stored in the file index.html
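
The steps above each leave files behind, so a quick way to see which step still needs to be run is to check for those files. The following is a minimal sketch, not part of the notebooks: the helper `missing_outputs` and the `EXPECTED_OUTPUTS` table are hypothetical names, and the paths are assumed to be relative to the directory that holds the notebooks.

```python
from pathlib import Path

# Output locations as listed in the steps above.
# Assumption: all paths are relative to the notebook directory.
EXPECTED_OUTPUTS = {
    "scrape_website.ipynb": ["../data/richtlijnendatabase.nl"],
    "get_paragraphs.ipynb": ["csv/paragraphs_20210712.csv"],
    "text_ranking.ipynb": ["paragraphs.json", "index.html"],
    "json_diff.ipynb": ["index.html"],
}


def missing_outputs(notebook_dir="."):
    """Return {notebook: [missing paths]} for each step with absent output."""
    base = Path(notebook_dir)
    missing = {}
    for notebook, outputs in EXPECTED_OUTPUTS.items():
        absent = [out for out in outputs if not (base / out).exists()]
        if absent:
            missing[notebook] = absent
    return missing


if __name__ == "__main__":
    # Report every notebook whose expected output is not present yet.
    for notebook, paths in missing_outputs().items():
        print(f"{notebook}: missing {', '.join(paths)}")
```

Because text_ranking.ipynb and json_diff.ipynb both write index.html, this check cannot tell which of the two produced the file; it only shows whether the file exists.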