A pipeline that fetches papers via the Semantic Scholar API and filters their content using an LLM.
Create a new conda environment
conda create --name paper_env python=3.10
Activate the conda environment
conda activate paper_env
Install the necessary packages
pip install -r requirements.txt
Download MRCONSO.RRF and place it in the same folder as the scripts
Download link: https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
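As a rough illustration of what MRCONSO.RRF contains, the sketch below reads the pipe-delimited rows and maps English terms to their concept identifiers (CUIs). It is not taken from this repository's scripts: the function name load_english_terms is hypothetical, and the two sample rows use made-up identifiers. The column positions (CUI first, LAT second, STR fifteenth) follow the standard MRCONSO layout.

```python
import io
from collections import defaultdict

# Column positions in MRCONSO.RRF (pipe-delimited, each row ends with a pipe).
CUI, LAT, STR = 0, 1, 14


def load_english_terms(lines):
    """Map each lower-cased English term to the set of CUIs it names."""
    term_to_cuis = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Skip malformed rows and non-English entries.
        if len(fields) <= STR or fields[LAT] != "ENG":
            continue
        term_to_cuis[fields[STR].lower()].add(fields[CUI])
    return dict(term_to_cuis)


# Illustrative rows only; the identifiers are placeholders, not real UMLS data.
sample = io.StringIO(
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||MSH|MH|D000001|Asthma|0|N||\n"
    "C0000001|SPA|P|L0000002|PF|S0000002|Y|A0000002||||MSHSPA|MH|D000001|Asma|3|N||\n"
)
terms = load_english_terms(sample)
```

With the real file, the same function could be called as load_english_terms(open("MRCONSO.RRF", encoding="utf-8")). The full file is large, so expect loading to take some time and memory.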
Set your API keys as environment variables
export OPENAI_API_KEY="Your openai api key"
export UMLS_API_KEY="Your umls api key"
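Before running the scripts, it can help to confirm that both keys are visible to Python. The check below is a minimal sketch and not part of the repository; the helper name missing_keys is hypothetical, and it assumes the scripts read the keys from the two environment variables exported above.

```python
import os

# The two variables exported above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "UMLS_API_KEY")


def missing_keys(env=os.environ):
    """Return the names of required keys that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```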
- Use the Semantic Scholar API to fetch papers (keyword = "volatile organic compound human")
python fetch_paper.py
- Filter papers with diseases
python filter_disease.py
- Extract compound name and CID, then check the relation between compounds and diseases
python compound_disease_relation.py
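For orientation, the fetch step above could look roughly like the following sketch. It queries the Semantic Scholar Graph API paper search endpoint with the keyword from the first step. This is an assumption about the approach, not the contents of fetch_paper.py: the function names build_search_url and search_papers, the chosen fields, and the limit are all hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Semantic Scholar Graph API paper search endpoint.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query, fields=("title", "abstract", "year"), limit=20, offset=0):
    """Build the search URL for a keyword query (field choice is an assumption)."""
    params = {
        "query": query,
        "fields": ",".join(fields),
        "limit": limit,
        "offset": offset,
    }
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_papers(query, **kwargs):
    """Fetch one page of results; requires network access."""
    with urllib.request.urlopen(build_search_url(query, **kwargs), timeout=30) as resp:
        return json.load(resp).get("data", [])


url = build_search_url("volatile organic compound human", limit=5)
```

Calling search_papers("volatile organic compound human") would return a list of paper records with the requested fields. Unauthenticated requests are rate limited, so a real script would likely page through results with offset and pause between calls.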