Downloads PPI networks from multiple sources and converts them to different ID spaces. The final networks are saved as .gt files.
The IDs are mapped based on the mapping file provided by UniProt. Gene symbols are mapped using the mygene Python package.
The networks are converted to different ID spaces starting from UniProtKB-AC IDs. If a network source does not provide UniProtKB-AC IDs (STRING and HIPPIE), its IDs are first mapped to UniProtKB-AC IDs. If a source ID can be mapped to multiple target IDs, all target IDs are assumed to interact with all interaction partners of the source ID.
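The one-to-many rule above can be illustrated with a minimal sketch. This is not the repository's implementation: the function name `map_edges`, the dictionary-based mapping, and the example IDs are hypothetical, and dropping IDs that have no mapping is an assumption.

```python
from itertools import product


def map_edges(edges, mapping):
    """Translate edges into a target ID space, expanding one-to-many mappings.

    edges:   list of (source_id, target_id) tuples
    mapping: dict from a source-space ID to a list of target-space IDs
    """
    mapped = []
    for source, target in edges:
        # Every target-space ID of one endpoint is paired with every
        # target-space ID of the other endpoint.
        # Assumption: IDs without a mapping produce no edges.
        for s, t in product(mapping.get(source, []), mapping.get(target, [])):
            mapped.append((s, t))
    return mapped


# Hypothetical example: P1 maps to two genes, P2 maps to one gene.
edges = [("P1", "P2")]
mapping = {"P1": ["G1", "G2"], "P2": ["G3"]}
print(map_edges(edges, mapping))  # [('G1', 'G3'), ('G2', 'G3')]
```

Expanding mappings this way can create self-loops and duplicate edges, which is one reason a cleanup step after mapping is useful.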
Some sources (HIPPIE, STRING, NeDRex) are used to generate multiple subsets based on different confidence thresholds.
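As a rough sketch of how threshold-based subsets could be derived from one scored edge list: the function name `threshold_subsets`, the score values, the thresholds, and the use of `>=` as the cutoff rule are all assumptions made for illustration. The actual thresholds are defined by the program and its configuration.

```python
def threshold_subsets(scored_edges, thresholds):
    """Return one edge list per threshold.

    scored_edges: list of (source_id, target_id, confidence) tuples
    thresholds:   iterable of confidence cutoffs
    """
    return {
        # Assumption: an edge is kept when its score is at least the cutoff.
        cutoff: [(a, b) for a, b, score in scored_edges if score >= cutoff]
        for cutoff in thresholds
    }


# Hypothetical scores and cutoffs.
scored_edges = [("A", "B", 0.9), ("A", "C", 0.5), ("B", "C", 0.2)]
subsets = threshold_subsets(scored_edges, [0.4, 0.7])
print(subsets[0.4])  # [('A', 'B'), ('A', 'C')]
print(subsets[0.7])  # [('A', 'B')]
```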
All self-loops (same source and target node) and duplicate edges (e.g., A -> B and B -> A) are removed.
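The cleanup rule above treats edges as undirected, so A -> B and B -> A count as the same edge. A minimal sketch of that idea, with a hypothetical function name `clean_edges` and plain tuples instead of the graph objects the program works with:

```python
def clean_edges(edges):
    """Remove self-loops and duplicate undirected edges, keeping first-seen order."""
    seen = set()
    cleaned = []
    for a, b in edges:
        if a == b:
            # Self-loop: same source and target node.
            continue
        # Sort the endpoints so (A, B) and (B, A) share one key.
        key = (a, b) if a <= b else (b, a)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


edges = [("A", "B"), ("B", "A"), ("C", "C"), ("B", "C")]
print(clean_edges(edges))  # [('A', 'B'), ('B', 'C')]
```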
Network sources: STRING, HIPPIE, BioGRID, IID, NeDRex
ID spaces: Entrez genes, Ensembl genes, gene symbols, UniProtKB-AC
A table of the prepared networks and their sizes can be found here.
Clone repository:
git clone https://github.com/REPO4EU/network_preparation.git

Install dependencies with conda:
cd network_preparation
conda env create -f environment.yml
conda activate network_preparation

Execute the program:
python src/main.py

See options:
python src/main.py --help

Config options can be adjusted by modifying config.toml.