DS4SD/SubGrapher

SubGrapher-IBM

This is the repository for SubGrapher: Visual Fingerprinting of Chemical Structures. SubGrapher is a model that detects functional groups in molecule images and converts them into visual fingerprints.

Citation

If you find this repository useful, please consider citing:

@article{Morin2025,
	title        = {{SubGrapher: visual fingerprinting of chemical structures}},
	author       = {Morin, Lucas and Meijer, Gerhard Ingmar and Weber, Val{\'e}ry and Van Gool, Luc and Staar, Peter W. J.},
	year         = 2025,
	month        = {Sep},
	day          = 29,
	journal      = {Journal of Cheminformatics},
	volume       = 17,
	number       = 1,
	pages        = 149,
	doi          = {10.1186/s13321-025-01091-4},
	issn         = {1758-2946},
	url          = {https://doi.org/10.1186/s13321-025-01091-4}
}

Installation

Create a virtual environment.

python3.11 -m venv subgrapher-env
source subgrapher-env/bin/activate

Install SubGrapher.

pip install -e .

Inference

Script

  1. Place your input images in: SubGrapher/data/images/default/.

  2. Run SubGrapher:

python3 subgrapher/scripts/run.py

  3. Read predictions in: SubGrapher/data/predictions/default/. Predictions are structured as follows:

{
    "substructure": "Amine_(tertiary)",       # Predicted substructure                               
    "confidence": 1.0,                        # Model confidence
    "bbox": [525.3, 273.7, 675.8, 403.7],     # Predicted bounding box
    "type": "functional-groups"               # Substructure type (functional-groups or carbon-chains)
}
...

  4. (Optional) Read visualizations in: SubGrapher/data/visualization/default/. Visualizations are structured as follows:

SubGrapher/data/visualization/default/
├── US07320977-20080122-C00078_fingerprint.png          # Visualization of the visual fingerprint (SVMF)
├── US07320977-20080122-C00078_substructures.png        # Visualization of predicted substructures
├── US07320977-20080122-C00078_substructures.txt        # Labels of predicted substructures
...
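
The prediction records above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of SubGrapher itself: it assumes each prediction file is plain JSON (without the explanatory comments shown above), has a `.json` extension, and holds a list of records with the four keys listed. The helper names `load_predictions`, `filter_predictions`, and `count_substructures` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_predictions(path):
    """Load one prediction file (assumed: a JSON list of prediction dicts)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def filter_predictions(predictions, min_confidence=0.5, kind=None):
    """Keep predictions at or above a confidence threshold.

    `kind` optionally restricts the result to one substructure type,
    e.g. "functional-groups" or "carbon-chains".
    """
    kept = []
    for pred in predictions:
        if pred["confidence"] < min_confidence:
            continue
        if kind is not None and pred["type"] != kind:
            continue
        kept.append(pred)
    return kept


def count_substructures(predictions):
    """Count how often each substructure label occurs."""
    return Counter(pred["substructure"] for pred in predictions)


if __name__ == "__main__":
    # Assumed location and extension of the prediction files.
    prediction_dir = Path("SubGrapher/data/predictions/default")
    for path in sorted(prediction_dir.glob("*.json")):
        confident = filter_predictions(load_predictions(path), min_confidence=0.8)
        print(path.name, dict(count_substructures(confident)))
```

The 0.8 threshold is arbitrary; choose a value that suits your tolerance for false positives.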

Model

The model weights are available on Hugging Face. They are downloaded automatically when you run inference.

Evaluation Dataset

The benchmarks used for the visual fingerprinting evaluation are available on Hugging Face (download size: 50 MB).

wget https://huggingface.co/datasets/ds4sd/SubGrapher-Datasets/resolve/main/benchmarks.zip

The benchmarks are structured as follows:

benchmarks/
├── adenosine/                          # Benchmark (Adenosine, Camphor, Cholesterol, Limonene, or Pyridine)
│   ├── 90.csv                          # SMILES
│   └── adenosine/                      
│        ├── images_2/                  # Images 
│        └── molfiles/                  # MolFiles 
...
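
After unzipping, a quick sanity check is to count the files in each benchmark. The sketch below assumes the layout shown above: images live in a directory whose name starts with `images`, and MolFiles live in a directory named `molfiles`. The function name `summarize_benchmarks` is hypothetical and not part of this repository.

```python
from pathlib import Path


def summarize_benchmarks(root):
    """Count image and MolFile entries for each benchmark under `root`."""
    summary = {}
    for bench_dir in sorted(Path(root).iterdir()):
        if not bench_dir.is_dir():
            continue
        counts = {"images": 0, "molfiles": 0}
        for path in bench_dir.rglob("*"):
            if not path.is_file():
                continue
            # Directory names between the benchmark root and the file.
            parents = path.relative_to(bench_dir).parts[:-1]
            if any(name.startswith("images") for name in parents):
                counts["images"] += 1
            elif "molfiles" in parents:
                counts["molfiles"] += 1
        summary[bench_dir.name] = counts
    return summary


if __name__ == "__main__":
    # Assumes benchmarks.zip was extracted into the current directory.
    for name, counts in summarize_benchmarks("benchmarks").items():
        print(f"{name}: {counts['images']} images, {counts['molfiles']} molfiles")
```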

About

[J. Cheminform.] SubGrapher: Visual Fingerprinting of Chemical Structures