
databricks-industry-solutions/genesis-workbench


Genesis Workbench

A Blueprint for a Life Sciences application on Databricks

$${\color{orange}Opportunity}$$

Generative AI reshaping life sciences

Generative AI is reshaping the life sciences. Foundation models trained on genomics, protein structures, molecular interactions, and cellular behaviors are unlocking capabilities that weren't possible a few years ago:

  • Foundation models across every domain — From DNA to proteins to single cells to small molecules, there's now a pretrained model for almost every biological modality. Teams can start from these and fine-tune on their own data instead of training from scratch.
  • Drug discovery acceleration — Generative AI compresses months of wet-lab iteration into hours of in-silico screening, candidate ranking, and hypothesis generation. Lead time on early-stage discovery drops by an order of magnitude.
  • Higher-accuracy structure prediction — Tools like AlphaFold, ESMFold, and Boltz predict 3D structure from sequence with previously unattainable accuracy, opening up targets that were structurally inaccessible.
  • Personalized medicine — Patient-specific modeling of therapeutic responses is moving from research curiosity to clinical reality, enabling precision treatment decisions at the individual level.
  • Knowledge synthesis at scale — LLMs specialized in biomedical literature surface insights buried in millions of papers, accelerating hypothesis generation and reducing duplicated work.

$${\color{orange}Challenges}$$

Scientists overwhelmed by AI/infrastructure challenges

Despite the breakthroughs, the experts who can apply these models — biologists, geneticists, biochemists — spend most of their time on tasks far outside their training:

  • GPU and CUDA infrastructure — Days spent configuring drivers, toolkits, and CUDA-compatible PyTorch builds before a single inference can run.
  • Workflow orchestration — Stitching together data prep, training, evaluation, and serving pipelines requires software-engineering skills the biology curriculum doesn't cover.
  • Data engineering and governance — Collecting, cleaning, and integrating heterogeneous biological datasets under privacy and reproducibility constraints is a full-time job in itself.
  • Model serving and lifecycle — Even after training, productionizing a model — registry, endpoints, monitoring, retraining — is yet another discipline the scientist has to pick up.
  • Time stolen from the science — Every hour spent on infrastructure is an hour not spent on biology. The slower the loop, the slower the innovation.

$${\color{orange}Genesis}$$ $${\color{orange}Workbench}$$

Genesis Workbench: Blueprint for life sciences application on Databricks

Genesis Workbench is an open-source, Databricks-native blueprint that packages biological foundation models behind an intuitive UI — so scientists can run them without managing GPU clusters, CUDA, model registries, or serving endpoints.

  • Pre-packaged biological models ready to deploy: ESMFold, AlphaFold2, ProteinMPNN, RFdiffusion, scGPT, SCimilarity, Scanpy, rapids-singlecell (part of scverse), Chemprop, DiffDock, Boltz, NVIDIA Parabricks, NVIDIA BioNeMo, and more.
  • Tailored workflows for protein design, drug discovery, single-cell analysis, and variant analysis — each surfaced as a UI tab with sane defaults.
  • Built on Databricks primitives: Asset Bundles, Workflows, Model Serving, MLflow, Unity Catalog, and Databricks Apps — so everything you run is governed, reproducible, and traceable.
  • Modular and extensible: each capability is an independent module that can be deployed and destroyed independently.

$${\color{orange}Modules}$$

Each module is independently deployable with ./deploy.sh <module> <cloud>. Click through to the workflow deep-dives for inputs, outputs, models, and example runs.

Single Cell

Single-cell RNA-seq at scale. Run end-to-end pipelines on millions of cells with Scanpy or GPU-accelerated rapids-singlecell (part of scverse). Annotate cell types per cluster against SCimilarity's 23M-cell reference database, search published studies for similar cells, and predict the effect of gene knockouts or overexpression with scGPT. The interactive results viewer offers UMAP exploration, differential expression, pathway enrichment, and trajectory analysis on the same run.

Models bundled: scGPT, SCimilarity, Scanpy, rapids-singlecell (part of scverse), Merck TEDDY-G 400M (joint cell-type + disease annotation)

📖 Single Cell Analysis · Cell Type Annotation · Cell Similarity Search · Gene Perturbation Prediction

Large Molecule

Protein structure prediction, design, and engineering. Fold proteins in seconds with ESMFold or at high accuracy with AlphaFold2. Design novel backbones with RFdiffusion + ProteinMPNN. Redesign sequences for a fixed backbone (inverse folding) with ProteinMPNN, then validate each design by re-folding with ESMFold. Run BLAST-like searches across 150M+ sequences using ESM-2 embeddings. The Guided Enzyme Optimization workflow chains Proteina-Complexa, ProteinMPNN, ESMFold, Boltz, NetSolP, PLTNUM, DeepSTABp, and MHCflurry into a reward-weighted scaffold-and-score loop. It surfaces variants ranked on motif fidelity, fold confidence, an optional substrate complex, and four developability axes (solubility, anchor-relative half-life, melting temperature, immunogenic burden).

Models bundled: ESMFold, ESM-2 Embeddings, AlphaFold2, ProteinMPNN, RFdiffusion, Boltz

📖 Protein Structure Prediction · Protein Design · Inverse Folding · Sequence Similarity Search · Guided Enzyme Optimization
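The reward-weighted ranking mentioned for Guided Enzyme Optimization can be pictured as a weighted sum over per-axis scores. The following is a minimal sketch under stated assumptions, not the workflow's actual scoring code: the weights, the axis keys, the `reward` and `rank_variants` helpers, and the example variants are all hypothetical, every score is assumed to be normalized to [0, 1], and the optional substrate-complex term is left out.

```python
# Hypothetical weights; the README does not state the workflow's real weights.
WEIGHTS = {
    "motif_fidelity": 0.3,
    "fold_confidence": 0.3,
    "solubility": 0.1,
    "half_life": 0.1,
    "melting_temperature": 0.1,
    "immunogenic_burden": 0.1,
}
# Axes where a lower raw value is better are flipped before weighting.
LOWER_IS_BETTER = {"immunogenic_burden"}


def reward(scores, weights=WEIGHTS):
    """Weighted average of per-axis scores, each assumed to lie in [0, 1]."""
    total = 0.0
    for axis, weight in weights.items():
        value = scores[axis]
        if axis in LOWER_IS_BETTER:
            value = 1.0 - value
        total += weight * value
    return total / sum(weights.values())


def rank_variants(variants):
    """Return (name, reward) pairs sorted from best to worst."""
    scored = [(name, reward(scores)) for name, scores in variants.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up variants for illustration only.
variants = {
    "variant_a": {"motif_fidelity": 0.8, "fold_confidence": 0.8, "solubility": 0.8,
                  "half_life": 0.8, "melting_temperature": 0.8, "immunogenic_burden": 0.2},
    "variant_b": {"motif_fidelity": 0.9, "fold_confidence": 0.5, "solubility": 0.5,
                  "half_life": 0.5, "melting_temperature": 0.5, "immunogenic_burden": 0.5},
}
for name, value in rank_variants(variants):
    print(f"{name}: {value:.2f}")
```

Dividing by the sum of the weights keeps the reward in [0, 1] even if the weights are later changed and no longer sum to one.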

Small Molecule

Drug-discovery essentials. Generate novel candidate molecules from a seed scaffold or binding motif with GenMol in a hard-constraint generate→score→reseed loop (Guided Molecule Design). Profile candidates for drug-like properties and toxicity with Chemprop, predict protein-ligand binding poses with DiffDock, design protein binders to a target protein or small molecule with Proteina-Complexa, and transplant functional motifs into new scaffolds. Each generated candidate can be scored on developability through NetSolP (solubility), PLTNUM-ESM2 (relative half-life), DeepSTABp (melting temperature), and MHCflurry (immunogenic burden).

Models bundled: GenMol, ChemProp, DiffDock, Proteina-Complexa, NetSolP-1.0, PLTNUM-ESM2, DeepSTABp, MHCflurry 2.x

📖 Guided Molecule Design · Molecular Docking · Protein Binder Design · Ligand Binder Design · Motif Scaffolding · ADMET & Safety
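The hard-constraint generate→score→reseed loop named in the Small Molecule description can be sketched generically. This is a toy illustration, not the Guided Molecule Design implementation: `guided_loop` and its parameters are hypothetical, and integers stand in for molecules so the example runs without GenMol or any chemistry library.

```python
import random


def guided_loop(seeds, generate, passes_constraints, score,
                rounds=5, per_seed=8, keep=3):
    """Generate from seeds, drop constraint violators, score, reseed with the top `keep`."""
    pool = [s for s in seeds if passes_constraints(s)]
    for _round in range(rounds):
        # Carry the current seeds forward so the best candidate never regresses.
        candidates = set(pool)
        for seed in pool:
            for _ in range(per_seed):
                candidate = generate(seed)
                # Hard constraint: violators are discarded, not merely penalized.
                if passes_constraints(candidate):
                    candidates.add(candidate)
        pool = sorted(candidates, key=score, reverse=True)[:keep]
    return pool


# Toy stand-ins: the "generator" perturbs an integer, the constraint keeps
# even numbers only, and the score rewards closeness to 42.
rng = random.Random(0)
best = guided_loop(
    seeds=[10],
    generate=lambda x: x + rng.randint(-5, 5),
    passes_constraints=lambda x: x % 2 == 0,
    score=lambda x: -abs(x - 42),
)
print(best)
```

In a real setting the three callables would wrap a generative model, a validity or property filter, and a property predictor. The loop structure stays the same.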

Genomics

Variant analysis at population scale. Call germline variants from FASTQ files with GPU-accelerated NVIDIA Parabricks, ingest VCFs into Delta tables for fast SQL/Spark queries, run genome-wide association studies (GWAS) using Glow, and annotate variants with ClinVar clinical-significance data. Inline interactive charts in the UI break down findings by gene, ACMG category, clinical significance, and zygosity.

Components: NVIDIA Parabricks variant calling, Glow GWAS, Spark VCF→Delta ingestion, ClinVar annotation

📖 Variant Calling · VCF Ingestion · GWAS Analysis · Variant Annotation
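The per-gene, clinical-significance, and zygosity breakdowns described for the Genomics module amount to grouped counts over annotated variants. A minimal sketch, assuming each variant is a plain dict with hypothetical keys `gene`, `clinical_significance`, and `genotype` (a VCF-style GT string); the records and the two-gene panel below are illustrative, not data or schema from the repo.

```python
from collections import Counter


def zygosity(genotype):
    """Classify a diploid VCF GT string such as '0/1' or '1|1'."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "unknown"
    if alleles[0] == alleles[1]:
        return "homozygous_ref" if alleles[0] == "0" else "homozygous_alt"
    return "heterozygous"


def summarize(records, panel_genes):
    """Count variants by gene, significance, and zygosity; flag pathogenic panel hits."""
    return {
        "by_gene": Counter(r["gene"] for r in records),
        "by_significance": Counter(r["clinical_significance"] for r in records),
        "by_zygosity": Counter(zygosity(r["genotype"]) for r in records),
        "flagged": [
            r for r in records
            if r["clinical_significance"] == "Pathogenic" and r["gene"] in panel_genes
        ],
    }


# Illustrative records; GENE_X is a placeholder for a gene outside the panel.
records = [
    {"gene": "BRCA1", "clinical_significance": "Pathogenic", "genotype": "0/1"},
    {"gene": "BRCA2", "clinical_significance": "Benign", "genotype": "1/1"},
    {"gene": "GENE_X", "clinical_significance": "Pathogenic", "genotype": "0|1"},
    {"gene": "BRCA1", "clinical_significance": "Uncertain significance", "genotype": "./."},
]
summary = summarize(records, panel_genes={"BRCA1", "BRCA2"})
print(summary["by_zygosity"])
print(len(summary["flagged"]))
```

At population scale the same grouping would more naturally run as SQL or Spark aggregations over the Delta tables. The pure-Python version only shows the logic.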

NVIDIA BioNeMo

NVIDIA's generative AI framework for digital biology. The optional BioNeMo module ships container definitions and workflows that expose pre-trained BioNeMo models, starting with ESM-2 fine-tuning and inference. Containers are optimized for NVIDIA hardware and integrated into Genesis Workbench's job system, MLflow registry, and model-serving infrastructure.

📖 ESM2 Fine-tuning & Inference

📚 Full workflow catalog: documentation index

$${\color{orange}Architecture}$$ $${\color{orange}Diagram}$$

Architecture

$${\color{orange}Important}$$ $${\color{orange}Disclaimer}$$

NVIDIA, the NVIDIA logo, and NVIDIA BioNeMo are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. All other product names, trademarks, and registered trademarks are the property of their respective owners.

References to third-party products or services, including NVIDIA BioNeMo, are for informational purposes only and do not constitute an endorsement or affiliation. This material is not sponsored or endorsed by NVIDIA Corporation. The information provided here is for general informational purposes and should not be interpreted as specific advice or a warranty of suitability for any particular use.

Use of NVIDIA BioNeMo and related technologies should comply with all relevant licensing terms, trademarks, and applicable regulations.

$${\color{orange}Installation}$$

Full step-by-step setup is in the Installation Guide. Quick path (assumes the Databricks CLI is authenticated to the target workspace under the DEFAULT profile):

# 1. Deploy core first (UI, infrastructure, settings tables)
./deploy.sh core <aws|azure|gcp>

# 2. Deploy each module — one at a time, wait for jobs to finish before the next
./deploy.sh large_molecule  <aws|azure|gcp>
./deploy.sh single_cell     <aws|azure|gcp>
./deploy.sh small_molecule  <aws|azure|gcp>
./deploy.sh genomics        <aws|azure|gcp>
./deploy.sh bionemo         <aws|azure|gcp>   # optional, requires BioNeMo container build
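The ordering rule above (core first, then one module at a time) can be captured in a small wrapper. This is a sketch under stated assumptions: `build_deploy_plan` and `run_plan` are hypothetical helpers, not part of the repo, and only the `./deploy.sh <module> <cloud>` command shape comes from this README. The `fresh_install` flag exists so that `core` is skipped on a populated install. Note that a script returning does not by itself prove the Databricks jobs it launched have finished, so check job status before moving to the next module.

```python
import shlex
import subprocess

CLOUDS = ("aws", "azure", "gcp")
# Module names as they appear in the deploy commands; "core" is handled separately.
MODULES = ("large_molecule", "single_cell", "small_molecule", "genomics", "bionemo")


def build_deploy_plan(cloud, modules, fresh_install=True):
    """Return deploy commands in order: core first (fresh installs only), then modules."""
    if cloud not in CLOUDS:
        raise ValueError(f"cloud must be one of {CLOUDS}, got {cloud!r}")
    unknown = [m for m in modules if m not in MODULES]
    if unknown:
        raise ValueError(f"unknown module(s): {unknown}")
    # Keep the caller's order but drop duplicates.
    ordered = list(dict.fromkeys(modules))
    names = ["core", *ordered] if fresh_install else ordered
    return [["./deploy.sh", name, cloud] for name in names]


def run_plan(plan, dry_run=True):
    """Print each command; when dry_run is False, run them one at a time."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing script.
            subprocess.run(cmd, check=True)


plan = build_deploy_plan("aws", ["single_cell", "genomics"])
run_plan(plan)  # dry run: only prints the commands
```

Keeping the plan as data makes it easy to review the exact commands before anything is executed.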

For UI-only redeploys after the initial install (preserves all settings tables — never use deploy.sh core on a populated install):

cd modules/core
./update.sh <cloud> --ui-only   # rebuilds frontend, redeploys app, skips secret refresh / grants / UC volume copy

$${\color{orange}What's}$$ $${\color{orange}Included}$$

Genesis Workbench ships open models and open datasets across all modules. Models are registered as Databricks Model Serving endpoints (or run as batch jobs); datasets are downloaded/ingested once into Unity Catalog so the app has no runtime external-API dependency.

Models

| Model | Module / submodule | Source | Task |
|---|---|---|---|
| AlphaFold2 | large_molecule / alphafold | DeepMind AlphaFold2 v2.3.2 | Protein 3D structure prediction (MSA + templates) |
| ESMFold | large_molecule / esmfold | facebook/esmfold_v1 | Fast single-sequence structure prediction (no MSA) |
| Boltz-1 | large_molecule / boltz | boltz-community/boltz-1 | Lightweight structure / co-folding prediction |
| ESM-2 Embeddings | large_molecule / esm2_embeddings | facebook/esm2_t33_650M_UR50D | 650M mean-pooled protein embeddings (1280-dim) for similarity search |
| RFdiffusion | large_molecule / rfdiffusion | RosettaCommons/RFdiffusion | De novo protein backbone design |
| ProteinMPNN | large_molecule / protein_mpnn | dauparas/ProteinMPNN | Sequence design for a given backbone |
| DiffDock | small_molecule / diffdock | gcorso/DiffDock v1.1.3 | Protein–ligand blind docking (diffusion) |
| GenMol | small_molecule / genmol | nvidia/NV-GenMol-89M-v2 | Generative small-molecule design (SAFE masked diffusion) |
| Chemprop (BBBP / ClinTox / ADMET) | small_molecule / chemprop | chemprop==2.2.3 | Blood-brain barrier, clinical toxicity, 10-property ADMET |
| NetSolP-1.0 | small_molecule / netsolp | tvinet/NetSolP-1.0 | Protein solubility prediction |
| PLTNUM-ESM2 | small_molecule / pltnum | sagawa/PLTNUM-ESM2-NIH3T3 | Protein half-life / stability |
| DeepSTABp | small_molecule / deepstabp | CSBiology/deepStabP (ProtT5-XL) | Protein melting temperature (Tm) |
| MHCflurry 2.x | small_molecule / mhcflurry | openvax/mhcflurry | MHC-I peptide presentation / immunogenicity |
| Proteina-Complexa (Binder / Ligand / AME) | small_molecule / proteina_complexa | NVIDIA-Digital-Bio/Proteina-Complexa | Flow-matching binder design + motif scaffolding |
| scGPT (+ Perturbation) | single_cell / scgpt | bowang-lab/scGPT | Single-cell foundation model; gene-perturbation prediction |
| TEDDY | single_cell / teddy | Merck/TEDDY | Single-cell embedding foundation model |
| SCimilarity | single_cell / scimilarity | scimilarity==0.4.0 | Single-cell embeddings + nearest-cell search |
| Rapids-SingleCell / Scanpy | single_cell / rapidssinglecell, scanpy | RAPIDS, Scanpy | GPU/CPU single-cell QC, clustering, UMAP, DE, enrichment |
| BioNeMo ESM-2 (fine-tune / inference) | bionemo | NVIDIA BioNeMo | ESM-2 fine-tuning + inference on custom assays |
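The ESM-2 Embeddings entry mentions mean-pooled embeddings: per-residue vectors are averaged into one fixed-length vector per protein, which is what makes sequences of different lengths comparable in a similarity search. A minimal pure-Python sketch of that pooling step follows. The `mean_pool` helper is hypothetical, the 3-dimensional vectors stand in for the 1280-dimensional ones, and whether the repo masks padding or special tokens this way is an assumption.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of unmasked tokens into one fixed-length vector."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three tokens with 3-dim embeddings; the last token is padding and is masked out.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0, 4.0]
```

Masking matters because averaging over padding positions would pull every short sequence toward the same padding vector.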

Datasets

All datasets below are open/public (UniProt CC-BY 4.0, PDB/ClinVar/1000G public domain, CC-BY 4.0 cell atlases, etc.).

| Dataset | Use | Vector Index |
|---|---|---|
| SwissProt reviewed human proteins (UniProt) → gene_sequences | core: gene symbol → canonical sequence (Target Resolver) and human protein similarity search | Yes: gene_sequence_embedding_index (1280-dim, ESM-2) |
| UniRef90 FASTA (UniProt) → sequence_db | large_molecule / sequence_search: protein similarity-search corpus | Yes: sequence_embedding_index (1280-dim, ESM-2) |
| CellxGene Census scRNA-seq reference | single_cell / scimilarity: cell-type semantic search corpus (~23M cells) | Yes: scimilarity_cell_index (128-dim) |
| ChEMBL target binders | small_molecule / genmol: per-target active compounds (SMILES + pChEMBL) for target-aware generation → target_binders | No |
| Broad Drug Repurposing Hub | small_molecule / genmol: approved/clinical drugs (MoA, target, phase) → repurposing_hub | No |
| ClinVar GRCh38 VCF (NCBI) | genomics: clinical variant annotation → clinvar_variants | No |
| ACMG SF v3.2 gene panel | genomics: 81 medically-actionable genes for pathogenic-variant flagging | No |
| GRCh38 reference genome (1000 Genomes/EBI) | genomics: alignment + variant normalization | No |
| 1000 Genomes sample VCF (chr6) | genomics: GWAS demo input | No |
| MSK SPECTRUM HGSOC scRNA-seq (CellxGene) | single_cell: ovarian-cancer demo dataset | No |
| Adams et al. 2020 lung scRNA-seq (Zenodo) | single_cell / scimilarity: IPF/healthy lung query sample | No |
| Ensembl gene reference (BioMart) | single_cell: Ensembl ID ↔ gene-symbol mapping | No |
| Enrichr gene-set libraries (GMT) | single_cell / scanpy: pathway enrichment | No |
| AlphaFold genetic DBs: UniRef90, UniRef30, MGnify, small BFD, PDB70, PDB mmCIF, pdb_seqres, UniProt | large_molecule / alphafold: MSA + template search for folding | No |

The three vector indexes are served from Databricks Vector Search: gene_sequence_embedding_index and sequence_embedding_index (protein similarity) plus scimilarity_cell_index (cell similarity). Protein search queries the human and UniRef indexes together so a single query returns both human and broad-organism hits.
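Querying two indexes and returning one result list implies a merge step. The sketch below shows one plausible way to do it and is not the app's actual code: the `Hit` type and `merge_hits` function are hypothetical, the sequence IDs and scores are made up, and scores are assumed to be comparable across the two indexes because both hold 1280-dim ESM-2 embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    sequence_id: str
    score: float  # similarity score; higher is assumed to mean more similar
    index: str    # name of the index that produced the hit


def merge_hits(human_hits, uniref_hits, top_k=5):
    """Combine hits from both indexes, keep the best score per sequence, sort descending."""
    best = {}
    for hit in [*human_hits, *uniref_hits]:
        current = best.get(hit.sequence_id)
        if current is None or hit.score > current.score:
            best[hit.sequence_id] = hit
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:top_k]


# Made-up hits; "P1" appears in both indexes and is kept once with its best score.
human = [
    Hit("P1", 0.91, "gene_sequence_embedding_index"),
    Hit("P2", 0.80, "gene_sequence_embedding_index"),
]
uniref = [
    Hit("U1", 0.95, "sequence_embedding_index"),
    Hit("P1", 0.89, "sequence_embedding_index"),
]
for hit in merge_hits(human, uniref, top_k=3):
    print(hit.sequence_id, hit.score, hit.index)
```

Keeping the originating index on each hit lets the caller label a result as a human or broad-organism match after merging.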

$${\color{orange}Changelog}$$

See CHANGELOG.md for deployment fixes, known issues, and configuration notes.

$${\color{orange}Troubleshooting}$$

The repo ships Claude Code skills covering installation, deployment, troubleshooting, workflows, and development. These are designed to be loaded into Claude Code; they also serve as the canonical reference for common deployment failures and fixes.

$${\color{orange}License}$$

Please see LICENSE for the details of the license.

Some packages, tools, and code used inside individual tutorials are under their own licenses, as described therein. Please read the licensing details of each tool before use. Other third-party packages used in tutorials within this accelerator have their own licensing, as laid out in the table below.

We are adding a script to build your own Databricks-compatible container for NVIDIA BioNeMo. If you want to use NVIDIA BioNeMo in Genesis Workbench, please follow the instructions to build the container and push the image to your container repository.

NVIDIA GPUs and cudatoolkit may be used in multiple places, so review the NVIDIA EULA when using code in this package.

| Module | Package | License | Source |
|---|---|---|---|
| core | fastapi==0.115.5 | MIT | https://github.com/tiangolo/fastapi |
| core | uvicorn[standard]==0.32.1 | BSD-3 | https://github.com/encode/uvicorn |
| core | react==19.x | MIT | https://github.com/facebook/react |
| core | vite | MIT | https://github.com/vitejs/vite |
| core | tailwindcss | MIT | https://github.com/tailwindlabs/tailwindcss |
| core | tanstack/react-query | MIT | https://github.com/tanstack/query |
| core | tanstack/react-table | MIT | https://github.com/tanstack/table |
| core | zustand | MIT | https://github.com/pmndrs/zustand |
| core | molstar | MIT | https://github.com/molstar/molstar |
| core | plotly.js | MIT | https://github.com/plotly/plotly.js |
| core | databricks-sdk==0.50.0 | Apache2.0 | https://pypi.org/project/databricks-sdk/ |
| core | databricks-sql-connector | Apache2.0 | https://github.com/databricks/databricks-sql-python |
| core | py3Dmol==2.4.0 | MIT | TOBEREMOVED (https://pypi.org/project/py3Dmol/) |
| core | biopython | BioPython License Agreement | https://github.com/biopython/biopython |
| core | MLflow | Apache2.0 | https://github.com/mlflow/mlflow |
| core | plotly | MIT | https://github.com/plotly/plotly.py |
| core | openai | MIT | https://github.com/openai/openai-python |
| core | parasail | BSD | https://github.com/jeffdaily/parasail-python |
| core | requests | Apache2.0 | https://github.com/psf/requests |
| core | pandas | BSD-3 | https://github.com/pandas-dev/pandas |
| core | numpy | BSD-3 | https://github.com/numpy/numpy |
| core | rdkit | BSD-3 | https://github.com/rdkit/rdkit |
| scGPT | numpy==1.26.4 | BSD-3-Clause | https://github.com/numpy/numpy |
| scGPT | gdown==5.2.0 | MIT | https://github.com/wkentaro/gdown |
| scGPT | wget==3.2 | MIT / Public Domain | https://pypi.org/project/wget/ |
| scGPT | ipython==8.15.0 | BSD | https://github.com/ipython/ipython |
| scGPT | cloudpickle==2.2.1 | BSD-3-Clause | https://github.com/cloudpipe/cloudpickle |
| scGPT | torch==2.0.1+cu118 | BSD | https://github.com/pytorch/pytorch |
| scGPT | torchvision==0.15.2+cu118 | BSD | https://github.com/pytorch/vision |
| scGPT | flash-attn==2.5.8 | Apache-2.0 | https://github.com/Dao-AILab/flash-attention |
| scGPT | scgpt==0.2.4 | MIT | https://github.com/bowang-lab/scGPT |
| scGPT | wandb==0.19.11 | MIT | https://github.com/wandb/wandb |
| SCimilarity | SCimilarity v0.4.0 | Apache2.0 | https://github.com/Genentech/scimilarity |
| SCimilarity | MLflow | Apache2.0 | https://github.com/mlflow/mlflow |
| SCimilarity | 'torch' / PyTorch | BSD-3-Clause | https://pypi.org/project/torch/2.7.1/ |
| SCimilarity | scanpy | BSD 3-Clause | https://github.com/scverse/scanpy |
| SCimilarity | numcodecs | MIT | https://github.com/zarr-developers/numcodecs |
| SCimilarity | tbb | Apache2.0 | https://github.com/uxlfoundation/oneTBB |
| SCimilarity | typing_extensions | PSF | https://github.com/python/typing_extensions |
| SCimilarity | numpy | BSD | https://github.com/numpy/numpy |
| SCimilarity | pandas | BSD 3-Clause | https://github.com/pandas-dev/pandas |
| SCimilarity | uv | Apache2.0 | https://github.com/astral-sh/uv |
| SCimilarity | cloudpickle==2.0.0 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| RFdiffusion | RFdiffusion | BSD-3 | https://github.com/RosettaCommons/RFdiffusion |
| RFdiffusion | Hydra | MIT | https://github.com/facebookresearch/hydra |
| RFdiffusion | OmegaConf | BSD-3 | https://github.com/omry/omegaconf |
| RFdiffusion | Biopython | BioPython License Agreement | https://github.com/biopython/biopython |
| RFdiffusion | DGL | Apache2.0 | https://github.com/dmlc/dgl |
| RFdiffusion | pyrsistent | MIT | https://github.com/tobgu/pyrsistent |
| RFdiffusion | e3nn | MIT | https://github.com/e3nn/e3nn |
| RFdiffusion | Wandb | MIT | https://github.com/wandb/wandb |
| RFdiffusion | Pynvml | BSD-3 | https://github.com/gpuopenanalytics/pynvml |
| RFdiffusion | Decorator | BSD-2 | https://github.com/micheles/decorator |
| RFdiffusion | Torch | BSD-3 | https://github.com/pytorch/pytorch |
| RFdiffusion | Torchvision | BSD-3 | https://github.com/pytorch/vision |
| RFdiffusion | torchaudio==0.11.0 | BSD-2 | https://github.com/pytorch/audio |
| RFdiffusion | cloudpickle==2.2.1 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| RFdiffusion | dllogger | Apache2.0 | https://github.com/NVIDIA/dllogger |
| RFdiffusion | SE3Transformer | MIT | https://github.com/RosettaCommons/RFdiffusion/tree/main/env/SE3Transformer |
| RFdiffusion | mlflow==2.15.1 | Apache2.0 | https://github.com/mlflow/mlflow |
| RFdiffusion | MODEL WEIGHTS | BSD | https://github.com/RosettaCommons/RFdiffusion |
| ProteinMPNN | ProteinMPNN | MIT | https://github.com/dauparas/ProteinMPNN |
| ProteinMPNN | Numpy | BSD-3 | https://github.com/numpy/numpy |
| ProteinMPNN | torch==1.11.0+cu113 | BSD-3 | https://github.com/pytorch/pytorch |
| ProteinMPNN | torchvision==0.12.0+cu113 | BSD-3 | https://github.com/pytorch/vision |
| ProteinMPNN | torchaudio==0.11.0 | BSD-2 | https://github.com/pytorch/audio |
| ProteinMPNN | mlflow==2.15.1 | Apache2.0 | https://github.com/mlflow/mlflow |
| ProteinMPNN | cloudpickle==2.2.1 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| ProteinMPNN | biopython==1.79 | BioPython License Agreement | https://github.com/biopython/biopython |
| ProteinMPNN | MODEL WEIGHTS | MIT | https://github.com/dauparas/ProteinMPNN |
| Alphafold | AlphaFold (2.3.2) | Apache2.0 | https://github.com/google-deepmind/alphafold |
| Alphafold | other dependencies | | we provide a file of requirements per alphafold's own repo, see yml file for further details |
| Alphafold | MODEL WEIGHTS | CC BY 4.0 | |
| ESMfold | ESMFold | MIT | https://github.com/facebookresearch/esm |
| ESMfold | torch | BSD-3 | https://github.com/pytorch/pytorch |
| ESMfold | transformers | Apache2.0 | https://github.com/huggingface/transformers |
| ESMfold | accelerate | Apache2.0 | https://github.com/huggingface/accelerate |
| ESMfold | hf_transfer==0.1.9 | Apache2.0 | https://github.com/huggingface/hf_transfer |
| ESMfold | MODEL WEIGHTS | MIT | |
| Boltz-1 | Boltz-1 | MIT | https://github.com/jwohlwend/boltz |
| Boltz-1 | packaging | Apache2.0 | https://github.com/pypa/packaging |
| Boltz-1 | ninja | Apache2.0 | https://github.com/scikit-build/ninja-python-distributions |
| Boltz-1 | torch==2.3.1+cu121 | BSD-3 | https://github.com/pytorch/pytorch |
| Boltz-1 | torchvision==0.18.1+cu121 | BSD-3 | https://github.com/pytorch/vision |
| Boltz-1 | mlflow==2.15.1 | Apache2.0 | https://github.com/mlflow/mlflow |
| Boltz-1 | cloudpickle==2.2.1 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| Boltz-1 | requests>=2.25.1 | Apache2.0 | https://github.com/psf/requests |
| Boltz-1 | boltz==0.4.0 | MIT | https://github.com/jwohlwend/boltz |
| Boltz-1 | rdkit | BSD-3 | https://github.com/rdkit/rdkit |
| Boltz-1 | absl-py==1.0.0 | Apache2.0 | https://github.com/abseil/abseil-py |
| Boltz-1 | transformers>=4.41 | Apache2.0 | https://github.com/huggingface/transformers |
| Boltz-1 | sentence-transformers>=2.7 | Apache2.0 | https://github.com/UKPLab/sentence-transformers/ |
| Boltz-1 | pyspark | Apache2.0 | https://github.com/apache/spark |
| Boltz-1 | pandas | BSD-3 | https://github.com/pandas-dev/pandas |
| Boltz-1 | flash_attn==1.0.9 (optional GPU) | Apache2.0 | https://github.com/Dao-AILab/flash-attention |
| Boltz-1 | MODEL WEIGHTS | MIT | https://github.com/jwohlwend/boltz |
| Scanpy | scanpy==1.11.4 | BSD-3 | https://github.com/scverse/scanpy |
| Scanpy | anndata | BSD-3 | https://github.com/scverse/anndata |
| Scanpy | scikit-network | BSD-3 | https://github.com/sknetwork-team/scikit-network |
| Scanpy | pybiomart | BSD-3 | https://github.com/jrderuiter/pybiomart |
| Scanpy | Numpy | BSD-3 | https://github.com/numpy/numpy |
| Scanpy | pandas | BSD-3 | https://github.com/pandas-dev/pandas |
| Scanpy | scipy | BSD-3 | https://github.com/scipy/scipy |
| rapids-singlecell (part of scverse) | rapids-singlecell | MIT | https://github.com/scverse/rapids_singlecell |
| rapids-singlecell (part of scverse) | cudf-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | cuml-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | cugraph-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | cucim-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | dask-cudf-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | nx-cugraph-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | cuxfilter-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | pylibraft-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | raft-dask-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | cuvs-cu12==25.10.* | Apache-2.0 | https://rapids.ai/ |
| rapids-singlecell (part of scverse) | cupy-cuda12x | MIT | https://github.com/cupy/cupy/ |
| rapids-singlecell (part of scverse) | scikit-learn==1.5.* | BSD-3 | https://github.com/scikit-learn/scikit-learn |
| rapids-singlecell (part of scverse) | rmm (RAPIDS Memory Manager) | Apache-2.0 | https://github.com/rapidsai/rmm |
| Chemprop | chemprop>=2.0.0 | MIT | https://github.com/chemprop/chemprop |
| Chemprop | lightning | Apache2.0 | https://github.com/Lightning-AI/pytorch-lightning |
| Chemprop | scikit-learn>=1.3 | BSD-3 | https://github.com/scikit-learn/scikit-learn |
| Chemprop | PyTDC | MIT | https://github.com/mims-harvard/TDC |
| Chemprop | torch>=2.0 | BSD-3 | https://github.com/pytorch/pytorch |
| Chemprop | rdkit | BSD-3 | https://github.com/rdkit/rdkit |
| Chemprop | mlflow>=2.15 | Apache2.0 | https://github.com/mlflow/mlflow |
| Chemprop | cloudpickle | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| DiffDock | DiffDock | MIT | https://github.com/gcorso/DiffDock |
| DiffDock | pyyaml==6.0.1 | MIT | https://github.com/yaml/pyyaml |
| DiffDock | scipy==1.7.3 | BSD-3 | https://github.com/scipy/scipy |
| DiffDock | networkx==2.6.3 | BSD-3 | https://github.com/networkx/networkx |
| DiffDock | biopython==1.79 | BioPython License Agreement | https://github.com/biopython/biopython |
| DiffDock | rdkit-pypi==2022.03.5 | BSD-3 | https://github.com/rdkit/rdkit |
| DiffDock | e3nn==0.5.1 | MIT | https://github.com/e3nn/e3nn |
| DiffDock | spyrmsd==0.5.2 | MIT | https://github.com/RMeli/spyrmsd |
| DiffDock | biopandas==0.4.1 | BSD-3 | https://github.com/BioPandas/biopandas |
| DiffDock | prody==2.6.1 | MIT | https://github.com/prody/ProDy |
| DiffDock | fair-esm==2.0.0 | MIT | https://github.com/facebookresearch/esm |
| DiffDock | torch-geometric==2.2.0 | MIT | https://github.com/pyg-team/pytorch_geometric |
| DiffDock | torch-scatter==2.1.1 | MIT | https://github.com/rusty1s/pytorch_scatter |
| DiffDock | torch-sparse==0.6.17 | MIT | https://github.com/rusty1s/pytorch_sparse |
| DiffDock | torch-cluster==1.6.1 | MIT | https://github.com/rusty1s/pytorch_cluster |
| DiffDock | pandas==1.5.3 | BSD-3 | https://github.com/pandas-dev/pandas |
| Proteina-Complexa | Proteina-Complexa | MIT | https://github.com/NVIDIA-Digital-Bio/Proteina-Complexa |
| Proteina-Complexa | torch==2.7.1 | BSD-3 | https://github.com/pytorch/pytorch |
| Proteina-Complexa | lightning==2.6.1 | Apache2.0 | https://github.com/Lightning-AI/pytorch-lightning |
| Proteina-Complexa | hydra-core==1.3.1 | MIT | https://github.com/facebookresearch/hydra |
| Proteina-Complexa | omegaconf==2.3.0 | BSD-3 | https://github.com/omry/omegaconf |
| Proteina-Complexa | torch_geometric==2.7.0 | MIT | https://github.com/pyg-team/pytorch_geometric |
| Proteina-Complexa | torch_scatter==2.1.2 | MIT | https://github.com/rusty1s/pytorch_scatter |
| Proteina-Complexa | torch_sparse==0.6.18 | MIT | https://github.com/rusty1s/pytorch_sparse |
| Proteina-Complexa | torch_cluster==1.6.3 | MIT | https://github.com/rusty1s/pytorch_cluster |
| Proteina-Complexa | biotite==1.6.0 | BSD-3 | https://github.com/biotite-dev/biotite |
| Proteina-Complexa | loralib==0.1.2 | MIT | https://github.com/microsoft/LoRA |
| Proteina-Complexa | einops==0.8.2 | MIT | https://github.com/arogozhnikov/einops |
| Proteina-Complexa | transformers==5.5.0 | Apache2.0 | https://github.com/huggingface/transformers |
| Proteina-Complexa | jaxtyping | MIT | https://github.com/patrick-kidger/jaxtyping |
| NetSolP | NetSolP-1.0 | BSD-3-Clause | https://github.com/tvinet/NetSolP-1.0 |
| NetSolP | onnxruntime==1.20.1 | MIT | https://github.com/microsoft/onnxruntime |
| NetSolP | fair-esm==2.0.0 | MIT | https://github.com/facebookresearch/esm |
| NetSolP | torch==2.7.1 | BSD-3 | https://github.com/pytorch/pytorch |
| NetSolP | numpy==1.26.4 | BSD-3 | https://github.com/numpy/numpy |
| NetSolP | pandas==1.5.3 | BSD-3 | https://github.com/pandas-dev/pandas |
| NetSolP | mlflow==2.22.0 | Apache2.0 | https://github.com/mlflow/mlflow |
| NetSolP | cloudpickle==2.0.0 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| NetSolP | databricks-sdk==0.50.0 | Apache2.0 | https://pypi.org/project/databricks-sdk/ |
| NetSolP | databricks-sql-connector==4.0.2 | Apache2.0 | https://github.com/databricks/databricks-sql-python |
| NetSolP | MODEL WEIGHTS: Solubility_ESM12_0_quantized.onnx (~85 MB, split 0 of the upstream 5-fold ESM-12 ensemble) + ESM12_alphabet.pkl, committed under modules/small_molecule/netsolp/netsolp_v1/weights/ | BSD-3-Clause | https://services.healthtech.dtu.dk/services/NetSolP-1.0/ |
| PLTNUM | PLTNUM (vendored PLTNUM_PreTrainedModel class) | MIT | https://github.com/sagawatatsuya/PLTNUM |
| PLTNUM | torch==2.7.1 | BSD-3 | https://github.com/pytorch/pytorch |
| PLTNUM | transformers==4.46.3 | Apache2.0 | https://github.com/huggingface/transformers |
| PLTNUM | safetensors==0.4.5 | Apache2.0 | https://github.com/huggingface/safetensors |
| PLTNUM | huggingface-hub==0.26.2 | Apache2.0 | https://github.com/huggingface/huggingface_hub |
| PLTNUM | numpy==1.26.4 | BSD-3 | https://github.com/numpy/numpy |
| PLTNUM | pandas==1.5.3 | BSD-3 | https://github.com/pandas-dev/pandas |
| PLTNUM | mlflow==2.22.0 | Apache2.0 | https://github.com/mlflow/mlflow |
| PLTNUM | cloudpickle==2.0.0 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| PLTNUM | databricks-sdk==0.50.0 | Apache2.0 | https://pypi.org/project/databricks-sdk/ |
| PLTNUM | databricks-sql-connector==4.0.2 | Apache2.0 | https://github.com/databricks/databricks-sql-python |
| PLTNUM | MODEL WEIGHTS (HuggingFace sagawa/PLTNUM-ESM2-NIH3T3) | MIT | https://huggingface.co/sagawa/PLTNUM-ESM2-NIH3T3 |
| PLTNUM | ESM-2 650M backbone (facebook/esm2_t33_650M_UR50D) | MIT | https://github.com/facebookresearch/esm |
| DeepSTABp | DeepSTABp (vendored deepSTAPpMLP class) | MIT | https://github.com/CSBiology/deepStabP |
| DeepSTABp | torch==2.7.1 | BSD-3 | https://github.com/pytorch/pytorch |
| DeepSTABp | transformers==4.46.3 | Apache2.0 | https://github.com/huggingface/transformers |
| DeepSTABp | safetensors==0.4.5 | Apache2.0 | https://github.com/huggingface/safetensors |
| DeepSTABp | huggingface-hub==0.26.2 | Apache2.0 | https://github.com/huggingface/huggingface_hub |
| DeepSTABp | pytorch-lightning==2.5.5 | Apache2.0 | https://github.com/Lightning-AI/pytorch-lightning |
| DeepSTABp | sentencepiece==0.2.0 | Apache2.0 | https://github.com/google/sentencepiece |
| DeepSTABp | biopython==1.84 | BioPython License Agreement | https://github.com/biopython/biopython |
| DeepSTABp | numpy==1.26.4 | BSD-3 | https://github.com/numpy/numpy |
| DeepSTABp | pandas==1.5.3 | BSD-3 | https://github.com/pandas-dev/pandas |
| DeepSTABp | mlflow==2.22.0 | Apache2.0 | https://github.com/mlflow/mlflow |
| DeepSTABp | cloudpickle==2.0.0 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| DeepSTABp | databricks-sdk==0.50.0 | Apache2.0 | https://pypi.org/project/databricks-sdk/ |
| DeepSTABp | databricks-sql-connector==4.0.2 | Apache2.0 | https://github.com/databricks/databricks-sql-python |
| DeepSTABp | MODEL WEIGHTS: MLP head (~80 MB, fetched from upstream raw URL at registration time) | MIT | https://github.com/CSBiology/deepStabP/raw/main/src/Api/trained_model/b25_sampled_10k_tuned_2_d01/checkpoints/ |
| DeepSTABp | ProtT5-XL backbone (Rostlab/prot_t5_xl_uniref50) | MIT (verified at parent ProtTrans repo) | https://github.com/agemagician/ProtTrans |
| MHCflurry | mhcflurry==2.2.1 | Apache2.0 | https://github.com/openvax/mhcflurry |
| MHCflurry | torch==2.7.1 | BSD-3 | https://github.com/pytorch/pytorch |
| MHCflurry | numpy==1.26.4 | BSD-3 | https://github.com/numpy/numpy |
| MHCflurry | pandas==2.2.3 | BSD-3 | https://github.com/pandas-dev/pandas |
| MHCflurry | scikit-learn==1.5.2 | BSD-3 | https://github.com/scikit-learn/scikit-learn |
| MHCflurry | biopython==1.84 | BioPython License Agreement | https://github.com/biopython/biopython |
| MHCflurry | mlflow==2.22.0 | Apache2.0 | https://github.com/mlflow/mlflow |
| MHCflurry | cloudpickle==2.0.0 | BSD-3 | https://github.com/cloudpipe/cloudpickle |
| MHCflurry | databricks-sdk==0.50.0 | Apache2.0 | https://pypi.org/project/databricks-sdk/ |
| MHCflurry | databricks-sql-connector==4.0.2 | Apache2.0 | https://github.com/databricks/databricks-sql-python |
| MHCflurry | MODEL WEIGHTS (auto-fetched via mhcflurry-downloads fetch models_class1_presentation, ~150 MB) | Apache2.0 | https://github.com/openvax/mhcflurry |
| Genomics | glow | Apache2.0 | https://github.com/projectglow/glow |
| Genomics | pyspark | Apache2.0 | https://github.com/apache/spark |
| BioNeMo | six==1.16.0 | MIT | https://github.com/benjaminp/six |
| BioNeMo | numpy==1.26.4 | BSD-3 | https://github.com/numpy/numpy |
| BioNeMo | pandas==2.2.3 | BSD-3 | https://github.com/pandas-dev/pandas |
| BioNeMo | pyarrow>=14.0.0 | Apache2.0 | https://github.com/apache/arrow |
| BioNeMo | matplotlib>=3.8.0 | PSF/BSD | https://github.com/matplotlib/matplotlib |
| BioNeMo | Jinja2>=3.1.2 | BSD-3 | https://github.com/pallets/jinja |
| BioNeMo | protobuf>=4.23.3 | BSD-3 | https://github.com/protocolbuffers/protobuf |
| BioNeMo | grpcio>=1.59.0 | Apache2.0 | https://github.com/grpc/grpc |
| BioNeMo | grpcio-status>=1.59.0 | Apache2.0 | https://github.com/grpc/grpc |
| BioNeMo | databricks-sdk>=0.1.6 | Apache2.0 | https://pypi.org/project/databricks-sdk/ |
| BioNeMo | psutil | BSD-2 | https://github.com/giampaolo/psutil |
| BioNeMo | pynvml | BSD-3 | https://github.com/gpuopenanalytics/pynvml |
| Parabricks | see BioNeMo (same docker base dependencies) | | |
| ESM2 Embeddings | torch==2.3.1 | BSD-3 | https://github.com/pytorch/pytorch |
| ESM2 Embeddings | transformers==4.41.2 | Apache2.0 | https://github.com/huggingface/transformers |
