
fix: LBP pipeline broken due to wrong Chai-1 FASTA generator #6

Open
Xinping-Liu wants to merge 1 commit into OTeam-AI4S:main from Xinping-Liu:fix/lbp-chai1-fasta


@Xinping-Liu

Problem

The LBP (and interface) pipeline failed silently at the refold_prepare stage. _prepare_refold_chai1 called make_chai1_fasta_multi_process, which is AME-specific: it looks up task names in an AME CSV and reads pre-built ligand SMILES from assets/ame/standard_nonprotein_fasta/. LBP has no AME CSV, so every FASTA file failed to generate with nothing more than a "Failed to get task name" warning, and Chai-1 received zero inputs.
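The failure was easy to miss because each missed lookup only warned and moved on. The sketch below is a hypothetical illustration of that warn-and-skip pattern, not the actual make_chai1_fasta_multi_process code: the function name, the dict standing in for the AME CSV, and the output file naming are all assumptions.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("refold_prepare")


def generate_fastas_warn_and_skip(backbones: List[str],
                                  task_names: Dict[str, str]) -> List[str]:
    """Hypothetical generator: look up each backbone's task name, warn and skip on a miss."""
    written: List[str] = []
    for backbone in backbones:
        # Stand-in for the AME CSV lookup; an LBP run has no such table.
        task = task_names.get(backbone)
        if task is None:
            # Only a warning is logged, so the caller never sees an error.
            logger.warning("Failed to get task name for %s", backbone)
            continue
        written.append(f"{task}_{backbone}.fasta")
    return written


# With no AME table every lookup misses, so nothing is produced for Chai-1.
outputs = generate_fastas_warn_and_skip(["bb_0", "bb_1", "bb_2"], task_names={})
print(len(outputs))  # 0
```

Because the loop returns an empty list instead of raising, the stage appears to succeed and the problem only surfaces downstream when Chai-1 has nothing to fold.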

Fix

Add make_chai1_fasta_from_backbone_dir, a general-purpose ReFold method that:

  • Reads ligand residue codes directly from HETATM records in each backbone PDB
  • Looks up SMILES from the CCD components.cif database (no AME CSV required)
  • Reads LigandMPNN-generated protein sequences from inverse_fold/seqs/
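A minimal sketch of those three steps, under several assumptions: the function names are hypothetical; SMILES are passed in as a dict (the real method parses them from components.cif); water codes are filtered with an assumed set; each LigandMPNN .fa file is assumed to list the input sequence first with multi-chain sequences joined by ":"; and the `>protein|name=...` / `>ligand|name=...` headers follow the Chai-1 FASTA convention as documented in the chai-lab README.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Solvent residue names skipped when collecting ligands (assumed set).
SOLVENT_CODES = {"HOH", "WAT", "DOD"}


def ligand_codes_from_pdb(pdb_text: str) -> List[str]:
    """Unique HETATM residue names in file order, excluding solvent."""
    codes: List[str] = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        # Residue name occupies PDB columns 18-20 (1-based), i.e. line[17:20].
        code = line[17:20].strip()
        if code and code not in SOLVENT_CODES and code not in codes:
            codes.append(code)
    return codes


def parse_fasta(fa_text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in fa_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def designed_sequences(fa_text: str) -> List[str]:
    """Assumes the first record of a LigandMPNN .fa file is the input sequence."""
    return [seq for _, seq in parse_fasta(fa_text)[1:]]


def chai1_fasta(design: str, sequence: str, ligand_codes: Iterable[str],
                smiles_by_code: Dict[str, str]) -> str:
    """One protein entry per chain, then one ligand entry per residue code."""
    entries: List[str] = []
    # Multi-chain sequences are assumed to be ':'-separated.
    for i, chain_seq in enumerate(sequence.split(":")):
        entries.append(f">protein|name={design}_chain{i}\n{chain_seq}")
    for code in ligand_codes:
        if code not in smiles_by_code:
            raise KeyError(f"no SMILES for ligand code {code!r}")
        entries.append(f">ligand|name={code}\n{smiles_by_code[code]}")
    return "\n".join(entries) + "\n"


# Usage with toy inputs (placeholder SMILES, not a real CCD value).
pdb = "HETATM    1  C1  ATP A 101\nHETATM    2  O   HOH A 201\n"
seqs = designed_sequences(">native\nMKT\n>design, id=1\nMKV\n")
print(chai1_fasta("bb_0_1", seqs[0], ligand_codes_from_pdb(pdb), {"ATP": "<ATP SMILES>"}))
```

Raising on an unknown ligand code, rather than warning, is a choice made in this sketch so that a missing CCD entry cannot reproduce the zero-input failure described in the Problem section.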

Update _prepare_refold_chai1 in pipeline_framework.py to use the new method for LBP/interface tasks. AME continues to use make_chai1_fasta_multi_process unchanged.
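The routing change can be pictured as a small dispatch on task type. This is a hypothetical sketch: the task-type strings, the case-insensitive match, and passing keyword arguments straight through are assumptions, since the PR does not show the real signature of _prepare_refold_chai1 or of either generator.

```python
from typing import Any

# Hypothetical task-type labels; the real pipeline may identify tasks differently.
AME_TASKS = {"ame"}
BACKBONE_DIR_TASKS = {"lbp", "interface"}


def fasta_method_for(task_type: str) -> str:
    """Pick the ReFold method that builds Chai-1 FASTA inputs for a task type."""
    task = task_type.lower()
    if task in AME_TASKS:
        # AME keeps the original CSV-driven generator unchanged.
        return "make_chai1_fasta_multi_process"
    if task in BACKBONE_DIR_TASKS:
        # LBP/interface read ligands and sequences from the run directory.
        return "make_chai1_fasta_from_backbone_dir"
    raise ValueError(f"unsupported task type: {task_type!r}")


def prepare_refold_chai1(refold: Any, task_type: str, **kwargs: Any) -> Any:
    """Dispatch to the chosen generator; keyword arguments pass through unchanged."""
    method = getattr(refold, fasta_method_for(task_type))
    return method(**kwargs)
```

Failing on an unknown task type keeps a new pipeline from silently falling back to the AME generator.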

Test

Ran end-to-end on 3 LBP backbone structures (× 8 sequences = 24 designs):

  • refold_prepare: 24/24 FASTA files generated ✓
  • Chai-1 refold: 24/24 structures predicted ✓
  • Evaluation metrics (pLDDT, ipAE, ipTM) computed and written to raw_data.csv

Files changed

  • refold/refold_api.py: add make_chai1_fasta_from_backbone_dir method
  • scripts/pipeline_framework.py: update _prepare_refold_chai1 to use new method

The LBP pipeline was broken because _prepare_refold_chai1 called
make_chai1_fasta_multi_process, which is AME-specific and requires an
AME CSV to look up task names and pre-built ligand SMILES files.
LBP has no AME CSV, so all 24 FASTA files failed to generate and
Chai-1 received zero inputs.

Fix by adding make_chai1_fasta_from_backbone_dir to ReFold, which:
- reads ligand residue names directly from backbone PDB HETATM records
- looks up SMILES from the CCD components.cif database
- reads LigandMPNN-generated sequences from inverse_fold/seqs/

Update _prepare_refold_chai1 in pipeline_framework.py to use the new
method for LBP and interface tasks.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>