tests: query_colabfold_msa_server #145
Open
jandom wants to merge 22 commits into main from jandom/2026-02/tests/query_colabfold_msa_server
22 commits
561018c fix #101: template chain alignment (jandom)
0787889 further tweak, and maybe working now (jandom)
c3c14a5 rename templates (jandom)
080d0d0 run a linter (jandom)
e57d303 review: comments and improvements (jandom)
865ec86 simpler code, happier (jandom)
b29f1f6 Add remapping logic to the colabfold pipeline and remove from templat… (gnikolenyi)
a78196f refactor the PR slightly (jandom)
be1fc4e Merge branch 'public-main' into jandom/2026-02/fix/chain-template-ali… (jandom)
6f7d487 fix the test_colabfold_msa (jandom)
126bcca fix: TEST_DIR location (jandom)
11420df mutualize RSCB API calls and add tests (jandom)
65c292f use the new rscb.py module in colabfold_msa_server (jandom)
fbc389a migrate all testst to test_colabfold_msa_server (jandom)
8450fa8 remove dead code (jandom)
4a26bc6 zip alignments and a3m_lines with strict=True (jandom)
78570ff move test files to test_data (jandom)
ccf9d77 tests: query_colabfold_msa_server (jandom)
3fa1d74 there can be only one positional argument (jandom)
5dcb197 some mypy dust on the query_colabfold_msa_server (jandom)
470fc69 Merge branch 'main' into jandom/2026-02/tests/query_colabfold_msa_server (jandom)
42e40c3 Merge branch 'main' into jandom/2026-02/tests/query_colabfold_msa_server (jandom)
75 changes: 75 additions & 0 deletions
openfold3/tests/core/data/pipelines/preprocessing/test_template.py
@@ -0,0 +1,75 @@

    from pathlib import Path

    import openfold3
    from openfold3.core.data.io.sequence.template import (
        A3mParser,
        parse_template_alignment,
    )
    from openfold3.core.data.io.structure.cif import _load_ciffile
    from openfold3.core.data.primitives.structure.metadata import (
        get_asym_id_to_canonical_seq_dict,
        get_author_to_label_chain_ids,
    )

    _TEST_DATA_DIR = Path(openfold3.__file__).parent / "tests" / "test_data"


    class TestTemplatePreprocessor:
        def test_template_has_author_chain_id(self):
            """Verify author->label chain ID resolution for 1RNB.

            https://github.com/aqlaboratory/openfold-3/issues/101

            In 1RNB, author chain "A" is label chain "B" (the protein barnase).
            The ColabFold alignment reports "1rnb_A" which must be resolved to
            label chain "B" before the sequence can be looked up.
            """
            alignment_file = (
                _TEST_DATA_DIR / "template_alignments" / "colabfold_template.m8"
            )
            query_seq_str = "AQVINTFDGVADYLQTYHKLPDNYITKSEAQALGWVASKGNLADVAPGKSIGGDIFSNREGKLPGKSGRTWREADINYTSGFRNSDRILYSSDWLIYKTTDHYQTFTKIR"
            templates = parse_template_alignment(
                aln_path=Path(alignment_file),
                query_seq_str=query_seq_str,
                max_sequences=200,
            )

            # find the offending "1rnb_A"
            template = templates[16]
            assert template.chain_id == "A" and template.entry_id == "1rnb"

            template_structure_file = _TEST_DATA_DIR / "mmcifs" / f"{template.entry_id}.cif"
            cif_file = _load_ciffile(template_structure_file)

            chain_id_seq_map = get_asym_id_to_canonical_seq_dict(cif_file)
            poly_scheme = cif_file.block["pdbx_poly_seq_scheme"]
            label_to_author = dict(
                zip(
                    poly_scheme["asym_id"].as_array().tolist(),
                    poly_scheme["pdb_strand_id"].as_array().tolist(),
                    strict=True,
                )
            )
            author_to_label_chain_ids = get_author_to_label_chain_ids(label_to_author)
            label_chain_id = author_to_label_chain_ids[template.chain_id][0]

            # Author "A" -> label "B" (the protein chain)
            assert label_chain_id == "B"

            template_sequence = chain_id_seq_map.get(label_chain_id)

            parser = A3mParser(max_sequences=None)
            parsed = parser(
                (
                    f">query_X/1-{len(query_seq_str)}\n"
                    f"{query_seq_str}\n"
                    f">{template.entry_id}_{label_chain_id}/{1}-{len(template_sequence)}\n"
                    f"{template_sequence}\n"
                ),
                query_seq_str,
                realign=True,
            )

            assert len(parsed) == 2
            assert parsed[0].seq_id == 1
            assert parsed[1].seq_id < 1
I bet we could reuse the ColabFold mmseqs mock itself:
Mock definition: https://github.com/sokrypton/ColabFold/blob/main/tests/mock.py#L109
Usage: https://github.com/sokrypton/ColabFold/blob/9712f2ff262d3977d571919317e06cc96c29cd95/tests/test_msa.py#L7
That would probably be the cleanest way to handle it.
We would have to vendor it though :/ Not ideal: ColabFold is a very bulky dependency to add, I'd rather avoid it.
We've also modified their code a fair bit, I think.
Hey there @jnwei, let me revive this PR. I think we can take the pytest-VCR approach for these tests.
Ah, scratch that: we need the full ColabFold job to complete. They appear to be using some sort of caching; subsequent calls are basically instant.
Just reviewed the mock code you provided. Yeah, let's vendor that; it'll be the easiest.
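
For readers unfamiliar with the idea, a record-and-replay mock of the kind discussed in this thread might look roughly like the sketch below. It is not the ColabFold mock and not the code in this PR: `ReplayMsaServer`, `fake_backend`, and the JSON cache layout are all hypothetical, and the "server" is reduced to a function from a sequence string to a response string.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


class ReplayMsaServer:
    """Hypothetical record/replay stand-in for an MSA server query function."""

    def __init__(
        self,
        cache_file: Path,
        real_query: Optional[Callable[[str], str]] = None,
    ) -> None:
        self.cache_file = cache_file
        self.real_query = real_query  # None means replay-only mode
        self.responses: dict[str, str] = (
            json.loads(cache_file.read_text()) if cache_file.exists() else {}
        )

    @staticmethod
    def _key(sequence: str) -> str:
        # Hash the query so the cache key stays short for long sequences.
        return hashlib.sha256(sequence.encode()).hexdigest()

    def __call__(self, sequence: str) -> str:
        key = self._key(sequence)
        if key not in self.responses:
            if self.real_query is None:
                raise KeyError(f"no recorded response for {sequence[:10]}...")
            # Record mode: call the real backend once and persist the answer.
            self.responses[key] = self.real_query(sequence)
            self.cache_file.write_text(json.dumps(self.responses))
        return self.responses[key]


def fake_backend(sequence: str) -> str:
    """Stands in for the slow network call; returns a one-sequence A3M string."""
    return f">query\n{sequence}\n"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "responses.json"
    recorder = ReplayMsaServer(cache, real_query=fake_backend)
    recorded = recorder("AQVINTFDGV")   # hits the backend, writes the cache
    replayer = ReplayMsaServer(cache)   # no backend: replay only
    replayed = replayer("AQVINTFDGV")   # served from the recorded file
```

In a test, an object like this could be swapped in for the function that performs the network call, for example with pytest's `monkeypatch.setattr`, so the recorded file is committed once and later runs never touch the network.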