Conversation

@cmclean cmclean commented Sep 13, 2025

Describe your changes

This PR adds the top-performing "BBKNN (TS)" method from our recent preprint [1]. I have verified that tests pass, and running the run_full_local.sh script on my own machine successfully recapitulates the numbers published in the preprint.

[1] Aygun et al., An AI system to help scientists write expert-level empirical software, arXiv:2509.06503 (2025), https://arxiv.org/abs/2509.06503.

Checklist before requesting a review

  • I have performed a self-review of my code

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI Tests succeed and look good!

Author

cmclean commented Sep 15, 2025

I have a couple of questions about the submission and testing:

  1. Should my new 'bbknn_ts' method also be registered into the following two places?
  • methods = [...] block of src/workflows/run_benchmark/main.nf (lines 10-46)
  • dependencies: block of src/workflows/run_benchmark/config.vsh.yaml (lines 79-125)
  2. (For my own edification) When I ran run_full_local.sh, the method completed successfully on all datasets and metrics except computing kbet on the hypomap dataset, where it OOMed. That was expected, since it OOMed there on our internal evaluation harness as well, and many methods fail on that particular (dataset, metric) pair. To get the final score_uns.yaml file, though, I had to rerun while dropping the kbet metric (for all datasets). Is there a way to skip just a specified set of (dataset, method) jobs, or to ignore errors and finish the workflow with whatever was computed properly? As is, I was not able to validate identical kbet performance metrics, though given the identical results for all other metrics on all datasets I am confident in its reproducibility.

Thanks!

Member

lazappi commented Sep 17, 2025

1. Should my new 'bbknn_ts' method also be registered into the following two places?

* `methods = [...]` block of src/workflows/run_benchmark/main.nf (lines 10-46)
* `dependencies:` block of src/workflows/run_benchmark/config.vsh.yaml (lines 79-125)

Yes, it needs to be added there to be included in the workflow. It sounds like you ran the workflow already but I'm not sure that it would include the new method without doing this.

2. (For my own edification) When I ran `run_full_local.sh`, the method completed successfully on all datasets and metrics except computing `kbet` on the `hypomap` dataset, where it OOMed. That was expected, since it OOMed there on our internal evaluation harness as well, and many methods fail on that particular (dataset, metric) pair. To get the final `score_uns.yaml` file, though, I had to rerun while dropping the `kbet` metric (for all datasets). Is there a way to skip just a specified set of (dataset, method) jobs, or to ignore errors and finish the workflow with whatever was computed properly? As is, I was not able to validate identical `kbet` performance metrics, though given the identical results for all other metrics on all datasets I am confident in its reproducibility.

When we do the full benchmark run on the cloud, any failed metrics are ignored (or, more accurately, given a score of zero). We don't usually do the full runs locally, so there might be some differences in the settings that cause it to not produce an output. Generally we wouldn't want to disable a metric just for a specific dataset/method.

@lazappi lazappi requested a review from mumichae September 17, 2025 12:55
Author

cmclean commented Sep 19, 2025

Thank you for the responses. I noticed that the drvi method addition (#61) did not add entries to the src/workflows/run_benchmark/ files, so I didn't register bbknn_ts there either, but I will in a follow-up commit in this PR.

Regarding my other question, this was ultimately just a vanilla Nextflow question; I've now figured out how to pass errorStrategy 'ignore' into the right place. Thank you!
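For future readers, a minimal sketch of what that can look like in a nextflow.config, assuming a process selector pattern that matches the kbet metric component (the selector below is an assumption, not the workflow's actual process name):

```groovy
// Hypothetical nextflow.config fragment: apply errorStrategy 'ignore' only to
// processes whose name matches the kbet metric, so a failed (dataset, metric)
// job is skipped instead of aborting the whole workflow.
process {
    withName: '.*kbet.*' {
        errorStrategy = 'ignore'
    }
}
```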

Author

cmclean commented Sep 22, 2025

Okay, this PR is ready for review. @mumichae please note that src/methods/bbknn_ts/script.py lines 18-226 do not require detailed review comments; this is an entirely LLM-generated function that we do not want to modify.

Collaborator


Based on a quick glance, this code will likely work as the developer intended; however, it does not make use of the pre-computed processing step and recomputes everything by itself instead. This in itself won't cause the code to fail or give wrong results, but it won't follow the benchmark setup as intended.

Author


Can you elaborate a little on the comment that "it won't follow the benchmark setup as intended"? Is that because we indicate this is an embedding method with preferred normalization of log_cp10k but use the "layers/counts" as input rather than "layers/normalized"?

Would it be considered to follow the benchmark setup as intended if we indicated it is a [feature] method and set adata_integrated.layers['corrected_counts'] = adata_integrated.X? IIUC that could only improve its overall score on the v2.0.0 benchmark as then the HVG metric would also be computed.

I'd rather not edit the code to do that since right now it's purely LLM implemented, just want to make sure we're not missing something more fundamental. Thanks!

Collaborator


For the benchmark we don't consider the preprocessing steps (count transformations, feature selection) as part of the integration method; these operations are precomputed and made available to the integration components. But I'm considering relaxing that logic for this method, since we want to preserve the LLM code and tuning preprocessing parameters is not what it intends.

@cmclean cmclean requested a review from mumichae September 29, 2025 15:13
Author

cmclean commented Oct 14, 2025

Friendly ping -- do you need anything else prior to re-reviewing this PR @mumichae? Thank you for your help.

Comment on lines +3 to +4
name: bbknn_ts
label: BBKNN (TS)
Collaborator


The method name is a bit misleading, since the main feature of this approach is to combine ComBat and BBKNN. I think a name like gemini_combat_bbknn could make this distinction clearer.

- name: methods/batchelor_fastmnn
- name: methods/batchelor_mnn_correct
- name: methods/bbknn
- name: methods/bbknn_ts
Collaborator


The name would need to be updated (see above).

batchelor_fastmnn,
batchelor_mnn_correct,
bbknn,
bbknn_ts,
Collaborator


The name would need to be updated (see above).

@mumichae
Collaborator

Hi @cmclean, apologies for my late reply, I've been swamped with events the last couple of weeks. I had a look at the prompt and the code in more detail, and found that it suggests a rather unconventional workflow, since you're running two integration methods back-to-back. But since the point of openproblems is to benchmark novel approaches, it's definitely interesting to include this one.

I added some clarification on the relationship between preprocessing and integration here, but ultimately decided to keep the LLM approach as its own end-to-end workflow, where preprocessing steps aren't tunable.

Collaborator

@mumichae mumichae left a comment


Please review the output type of the method so that it is consistent with the evaluation workflow (i.e., so that the correct integrated representation gets evaluated).

repository: https://github.com/google-research/score

info:
method_types: [embedding]
Collaborator


Since BBKNN only modifies the KNN graph, the output type should be knn, not embedding.

Suggested change
method_types: [embedding]
method_types: [knn]

If you also want to consider the ComBat-corrected counts as an additional part of the method output, configure both outputs as follows:

Suggested change
method_types: [embedding]
method_types: [full, knn]

The evaluation pipeline then considers each output as a separate version of the method and evaluates it separately. So there will be an evaluation of bbknn_ts:full and bbknn_ts:knn, but metrics from the ComBat-corrected counts and from BBKNN won't be combined in a single entry in the results table.

Either way, embedding isn't the correct output type for this approach, because then the KNN graph derived from the embedding, not the BBKNN-corrected graph, would be used for the knn-based metrics.

label: BBKNN (TS)
summary: "A combination of ComBat and BBKNN discovered and implemented by Gemini."
description: |
"The BBKNN (TS) solution applies standard scRNA-seq preprocessing steps, including total count normalization, log-transformation, and scaling of gene expression data. Batch effect correction is performed using scanpy.pp.combat directly on the gene expression matrix (before dimensionality reduction). Dimensionality reduction is then applied using PCA on the ComBat-corrected data, and this PCA embedding (adata.obsm['X_pca']) is designated as the integrated embedding (adata.obsm['X_emb']). A custom batch-aware nearest neighbors graph is constructed based on this integrated embedding; for each cell, neighbors are independently identified within its own batch and other batches, up to n_neighbors_per_batch. These candidate neighbors are merged, keeping the minimum distance for duplicate entries, and the top total_k_neighbors are selected for each cell. Finally, a symmetric sparse distance matrix and a binary connectivities matrix are generated to represent the integrated neighborhood graph. This code was entirely written by the AI system described in the associated publication."
Collaborator


For feature outputs, the transformation to PCA for the embedding representation is part of the postprocessing of the integration output, so that part of the prompt is not applicable to this approach. Since the final representation is a corrected KNN graph, the task should be considered a knn output type, not embedding (see the method types section of this config).
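As a reading aid, the batch-aware neighbor search described in the summary above can be sketched in plain NumPy. The function and parameter names (batch_aware_knn, n_neighbors_per_batch, total_k_neighbors) mirror the description but are otherwise assumptions; the actual implementation is the LLM-generated code in script.py.

```python
import numpy as np

def batch_aware_knn(emb, batches, n_neighbors_per_batch=5, total_k_neighbors=15):
    """Sketch of the batch-aware neighbor search described in the summary.

    For each cell, up to n_neighbors_per_batch nearest cells are found
    independently within every batch; the candidates are merged and the
    closest total_k_neighbors overall are kept. Returns COO-style triplets
    (rows, cols, dists) from which a symmetric sparse distance matrix and
    binary connectivities matrix would then be built.
    """
    n = emb.shape[0]
    # Brute-force pairwise Euclidean distances (fine for a sketch).
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    rows, cols, dists = [], [], []
    for i in range(n):
        cand_idx, cand_d = [], []
        for b in np.unique(batches):
            members = np.where(batches == b)[0]
            order = np.argsort(dist[i, members])[:n_neighbors_per_batch]
            cand_idx.extend(members[order])
            cand_d.extend(dist[i, members[order]])
        cand_idx, cand_d = np.asarray(cand_idx), np.asarray(cand_d)
        keep = np.argsort(cand_d)[:total_k_neighbors]  # merge: closest overall
        rows.extend([i] * len(keep))
        cols.extend(cand_idx[keep])
        dists.extend(cand_d[keep])
    return np.asarray(rows), np.asarray(cols), np.asarray(dists)
```

Note that the per-batch search is what forces every cell to keep some cross-batch neighbors, which is the batch-mixing mechanism the description attributes to the method.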
