[REVIEW] Add OpenSearch backend to ``cuvs-bench`` by jrbourbeau · Pull Request #2012 · NVIDIA/cuvs

jrbourbeau · 2026-04-10T18:30:27Z

Would still like to refine a bit but pushing up for early feedback

cc @singhmanas1 @janakivamaraju @afourniernv

xref #1907 which does something similar but for Elastic

EDIT: Okay, this is now ready for review (cc @jnke2016)

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau · 2026-04-10T18:59:59Z


+[tool.pytest.ini_options]
+markers = [
+    "integration: tests that require a live OpenSearch node (run with '-m integration')",


TODO: Make this more generic language (not just opensearch)

You can do this in a follow-up, but I think it would be a good idea to add a blurb to the cuvs-bench install documentation for running the integration tests. Maybe we could do that in the opensearch deployment PR.

I'd also say "integration" is a bit overly-general of a name if you intend to add more backends in the future. -m opensearch (or something similarly specific) would be a bit more future-proof.

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jnke2016

The PR looks good me.

cjnolet · 2026-04-20T18:52:56Z

+
+    ext = os.path.splitext(path)[1].lower()
+    dtype = _DTYPE_FOR_EXT.get(ext, np.float32)
+    with open(path, "rb") as f:


I mentioned this in the elasticsearch PR as well- we should standardize/centralize these calls so that we're not having to manually implement them for every backend. The benchmark should be opening the files and passing the vectors or the file handle) down to the actual backends.

@jnke2016

Don't disagree with this comment (or the other ones about consolidating various utilities). IIUC it sounds like these will need deeper changes to the config loader / orchestrator specs. Is that something that's needed in this PR, or is a follow up to refactor fine?

Since both backends(elastic search and OpenSearch) are separately implementing this, a follow up PR consolidating both might be easier especially since both PRs are still open. Otherwise I can push a PR consolidating those various utilities (general purpose) based on both PRs which they can pull after it is merged

@jrbourbeau
Loading vectors in the orchestrator or config loader isn't the right approach because the C++ backend never loads vectors into Python. It only passes file paths to the C++ subprocess. Loading vectors at the orchestrator level would waste memory for the C++ backend and any future backend that operates the same way, so this doesn't require changes to the config loader or orchestrator specs.

There are two options to centralize this:

Option 1: Shared utility module. Create a backends/utils.py with a single load_vectors() function that handles all supported binary formats (.fbin, .f16bin, .u8bin, .i8bin, .ibin) and supports subset_size. Python-native backends like OpenSearch and Elasticsearch import and call it when they need vectors. The C++ backend, orchestrator, and config loader are completely untouched. This is the simpler approach with no core changes, though it relies on future backend authors knowing to use it.

Option 2: Lazy loading on Dataset. Make base_vectors, query_vectors, and groundtruth_neighbors lazy properties on the Dataset class that load from the corresponding file paths on first access. The C++ backend never accesses these properties (it uses file paths), so nothing loads. Python-native backends just access dataset.base_vectors and get loaded vectors automatically without needing to know about any utility or handle the empty-array fallback pattern. This prevents the duplication from recurring since future backend authors can't accidentally reimplement it, but it requires changing Dataset from a plain dataclass to a regular class.

I'd suggest we go with option 1 for now since it solves the problem without touching core types, and we can move to option 2 later if the pattern keeps getting reimplemented

Went with option 2. The Dataset class is now a regular class where base_vectors, query_vectors, and groundtruth_neighbors are loaded automatically from file paths on first access. Backends just use dataset.base_vectors and get a numpy array without needing to know about file loading. The C++ backend is unaffected since it only uses dataset.base_file.

cjnolet · 2026-04-20T18:53:23Z

+    return data.reshape(n_rows, n_cols)
+
+
+def _load_yaml(path: str) -> Any:


This and the dataset config should also be getting loaded by the benchmark itself and not the backend. @jnke2016

cjnolet · 2026-04-20T18:53:40Z

+
+
+def _gather_algo_configs(
+    config_path: str, algorithm_configuration: Optional[str]


We need to centralize this @jnke2016

Agreed. Functions like _gather_algo_configs, _load_yaml, and _get_dataset_conf are shared config-loading logic that every backend ends up reimplementing. @afourniernv - The Elasticsearch PR has the same pattern. The C++ loader already has equivalent methods (load_yaml_file, get_dataset_configuration, gather_algorithm_configs) as instance methods. I'll promote those to the base ConfigLoader class so all config loaders inherit them automatically and developers don't need to reimplement them. The backend should only be responsible for its backend-specific concerns (connection parameters, index mappings, API calls), not for loading YAML files or finding datasets by name.

Though the load() method is abstract because each backend has different config-loading logic, helper methods like load_yaml_file and get_dataset_configuration are common operations that every config loader reimplements when implementing load(). gather_algorithm_configs is also shared between the C++ and OpenSearch loaders.

cjnolet · 2026-04-20T18:55:50Z

+
+        benchmark_configs: List[BenchmarkConfig] = []
+
+        for algo_file in algo_files:


Yeah, this is all boilerplate- exactly why this should be centralized. The resulting config dicts should be loaded from the files by the benchmark and passed down to the individual backends. It's important that this is standardized across benchmarks so that we don't have specific backends doing things drastically different than others (that just makes it super complicated for everyone. But also, if we find a bug in a spcific loader or decide to change how we do things generally, every single backend then needs to be updated individually instead of just changing it one place ) @jnke2016

So the loading pipeline (loading dataset YAML, gathering algorithm files, filtering by algorithm/group names etc ..) is a shared logic that should be centralized in the base ConfigLoader instead of having each backend implementing this.

cjnolet · 2026-04-20T18:59:24Z

+                method_params = {"nlist": build_param.get("nlist", 4)}
+            else:
+                raise ValueError(
+                    f"Unsupported faiss_method {faiss_method!r}. "


Yes! This is important. I believe i suggested that this be done in the eladticserch PR as well. Need to fail gracefully when the user configures somethign that's not supported (or loads the wrong file).

cjnolet · 2026-04-20T19:03:38Z

cc @navneet1v for thoughts / feedback on the opensearch backend for cuvs-bench.

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

…ch-opensearch

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jnke2016 · 2026-04-27T09:31:15Z

+            elapsed = time.perf_counter() - t0
+            qps = n_queries / elapsed if elapsed > 0 else 0.0
+
+            recall = 0.0


Recall calculation: I've added a shared compute_recall() in backends/utils.py so that any bug fix or improvement happens in one place and applies to all Python-native backends. The inline recall block at lines 1027-1035 would be replaced with:

from .utils import compute_recall recall = 0.0 if groundtruth is not None: recall = compute_recall(neighbors, groundtruth, k)

Running the parameters and calculating recall should be done by the orchestrator not the individual backends. The orchestrator should be controlling this end to end. Backends should not even have the option to do this because the orchestrator should be the one controlling it all.

Please look at how Ann-benchmarks is architected. That's what we want to do here (and that's what the original cuVS-bench scripts were doing.

The original cuVS-bench scripts had the framework control everything end-to-end, with recall computed by the C++ subprocess. This is still the case in the new pluggable benchmark API for the C++ backend. Python-native backends (OpenSearch, Elasticsearch) don't have this since they return raw neighbor IDs over HTTP. Moving recall computation to the orchestrator would require handling the case where the C++ backend already returns recall, to avoid overwriting it. One way to distinguish this is by checking neighbors.size > 0, but that relies on the C++ backend returning empty neighbor arrays rather than an explicit signal, and if any other backend also returns empty neighbors for a different reason, the orchestrator would wrongly skip recall computation. We could also hardcode an exception for the C++ backend in the orchestrator, but that goes against the pluggable design since any future backend that also computes recall internally would need another exception.

I looked at how ann-benchmarks handles this. In ann-benchmarks, every algorithm returns only raw neighbor indices. The framework stores them as-is, and recall is computed downstream during analysis, never by the algorithm itself as you pointed out. In our case, the C++ backend is the exception because its subprocess design doesn't return raw neighbors to Python. It computes recall internally and returns it in the JSON output. Following the ann-benchmarks pattern, the orchestrator would compute recall for any backend that returns actual neighbor arrays (neighbors.size > 0), and preserve the C++ backend's recall when it returns empty neighbors. This was the main reason I initially kept recall in the backends rather than the orchestrator. Relying on neighbors.size > 0 to skip the C++ backend is fragile if other backends also return empty neighbors for different reasons. I will have recall computed at the orchestrator level and skipping it for the C++ backend (through neighbors.size > 0) and I am open to suggestions.

recall is now computed by the orchestrator after backend.search() returns. Backends no longer compute it themselves. The C++ backend is naturally skipped since it returns empty neighbor arrays (recall is already computed by the subprocess).

Updated to have the orchestrator compute recall

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com> # Conflicts: # python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau

Okay, I've updated the opensearch backend here to use the new centralized utilities / updated structure from #2040 and #2084

@jnke2016 @cjnolet could you take a look when you get a moment?

cc @singhmanas1 for viz

coderabbitai · 2026-05-15T19:43:48Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6f454534-da49-435f-8218-5672c32f227c

📥 Commits

Reviewing files that changed from the base of the PR and between 219058d and 0974219.

📒 Files selected for processing (9)

conda/environments/bench_ann_cuda-129_arch-aarch64.yaml
conda/environments/bench_ann_cuda-129_arch-x86_64.yaml
conda/environments/bench_ann_cuda-132_arch-aarch64.yaml
conda/environments/bench_ann_cuda-132_arch-x86_64.yaml
conda/recipes/cuvs-bench-cpu/recipe.yaml
conda/recipes/cuvs-bench/recipe.yaml
dependencies.yaml
python/cuvs_bench/cuvs_bench/tests/test_opensearch.py
python/cuvs_bench/pyproject.toml

🚧 Files skipped from review as they are similar to previous changes (2)

python/cuvs_bench/pyproject.toml
python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added OpenSearch backend with FAISS HNSW support, configurable remote index build, and a bundled benchmark configuration and hyperparameter search space.
Tests
- Added comprehensive unit and gated integration tests for config loading, build/search flows, remote-build scenarios, and failure paths.
Chores
- Added optional OpenSearch dependency and pytest marker; improved backend lifecycle handling and auto-registered the new config loader.

Walkthrough

Adds an OpenSearch k-NN backend and config loader, registers them, provides a FAISS-HNSW benchmark and search-space entry, updates orchestrator lifecycle/recall logic, includes unit and gated integration tests (remote index-build paths), and declares optional opensearch dependency plus pytest marker.

Changes

OpenSearch Backend Integration

Layer / File(s)	Summary
OpenSearch Backend Core Implementation `python/cuvs_bench/cuvs_bench/backends/opensearch.py`	`OpenSearchBackend` class implements index build and kNN search with lazy client initialization, lifecycle methods, index mapping creation for lucene/faiss engines, streaming bulk indexing, remote GPU build monitoring via kNN stats polling, and end-to-end build/search with parameter sweeping, metric aggregation, and timing capture. Constants and space-type mappings for distance metrics are included.
Config Loader and Backend Registration `python/cuvs_bench/cuvs_bench/backends/opensearch.py`, `python/cuvs_bench/cuvs_bench/backends/__init__.py`, `python/cuvs_bench/cuvs_bench/orchestrator/__init__.py`	`OpenSearchConfigLoader` discovers opensearch* YAML configs, expands build/search parameter combinations, selects engine (lucene vs faiss), and normalizes index names. Backend and loader are registered into package exports and auto-registration hooks in backends and orchestrator modules.
Benchmark Configuration and Search Spaces `python/cuvs_bench/cuvs_bench/config/algos/opensearch_faiss_hnsw.yaml`, `python/cuvs_bench/cuvs_bench/backends/search_spaces.py`	New YAML config file `opensearch_faiss_hnsw` defines base and test hyperparameter groups with `m` and `ef_search` sweeps. Corresponding search space entry added to `ALGORITHM_SEARCH_SPACES`.
Orchestrator Lifecycle Management `python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py`	`_run_sweep` and `_run_trial` now wrap backend execution in try/finally, call `initialize()` before operations, guarantee `cleanup()` after, and tighten recall recomputation to only when backend recall is exactly `0.0` with non-empty neighbors.
Comprehensive Test Suite with Fixtures `python/cuvs_bench/cuvs_bench/tests/test_opensearch.py`	Test helpers for backend, dataset, and index configuration; fixtures for live OpenSearch reachability and remote build environment setup; unit tests for config loader expansion, dry-run, engine validation, and remote build timeout/failure handling; integration tests exercising live build/search and remote index build with assertions on timing, throughput, recall, and neighbor shapes.
Project Dependencies and Pytest Configuration `python/cuvs_bench/pyproject.toml`, `conda/`, `dependencies.yaml`, `conda/recipes/`	Declares `opensearch-py>=2.4.0` as an optional dependency and in conda/recipe outputs, updates dependency matrix to include an `opensearch` variant, and adds a pytest `opensearch` marker for tests requiring a live OpenSearch node.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

rapidsai/cuvs#2040: Both PRs modify the recall recomputation flow in orchestrator.py; this PR tightens the recall condition to only recompute when backend-reported recall is exactly 0.0 alongside successful search and non-empty neighbors.

Suggested labels

benchmarking

Suggested reviewers

bdice
cjnolet

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.94% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding an OpenSearch backend to cuvs-bench, which is well-supported by the changeset introducing new OpenSearch implementation files, configuration, and tests.
Description check	✅ Passed	The description is related to the changeset, referencing OpenSearch backend work, mentioning related Elastic backend PR, and indicating the work is ready for review despite being informal in tone.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuvs_bench/cuvs_bench/backends/search_spaces.py`:
- Around line 166-167: The tune range for the HNSW parameter "m" is too low;
update the search space definition that contains the "m" entry (the dict mapping
with keys "m" and "ef_construction" inside search_spaces.py) to increase the
"max" value from 64 to 96 so tune mode can cover the same m values as the
OpenSearch YAML sweep.

In `@python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py`:
- Around line 226-283: Wrap each config/trial in a try/except that catches any
exception from backend.initialize(), backend.build(), or backend.search() inside
_run_sweep/_run_trial and convert the exception into a failed result (e.g.,
create a Build/Search result with success=False and error_message=str(exc) or
append a simple failure record) so the loop can continue to the next
config/trial; ensure backend.cleanup() remains in the finally block so cleanup
always runs, and log the exception (including config.index_name and the
exception text) before continuing.

In `@python/cuvs_bench/cuvs_bench/tests/test_opensearch.py`:
- Around line 334-424: The session fixture remote_build_env currently returns
without cleanup and only catches ConnectionError; change it to a yield-style
fixture so you can teardown after tests: after yielding the dict, call the
OpenSearch APIs used earlier (the two session.put calls that set repo_name via
f"{opensearch_url}/_snapshot/{repo_name}" and cluster settings at
f"{opensearch_url}/_cluster/settings") to delete the snapshot repo and to
clear/reset the persistent knn.remote_index_build settings (unset the keys or
set enabled=false and clear repository/service.endpoint), and close the
requests.Session; also broaden the reachability probes that call
requests.get(builder_url, ...) and requests.get(s3_endpoint, ...) to catch
requests.exceptions.RequestException (or include Timeout) instead of only
ConnectionError so timeouts cause the fixture to skip rather than crash.
- Around line 427-451: Update the live_remote_build_backend fixture to
explicitly document how OpenSearch obtains S3 credentials for remote index
builds: add a short comment inside the live_remote_build_backend (near the
remote_build_env mention) stating that S3 credentials are not passed via the
OpenSearchBackend config or snapshot registration in this test and must already
be configured in the OpenSearch process/container (e.g., via the running
container's environment variables or OpenSearch keystore) so remote_index_build
can authenticate; reference the fixture name live_remote_build_backend, the
helper remote_build_env, and the OpenSearchBackend config in the comment so
readers know where the dependency lives.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ecda7e0c-b114-4384-9e6b-357b1c3e7f6d

📥 Commits

Reviewing files that changed from the base of the PR and between 49ce810 and 3898c19.

📒 Files selected for processing (8)

python/cuvs_bench/cuvs_bench/backends/__init__.py
python/cuvs_bench/cuvs_bench/backends/opensearch.py
python/cuvs_bench/cuvs_bench/backends/search_spaces.py
python/cuvs_bench/cuvs_bench/config/algos/opensearch_faiss_hnsw.yaml
python/cuvs_bench/cuvs_bench/orchestrator/__init__.py
python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py
python/cuvs_bench/cuvs_bench/tests/test_opensearch.py
python/cuvs_bench/pyproject.toml

coderabbitai · 2026-05-15T19:43:52Z

+            "m": {"type": "int", "min": 4, "max": 64},
+            "ef_construction": {"type": "int", "min": 32, "max": 1024},


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align tune range for m with the shipped OpenSearch sweep config.

Line 166 caps m at 64, but the new OpenSearch YAML sweep includes m=96. That makes tune mode unable to evaluate part of the supported config space for the same algorithm.

Suggested fix

"opensearch_faiss_hnsw": { "build": { - "m": {"type": "int", "min": 4, "max": 64}, + "m": {"type": "int", "min": 4, "max": 96}, "ef_construction": {"type": "int", "min": 32, "max": 1024}, },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"m": {"type": "int", "min": 4, "max": 64},

"ef_construction": {"type": "int", "min": 32, "max": 1024},

"m": {"type": "int", "min": 4, "max": 96},

"ef_construction": {"type": "int", "min": 32, "max": 1024},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs_bench/cuvs_bench/backends/search_spaces.py` around lines 166 - 167, The tune range for the HNSW parameter "m" is too low; update the search space definition that contains the "m" entry (the dict mapping with keys "m" and "ef_construction" inside search_spaces.py) to increase the "max" value from 64 to 96 so tune mode can cover the same m values as the OpenSearch YAML sweep.

coderabbitai · 2026-05-15T19:43:52Z

+            try:
+                backend.initialize()
+
+                if build:
+                    # Pass ALL indexes at once - ONE C++ command builds all
+                    build_result = backend.build(
+                        dataset=bench_dataset,
+                        indexes=config.indexes,
+                        force=force,
+                        dry_run=dry_run,
                    )
-                    continue
-
-            if search:
-                # Pass ALL indexes at once - ONE C++ command searches all
-                # Each index has its own search_params list
-                # Total benchmarks = sum(len(idx.search_params) for idx in indexes)
-                search_result = backend.search(
-                    dataset=bench_dataset,
-                    indexes=config.indexes,
-                    k=count,
-                    batch_size=batch_size,
-                    mode=search_mode,
-                    force=force,
-                    search_threads=search_threads,
-                    dry_run=dry_run,
-                )
+                    results.append(build_result)

-                # Compute recall for backends that return actual neighbors.
-                # The C++ backend computes recall in the subprocess and returns
-                # empty neighbors, so this is skipped for it.
-                # Note: The recall computation relies on empty neighbors to
-                # distinguish backends that already computed recall.
-                if search_result.success and search_result.neighbors.size > 0:
-                    gt = bench_dataset.groundtruth_neighbors
-                    if gt is not None:
-                        search_result.recall = compute_recall(
-                            search_result.neighbors, gt, count
+                    if not build_result.success:
+                        print(
+                            f"Build failed for {config.index_name}: {build_result.error_message}"
                        )
-
-                results.append(search_result)
-
-                if not search_result.success:
-                    print(
-                        f"Search failed for {config.index_name}: {search_result.error_message}"
+                        continue
+
+                if search:
+                    # Pass ALL indexes at once - ONE C++ command searches all
+                    # Each index has its own search_params list
+                    # Total benchmarks = sum(len(idx.search_params) for idx in indexes)
+                    search_result = backend.search(
+                        dataset=bench_dataset,
+                        indexes=config.indexes,
+                        k=count,
+                        batch_size=batch_size,
+                        mode=search_mode,
+                        force=force,
+                        search_threads=search_threads,
+                        dry_run=dry_run,
                    )

+                    # Compute recall for backends that return actual neighbors.
+                    # The C++ backend computes recall in the subprocess and returns
+                    # empty neighbors, so this is skipped for it.
+                    # Empty neighbors or nonzero recall indicate that the backend
+                    # already handled recall itself.
+                    if (
+                        search_result.success
+                        and search_result.neighbors.size > 0
+                        and search_result.recall == 0.0
+                    ):
+                        gt = bench_dataset.groundtruth_neighbors
+                        if gt is not None:
+                            search_result.recall = compute_recall(
+                                search_result.neighbors, gt, count
+                            )
+
+                    results.append(search_result)
+
+                    if not search_result.success:
+                        print(
+                            f"Search failed for {config.index_name}: {search_result.error_message}"
+                        )
+            finally:
+                backend.cleanup()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Isolate backend failures so one config/trial does not abort the whole run.

Line 227 and Line 576 introduce initialize(), but _run_sweep/_run_trial still lack an except path. If initialize(), build(), or search() raises, execution exits the run instead of recording that config/trial as failed and continuing.

Please add an exception boundary per config/trial that converts exceptions into failed results (or at minimum logs and continues for sweep).

Also applies to: 575-618

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py` around lines 226 - 283, Wrap each config/trial in a try/except that catches any exception from backend.initialize(), backend.build(), or backend.search() inside _run_sweep/_run_trial and convert the exception into a failed result (e.g., create a Build/Search result with success=False and error_message=str(exc) or append a simple failure record) so the loop can continue to the next config/trial; ensure backend.cleanup() remains in the finally block so cleanup always runs, and log the exception (including config.index_name and the exception text) before continuing.

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuvs_bench/cuvs_bench/tests/test_opensearch.py`:
- Around line 413-422: The snapshot repository registration omits the custom S3
endpoint; update the JSON passed to the session.put call (the PUT to
f"{opensearch_url}/_snapshot/{repo_name}") to include the S3 endpoint by adding
the "endpoint": s3_endpoint entry inside the "settings" dict (use the existing
s3_endpoint variable validated earlier) so OpenSearch uses the custom
S3_ENDPOINT instead of the default AWS endpoint.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 625c5ca1-017c-4427-89bd-c76d731b9c67

📥 Commits

Reviewing files that changed from the base of the PR and between 3898c19 and f7dfb33.

📒 Files selected for processing (3)

python/cuvs_bench/cuvs_bench/backends/search_spaces.py
python/cuvs_bench/cuvs_bench/config/algos/opensearch_faiss_hnsw.yaml
python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuvs_bench/cuvs_bench/tests/test_opensearch.py`:
- Around line 89-90: The tests hard-code a cluster-visible index name via the
index_name variable used when instantiating OpenSearchBackend, causing
collisions across parallel/rerun fixtures; change the index_name creation in
test_opensearch.py (where index_name is assigned and where
OpenSearchBackend(...) is constructed, also at the duplicated location around
lines 471-472) to append a unique per-fixture suffix (for example using
uuid.uuid4().hex or pytest's request.node.nodeid/timestamp) so each fixture
instance gets a distinct index name, and pass that unique name into the
OpenSearchBackend constructor and any teardown code that deletes the index.
- Around line 396-427: Define a short admin timeout (e.g., admin_timeout = 10)
in the fixture and add timeout=admin_timeout to every OpenSearch admin HTTP call
shown: the session.get to "{opensearch_url}/_cluster/settings", the session.get
to "{opensearch_url}/_snapshot/{repo_name}", the session.put to
"{opensearch_url}/_snapshot/{repo_name}", plus the other admin
session.get/put/delete calls referenced in the finally/teardown block; ensure
you use the same admin_timeout variable so all admin requests (those using
session.get/put/delete in this fixture) are bounded consistently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a2150ba9-3aee-4bb8-ba72-a21a594cb9d7

📥 Commits

Reviewing files that changed from the base of the PR and between f7dfb33 and 9e9af94.

📒 Files selected for processing (1)

python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

coderabbitai · 2026-05-18T21:05:28Z

+    index_name = "cuvs_test_index"
+    backend = OpenSearchBackend(


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the live index names unique per fixture instance.

Line 89 and Line 471 hard-code cluster-visible index names. Parallel integration runs or reruns against the same OpenSearch node can overwrite or delete each other's indices during teardown, which makes these tests flaky.

🛠️ Proposed fix

- index_name = "cuvs_test_index" + index_name = f"cuvs_test_index_{os.urandom(4).hex()}" @@ - index_name = "cuvs_test_remote_index" + index_name = f"cuvs_test_remote_index_{os.urandom(4).hex()}"

Also applies to: 471-472

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs_bench/cuvs_bench/tests/test_opensearch.py` around lines 89 - 90, The tests hard-code a cluster-visible index name via the index_name variable used when instantiating OpenSearchBackend, causing collisions across parallel/rerun fixtures; change the index_name creation in test_opensearch.py (where index_name is assigned and where OpenSearchBackend(...) is constructed, also at the duplicated location around lines 471-472) to append a unique per-fixture suffix (for example using uuid.uuid4().hex or pytest's request.node.nodeid/timestamp) so each fixture instance gets a distinct index name, and pass that unique name into the OpenSearchBackend constructor and any teardown code that deletes the index.

coderabbitai · 2026-05-18T21:05:28Z

+    cluster_settings_response = session.get(
+        f"{opensearch_url}/_cluster/settings"
+    )
+    cluster_settings_response.raise_for_status()
+    cluster_settings = cluster_settings_response.json()
+    initial_remote_build_settings = {
+        key: cluster_settings.get("persistent", {}).get(key)
+        for key in remote_build_setting_keys
+    }
+
+    repo_response = session.get(f"{opensearch_url}/_snapshot/{repo_name}")
+    if repo_response.status_code == 404:
+        initial_snapshot_repo = None
+    else:
+        repo_response.raise_for_status()
+        initial_snapshot_repo = repo_response.json().get(repo_name)
+
+    snapshot_settings = {
+        "bucket": s3_bucket,
+        "base_path": s3_prefix.rstrip("/"),
+        "region": s3_region,
+    }
+    if s3_endpoint is not None:
+        snapshot_settings["endpoint"] = s3_endpoint
+
+    session.put(
+        f"{opensearch_url}/_snapshot/{repo_name}",
+        json={
+            "type": "s3",
+            "settings": snapshot_settings,
+        },
+    ).raise_for_status()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "test_opensearch.py" | head -20

Repository: rapidsai/cuvs

Length of output: 113

🏁 Script executed:

wc -l ./python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

Repository: rapidsai/cuvs

Length of output: 117

🏁 Script executed:

sed -n '396,461p' ./python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

Repository: rapidsai/cuvs

Length of output: 2164

🏁 Script executed:

sed -n '350,400p' ./python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

Repository: rapidsai/cuvs

Length of output: 1742

🏁 Script executed:

sed -n '461,480p' ./python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

Repository: rapidsai/cuvs

Length of output: 743

Add explicit timeouts to all OpenSearch admin API calls.

All session.get(), session.put(), and session.delete() calls in this fixture lack timeout parameters, making them unbounded. If OpenSearch becomes unresponsive during repo registration or teardown, the entire integration session will hang indefinitely instead of failing fast. Note that health probe calls (lines 378–384) already use timeout=2, establishing a precedent for bounded requests.

Add a timeout constant (e.g., admin_timeout = 10) and apply it to all admin calls: the initial settings/repo queries (lines 397, 406), repo registration (lines 421–428), feature enablement (lines 430–438), and all cleanup operations in the finally block (lines 445–450, 452, 453–457).

🛠️ Proposed fix

session = requests.Session() session.headers.update({"Content-Type": "application/json"}) + admin_timeout = 10 repo_name = "vector-repo" @@ - cluster_settings_response = session.get( - f"{opensearch_url}/_cluster/settings" - ) + cluster_settings_response = session.get( + f"{opensearch_url}/_cluster/settings", timeout=admin_timeout + ) @@ - repo_response = session.get(f"{opensearch_url}/_snapshot/{repo_name}") + repo_response = session.get( + f"{opensearch_url}/_snapshot/{repo_name}", timeout=admin_timeout + ) @@ session.put( f"{opensearch_url}/_snapshot/{repo_name}", json={ "type": "s3", "settings": snapshot_settings, }, - ).raise_for_status() + timeout=admin_timeout, + ).raise_for_status() @@ session.put( f"{opensearch_url}/_cluster/settings", json={ "persistent": { "knn.remote_index_build.enabled": True, "knn.remote_index_build.repository": repo_name, "knn.remote_index_build.service.endpoint": builder_url, } }, - ).raise_for_status() + timeout=admin_timeout, + ).raise_for_status() @@ session.put( f"{opensearch_url}/_cluster/settings", json={ "persistent": initial_remote_build_settings, }, + timeout=admin_timeout, ) if initial_snapshot_repo is None: - session.delete(f"{opensearch_url}/_snapshot/{repo_name}") + session.delete( + f"{opensearch_url}/_snapshot/{repo_name}", + timeout=admin_timeout, + ) else: session.put( f"{opensearch_url}/_snapshot/{repo_name}", json=initial_snapshot_repo, + timeout=admin_timeout, )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuvs_bench/cuvs_bench/tests/test_opensearch.py` around lines 396 - 427, Define a short admin timeout (e.g., admin_timeout = 10) in the fixture and add timeout=admin_timeout to every OpenSearch admin HTTP call shown: the session.get to "{opensearch_url}/_cluster/settings", the session.get to "{opensearch_url}/_snapshot/{repo_name}", the session.put to "{opensearch_url}/_snapshot/{repo_name}", plus the other admin session.get/put/delete calls referenced in the finally/teardown block; ensure you use the same admin_timeout variable so all admin requests (those using session.get/put/delete in this fixture) are bounded consistently.

jnke2016 · 2026-05-18T21:15:39Z

+                )
+
+        # Dataset handles lazy vector loading from base_file when needed.
+        base_vectors = dataset.training_vectors


Just as intended: The backend just uses dataset.training_vectors without worrying about file formats, loading logic, or how the data got there. The framework handles all of that transparently.

jnke2016 · 2026-05-18T21:24:46Z

+                distances=np.zeros((0, k), dtype=np.float32),
+                search_time_ms=0.0,
+                queries_per_second=0.0,
+                recall=0.0,


Good, the backend returns recall=0.0 as a placeholder. The orchestrator then computes the actual recall from the returned neighbors and ground truth, so all backends are evaluated the same way without each one implementing its own recall logic

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau

@cjnolet @jnke2016 can you give this a final review when you get a moment?

jameslamb

Left some small suggestions for your consideration.

jameslamb · 2026-05-27T13:48:15Z


+[tool.pytest.ini_options]
+markers = [
+    "integration: tests that require a live OpenSearch node (run with '-m integration')",


I'd also say "integration" is a bit overly-general of a name if you intend to add more backends in the future. -m opensearch (or something similarly specific) would be a bit more future-proof.

jameslamb · 2026-05-27T13:50:16Z

 ]

+[project.optional-dependencies]
+opensearch = ["opensearch-py>=2.4.0"]


We generally prefer to have all dependencies managed via dependencies.yaml (the config file for https://github.com/rapidsai/dependency-file-generator), to improve consistency and for flexibility in generating dependency lists in different formats.

Could you please move this there?

Here's an example to follow:

https://github.com/rapidsai/cuvs/blob/4addf4e6c1da42743d5701e5ad10764333317732/dependencies.yaml#L185-L194

https://github.com/rapidsai/cuvs/blob/4addf4e6c1da42743d5701e5ad10764333317732/dependencies.yaml#L520-L535

If you get stuck and would prefer I just directly push to this branch, let me know.

Also is there a conda equivalent of this and if so do we want it as a dependency of the cuvs-bench package?

Thanks for the feedback @jameslamb. I just pushed a commit with the changes that I think you're asking for. But let me know if I missed something, or you're asking for something different -- it's my first time mucking around with the dependencies.yaml config file

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jameslamb

Changes look great, thanks!

cjnolet · 2026-05-27T21:40:30Z

/merge

Add OpenSearch backend to cuvs-bench

b30b9e1

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau requested review from a team as code owners April 10, 2026 18:30

jrbourbeau requested a review from bdice April 10, 2026 18:30

github-project-automation Bot added this to Unstructured Data Processing Apr 10, 2026

jrbourbeau commented Apr 10, 2026

View reviewed changes

jrbourbeau added 2 commits April 10, 2026 14:13

Lint

a58363f

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Merge branch 'main' of https://github.com/rapidsai/cuvs into cuvs-ben…

734a83c

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau mentioned this pull request Apr 10, 2026

[DNM] Add OpenSearch deployment #2013

Draft

jrbourbeau added 2 commits April 13, 2026 11:43

Updates

ed7c1d0

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Merge branch 'main' of https://github.com/rapidsai/cuvs into cuvs-ben…

7131d39

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau changed the title ~~[WIP] Add OpenSearch backend to cuvs-bench~~ [REVIEW] Add OpenSearch backend to cuvs-bench Apr 13, 2026

jnke2016 self-requested a review April 20, 2026 18:43

jnke2016 approved these changes Apr 20, 2026

View reviewed changes

cjnolet reviewed Apr 20, 2026

View reviewed changes

cjnolet assigned jrbourbeau Apr 20, 2026

cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Apr 20, 2026

cjnolet moved this to In Progress in Unstructured Data Processing Apr 20, 2026

jrbourbeau added 3 commits April 20, 2026 14:40

Updates

964a0df

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Merge branch 'main' of https://github.com/rapidsai/cuvs into cuvs-ben…

288d308

…ch-opensearch

Lint

887b2a3

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jnke2016 reviewed Apr 27, 2026

View reviewed changes

Comment thread python/cuvs_bench/cuvs_bench/backends/opensearch.py Outdated

jnke2016 reviewed Apr 27, 2026

View reviewed changes

Update OpenSearch benchmark polling and tuning

ed490cb

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau added 3 commits May 14, 2026 10:41

Merge remote-tracking branch 'upstream/main' into cuvs-bench-opensearch

d0059b9

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com> # Conflicts: # python/cuvs_bench/cuvs_bench/orchestrator/orchestrator.py

Merge remote-tracking branch 'upstream/main' into cuvs-bench-opensearch

a0ebded

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Use new centralized utilities

3898c19

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau commented May 15, 2026

View reviewed changes

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

Merge branch 'main' into cuvs-bench-opensearch2

a92c2cf

cjnolet mentioned this pull request May 18, 2026

[FEA] Set up infra for scheduling e2e vector database benchmarks / integration tests #2105

Open

Updates

f7dfb33

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread python/cuvs_bench/cuvs_bench/tests/test_opensearch.py

S3 endpoint

9e9af94

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

jnke2016 reviewed May 18, 2026

View reviewed changes

jrbourbeau added 2 commits May 20, 2026 16:30

Updates

219058d

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Merge branch 'main' of https://github.com/rapidsai/cuvs into cuvs-ben…

f653069

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jrbourbeau commented May 21, 2026

View reviewed changes

cjnolet approved these changes May 27, 2026

View reviewed changes

jameslamb requested changes May 27, 2026

View reviewed changes

jrbourbeau added 2 commits May 27, 2026 12:59

Packaging updates

4dea0d5

Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

Merge branch 'main' of https://github.com/rapidsai/cuvs into cuvs-ben…

0974219

…ch-opensearch Signed-off-by: James Bourbeau <jbourbeau@nvidia.com>

jameslamb approved these changes May 27, 2026

View reviewed changes

rapids-bot Bot merged commit caf24ce into NVIDIA:main May 27, 2026
86 checks passed

github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing May 27, 2026

jrbourbeau deleted the cuvs-bench-opensearch2 branch May 28, 2026 14:26

coderabbitai Bot mentioned this pull request Jun 12, 2026

Add cuvs-bench-elastic: HTTP backend for Elasticsearch GPU vector search #1907

Open

josephine-wolf-oberholtzer added this to Unstructured Data Processing Jul 1, 2026

josephine-wolf-oberholtzer moved this to Done in Unstructured Data Processing Jul 1, 2026

		return data.reshape(n_rows, n_cols)


		def _load_yaml(path: str) -> Any:



		def _gather_algo_configs(
		config_path: str, algorithm_configuration: Optional[str]


		benchmark_configs: List[BenchmarkConfig] = []

		for algo_file in algo_files:

		"m": {"type": "int", "min": 4, "max": 64},
		"ef_construction": {"type": "int", "min": 32, "max": 1024},

Uh oh!

Conversation

jrbourbeau commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnke2016 left a comment

Choose a reason for hiding this comment

Uh oh!

cjnolet Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnke2016 Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cjnolet Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnke2016 Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cjnolet Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cjnolet commented Apr 20, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnke2016 Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnke2016 May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jrbourbeau left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

jrbourbeau commented Apr 10, 2026 •

edited

Loading

cjnolet Apr 20, 2026 •

edited

Loading

jnke2016 Apr 20, 2026 •

edited

Loading

cjnolet Apr 20, 2026 •

edited

Loading

jnke2016 Apr 25, 2026 •

edited

Loading

cjnolet Apr 20, 2026 •

edited

Loading

jnke2016 Apr 27, 2026 •

edited

Loading

jnke2016 May 4, 2026 •

edited

Loading

coderabbitai Bot commented May 15, 2026 •

edited

Loading