Changes from 4 commits
Commits
Show all changes
17 commits
336432d
Add cuvs-bench-elastic: HTTP backend for Elasticsearch GPU vector search
afourniernv Mar 10, 2026
80c104f
Fix bulk indexing format for Elasticsearch
afourniernv Mar 20, 2026
7b25521
Add run_build, run_search, run_benchmark API for elastic backend
afourniernv Mar 20, 2026
c224a8c
Update cuvs-bench-elastic README and config loader
afourniernv Mar 20, 2026
e499469
Add cuvs-bench-elastic: Elasticsearch GPU backend plugin
afourniernv Apr 12, 2026
087289d
Revert unrelated change to get_dataset/__main__.py
afourniernv Apr 12, 2026
e8f8a64
Fold Elasticsearch plugin back into cuvs_bench; wire via entry points
afourniernv Apr 13, 2026
b787396
Add run_build/run_search/run_benchmark convenience API to elasticsear…
afourniernv Apr 13, 2026
94a9199
Merge branch 'main' into pr1907-reconcile
afourniernv May 22, 2026
4e35484
Fix elasticsearch backend after cuvs-bench API changes
afourniernv May 20, 2026
e00c288
Merge remote-tracking branch 'origin/main' into pr1907-reconcile
afourniernv May 22, 2026
1264a98
Merge remote-tracking branch 'origin/main' into pr1907-reconcile
afourniernv Jun 5, 2026
cd93845
Improve elasticsearch backend validation
afourniernv Jun 5, 2026
4b0b1f0
Merge remote-tracking branch 'origin/main' into pr1907-reconcile
afourniernv Jun 10, 2026
ad40caa
Tighten optional backend handling
afourniernv Jun 11, 2026
1d2a1a9
Merge remote-tracking branch 'origin/main' into pr1907-reconcile
afourniernv Jun 11, 2026
680c2ab
Merge remote-tracking branch 'origin/main' into pr1907-reconcile
afourniernv Jun 12, 2026
13 changes: 13 additions & 0 deletions dependencies.yaml
@@ -213,6 +213,14 @@ files:
      table: project
    includes:
      - bench_python
  py_elastic_cuvs_bench:
    output: pyproject
    pyproject_dir: python/cuvs_bench
    extras:
      table: project.optional-dependencies
      key: elastic
    includes:
      - bench_elastic
channels:
  - rapidsai-nightly
  - rapidsai
@@ -500,6 +508,11 @@ dependencies:
      - output_types: [requirements, pyproject]
        packages:
          - matplotlib>=3.9
  bench_elastic:
    common:
      - output_types: [conda, pyproject, requirements]
        packages:
          - cuvs-bench-elastic>=26.4.0
Comment on lines +596 to +600


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== pyproject elastic extra ==="
sed -n '41,56p' python/cuvs_bench/pyproject.toml

echo
echo "=== dependencies.yaml elastic source-of-truth ==="
rg -n "py_elastic_cuvs_bench|bench_elastic|cuvs-bench-elastic|elasticsearch>=8.0" dependencies.yaml -A4 -B3

Repository: rapidsai/cuvs

Length of output: 1500


Fix elastic extra source-of-truth drift between dependencies.yaml and generated pyproject.toml

dependencies.yaml maps project.optional-dependencies.elastic (via py_elastic_cuvs_bench → bench_elastic) to cuvs-bench-elastic>=26.4.0, but python/cuvs_bench/pyproject.toml currently declares elastic = ["elasticsearch>=8.0"]. Regenerating from dependencies.yaml will overwrite the committed pyproject extra and create install-hint/metadata drift.

Suggested fix
   bench_elastic:
     common:
       - output_types: [conda, pyproject, requirements]
         packages:
-          - cuvs-bench-elastic>=26.4.0
+          - elasticsearch>=8.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dependencies.yaml` around lines 596 - 600, The pyproject extra for "elastic"
is drifting because dependencies.yaml defines bench_elastic ->
cuvs-bench-elastic>=26.4.0 while python/cuvs_bench/pyproject.toml currently
lists elastic = ["elasticsearch>=8.0"]; update the source-of-truth to keep them
in sync by changing dependencies.yaml's bench_elastic entry to include the same
package(s) as the pyproject extra (or update python/cuvs_bench/pyproject.toml to
match dependencies.yaml) so that project.optional-dependencies.elastic (via
py_elastic_cuvs_bench / bench_elastic) consistently maps to the intended package
list and version constraint.

Source: Coding guidelines
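
The drift described in this comment could be detected mechanically. The following is a minimal sketch and not part of the PR: it assumes both files have already been parsed into dictionaries (for example with a YAML and a TOML parser), it only handles the `common` section shown in this diff, and the helper names `resolve_packages` and `extra_drift` are hypothetical.

```python
from typing import Dict, List, Set


def resolve_packages(deps: Dict, file_key: str) -> Set[str]:
    """Collect packages reachable from files[file_key].includes (common section only)."""
    packages: Set[str] = set()
    for group in deps["files"][file_key]["includes"]:
        for entry in deps["dependencies"][group].get("common", []):
            packages.update(entry.get("packages", []))
    return packages


def extra_drift(deps: Dict, pyproject: Dict, file_key: str) -> Dict[str, List[str]]:
    """Compare the extra named by files[file_key].extras.key in both sources."""
    extra = deps["files"][file_key]["extras"]["key"]
    expected = resolve_packages(deps, file_key)
    declared = set(pyproject["project"]["optional-dependencies"].get(extra, []))
    return {
        "only_in_dependencies_yaml": sorted(expected - declared),
        "only_in_pyproject": sorted(declared - expected),
    }


# Values mirror the ones quoted in the review comment.
deps = {
    "files": {
        "py_elastic_cuvs_bench": {
            "extras": {"table": "project.optional-dependencies", "key": "elastic"},
            "includes": ["bench_elastic"],
        }
    },
    "dependencies": {
        "bench_elastic": {
            "common": [
                {
                    "output_types": ["conda", "pyproject", "requirements"],
                    "packages": ["cuvs-bench-elastic>=26.4.0"],
                }
            ]
        }
    },
}
pyproject = {"project": {"optional-dependencies": {"elastic": ["elasticsearch>=8.0"]}}}

drift = extra_drift(deps, pyproject, "py_elastic_cuvs_bench")
print(drift)
```

An empty list in both fields would mean the two sources agree. With the values quoted in the comment, each side reports one package the other lacks.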

depends_on_cuda_python:
specific:
- output_types: [conda, requirements, pyproject]
69 changes: 66 additions & 3 deletions python/cuvs_bench/cuvs_bench/backends/registry.py
@@ -12,10 +12,15 @@
from typing import Dict, Type, Optional
from pathlib import Path
import importlib
import importlib.metadata
import yaml

from .base import BenchmarkBackend

# Entry point group names for plugin discovery
_BACKENDS_GROUP = "cuvs_bench.backends"
_CONFIG_LOADERS_GROUP = "cuvs_bench.config_loaders"


class BackendRegistry:
    """
@@ -375,10 +380,43 @@ def get_backend(name: str, config: Dict) -> BenchmarkBackend:
    return registry.get_backend(name, config)


def _try_load_plugin(name: str) -> None:
    """
    Try to load backend and config loader from entry points for the given name.

    Plugins register themselves when their entry point is loaded.
    Raises ImportError with install instructions if the plugin requires
    an optional dependency that is not installed.
    """
    for group in (_BACKENDS_GROUP, _CONFIG_LOADERS_GROUP):
        try:
            eps = importlib.metadata.entry_points(group=group)
        except TypeError:
            eps = importlib.metadata.entry_points().get(group, [])
        if hasattr(eps, "select"):  # Python 3.10+
            eps = list(eps.select(name=name))
        else:
            eps = [e for e in eps if e.name == name]
        for ep in eps:
            try:
                ep.load()()
            except ImportError as e:
                if "elasticsearch" in str(e).lower() or "elasticsearch" in str(e):
                    raise ImportError(
                        f"Elasticsearch backend requires the 'elastic' extra. "
                        f"Install with: pip install cuvs-bench[elastic]"
                    ) from e
                raise
            return  # Plugin loaded successfully
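
The version fallback inside `_try_load_plugin` can be isolated into a small helper. This is a sketch under assumptions, not the PR's code: the name `find_entry_points` and the group string in the demo are hypothetical. It relies on `importlib.metadata.entry_points` accepting a `group` keyword on Python 3.10+ and returning a plain dict on 3.8/3.9.

```python
import importlib.metadata
from typing import List


def find_entry_points(group: str, name: str) -> List[importlib.metadata.EntryPoint]:
    """Return entry points in `group` whose name equals `name`."""
    try:
        # Python 3.10+ accepts selection keywords such as group=...
        eps = importlib.metadata.entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and returns a dict.
        eps = importlib.metadata.entry_points().get(group, [])
    return [ep for ep in eps if ep.name == name]


# A group that no installed package registers yields an empty list, not an error.
missing = find_entry_points("example.unregistered_group", "elastic")
print(missing)
```

Filtering by name with a list comprehension works on both return types, so a separate `select` branch is not strictly required.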
Comment on lines +415 to +440


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

HIGH: _try_load_plugin() can short-circuit config-loader discovery.

At Line 440, the function returns after the first successful load in _BACKENDS_GROUP, so get_config_loader() (Line 533) may never check _CONFIG_LOADERS_GROUP for same-named plugins that split registrations.

Suggested fix
-def _try_load_plugin(name: str) -> None:
+def _try_load_plugin(
+    name: str,
+    groups: tuple[str, ...] = (_BACKENDS_GROUP, _CONFIG_LOADERS_GROUP),
+) -> None:
@@
-    for group in (_BACKENDS_GROUP, _CONFIG_LOADERS_GROUP):
+    for group in groups:
@@
-            return  # Plugin loaded successfully
+            return  # Plugin loaded successfully
@@
 def get_config_loader(name: str) -> Type:
@@
     if name not in _CONFIG_LOADER_REGISTRY:
-        _try_load_plugin(name)
+        _try_load_plugin(
+            name,
+            groups=(_CONFIG_LOADERS_GROUP, _BACKENDS_GROUP),
+        )

As per coding guidelines, integration errors are HIGH-priority review targets.

Also applies to: 532-534

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs_bench/cuvs_bench/backends/registry.py` around lines 415 - 440,
The function _try_load_plugin currently returns after loading the first matching
entry point, which causes it to short-circuit and skip checking the other
registry group (_BACKENDS_GROUP vs _CONFIG_LOADERS_GROUP) for the same plugin
name; change the control flow so that after a successful ep.load()() you do not
return from _try_load_plugin but instead continue scanning remaining
groups/entry-points (i.e., remove/relocate the unconditional return and only
exit once all groups/entry-points have been iterated or explicitly determine
both backend and config-loader are loaded), ensuring get_config_loader() can
find same-named plugins that split registrations across groups.

Source: Coding guidelines
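
One way to read the control-flow change requested here is a loop with no early return. The sketch below is illustrative only: `FakeEntryPoint`, `load_plugin_from_all_groups`, and the injected `discover` callable are hypothetical stand-ins, so the example runs without any installed plugin.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


class FakeEntryPoint:
    """Stand-in for importlib.metadata.EntryPoint (hypothetical, for illustration)."""

    def __init__(self, name: str, register: Callable[[], None]) -> None:
        self.name = name
        self._register = register

    def load(self) -> Callable[[], None]:
        return self._register


def load_plugin_from_all_groups(
    name: str,
    groups: Sequence[str],
    discover: Callable[[str], Iterable[FakeEntryPoint]],
) -> int:
    """Load every entry point called `name` in every group; return how many ran."""
    loaded = 0
    for group in groups:
        for ep in discover(group):
            if ep.name == name:
                ep.load()()  # the loaded object is a register() callable
                loaded += 1
    return loaded  # no early return, so later groups are always scanned


registered: List[Tuple[str, str]] = []
fake_entry_points = {
    "cuvs_bench.backends": [
        FakeEntryPoint("elastic", lambda: registered.append(("backend", "elastic")))
    ],
    "cuvs_bench.config_loaders": [
        FakeEntryPoint("elastic", lambda: registered.append(("config_loader", "elastic")))
    ],
}

count = load_plugin_from_all_groups(
    "elastic",
    ("cuvs_bench.backends", "cuvs_bench.config_loaders"),
    lambda group: fake_entry_points.get(group, []),
)
print(count, registered)
```

Returning a count instead of `None` lets a caller distinguish "nothing found" from "loaded", which may or may not suit the real registry.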



def get_backend_class(name: str) -> Type[BenchmarkBackend]:
    """
    Get the backend class (not instance) from the global registry.

    If the backend is not registered, attempts to load it from entry points
    (e.g., optional plugins like elastic).

    Parameters
    ----------
    name : str
@@ -390,10 +428,15 @@ def get_backend_class(name: str) -> Type[BenchmarkBackend]:
        Backend class
    """
    registry = get_registry()
    if name not in registry._backends:
        _try_load_plugin(name)
    if name not in registry._backends:
        available = ", ".join(registry._backends.keys())
        hint = ""
        if name == "elastic":
            hint = " Install with: pip install cuvs-bench[elastic]"
        raise ValueError(
-            f"Backend '{name}' not found. Available backends: {available or '(none)'}"
+            f"Backend '{name}' not found. Available backends: {available or '(none)'}.{hint}"
        )
    return registry._backends[name]
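
The lookup pattern in `get_backend_class` (check, try to load a plugin once, check again, then fail with an install hint) can be sketched independently of the real registry. Everything named here (`INSTALL_HINTS`, `lookup_with_lazy_load`, `DummyBackend`, `fake_loader`) is hypothetical and exists only for this example.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping from backend name to the install command in its hint.
INSTALL_HINTS: Dict[str, str] = {
    "elastic": "pip install cuvs-bench[elastic]",
}


def lookup_with_lazy_load(
    name: str,
    registry: Dict[str, type],
    try_load: Callable[[str], None],
) -> type:
    """Return registry[name], giving plugins one chance to register first."""
    if name not in registry:
        try_load(name)  # may populate `registry` as a side effect
    if name not in registry:
        available = ", ".join(registry) or "(none)"
        hint: Optional[str] = INSTALL_HINTS.get(name)
        suffix = f" Install with: {hint}" if hint else ""
        raise ValueError(
            f"Backend '{name}' not found. Available backends: {available}.{suffix}"
        )
    return registry[name]


class DummyBackend:
    """Placeholder backend class used only in this example."""


registry: Dict[str, type] = {}


def fake_loader(name: str) -> None:
    # Pretend only the "dummy" plugin is installed.
    if name == "dummy":
        registry["dummy"] = DummyBackend


found = lookup_with_lazy_load("dummy", registry, fake_loader)
try:
    lookup_with_lazy_load("elastic", registry, fake_loader)
except ValueError as exc:
    message = str(exc)
print(found.__name__)
print(message)
```

Keeping hints in a mapping instead of an `if name == "elastic"` branch is a design choice for the sketch; it avoids touching the lookup code when another optional backend is added.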

@@ -440,6 +483,9 @@ def get_config_loader(name: str) -> Type:
    """
    Get a registered config loader class by name.

    If the config loader is not registered, attempts to load it from entry points
    (e.g., optional plugins like elastic).

    Parameters
    ----------
    name : str
@@ -455,15 +501,32 @@ def get_config_loader(name: str) -> Type:
    ValueError
        If config loader is not registered
    """
    # _CONFIG_LOADER_REGISTRY is a dictionary that maps backend names to config loader classes
    if name not in _CONFIG_LOADER_REGISTRY:
        _try_load_plugin(name)
    if name not in _CONFIG_LOADER_REGISTRY:
        available = ", ".join(_CONFIG_LOADER_REGISTRY.keys()) or "none"
        hint = ""
        if name == "elastic":
            hint = " Install with: pip install cuvs-bench[elastic]"
        raise ValueError(
-            f"Unknown config loader for backend: '{name}'. Available: {available}"
+            f"Unknown config loader for backend: '{name}'. Available: {available}.{hint}"
        )
    return _CONFIG_LOADER_REGISTRY[name]


def list_config_loaders() -> Dict[str, Type]:
    """Return all registered config loaders."""
    return dict(_CONFIG_LOADER_REGISTRY)


def unregister_config_loader(name: str) -> None:
    """
    Unregister a config loader by name (primarily for testing).

    Parameters
    ----------
    name : str
        Backend name to unregister
    """
    if name in _CONFIG_LOADER_REGISTRY:
        del _CONFIG_LOADER_REGISTRY[name]
40 changes: 40 additions & 0 deletions python/cuvs_bench/cuvs_bench/backends/search_spaces.py
@@ -158,6 +158,46 @@
            "ef": {"type": "int", "min": 10, "max": 1000},
        },
    },
    # =========================================================================
    # Elasticsearch GPU HNSW (hnsw, int8_hnsw, int4_hnsw, bbq_hnsw)
    # Per ES-GPU-API-REFERENCE.md: index_options (m, ef_construction), knn (num_candidates)
    # =========================================================================
    "elastic_hnsw": {
        "build": {
            "m": {"type": "int", "min": 8, "max": 64},
            "ef_construction": {"type": "int", "min": 50, "max": 500},
        },
        "search": {
            "num_candidates": {"type": "int", "min": 50, "max": 500},
        },
    },
    "elastic_int8_hnsw": {
        "build": {
            "m": {"type": "int", "min": 8, "max": 64},
            "ef_construction": {"type": "int", "min": 50, "max": 500},
        },
        "search": {
            "num_candidates": {"type": "int", "min": 50, "max": 500},
        },
    },
    "elastic_int4_hnsw": {
        "build": {
            "m": {"type": "int", "min": 8, "max": 64},
            "ef_construction": {"type": "int", "min": 50, "max": 500},
        },
        "search": {
            "num_candidates": {"type": "int", "min": 50, "max": 500},
        },
    },
    "elastic_bbq_hnsw": {
        "build": {
            "m": {"type": "int", "min": 8, "max": 64},
            "ef_construction": {"type": "int", "min": 50, "max": 500},
        },
        "search": {
            "num_candidates": {"type": "int", "min": 50, "max": 500},
        },
    },
}
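
To show how a search-space entry of this shape might be consumed, here is a small sketch that draws one configuration from the `elastic_hnsw` ranges. It is not the tuner used by cuvs_bench: the helpers `sample_phase` and `in_bounds` are hypothetical, and treating `min` and `max` as inclusive bounds is an assumption.

```python
import random
from typing import Any, Dict

# Ranges copied from the "elastic_hnsw" entry in this diff.
ELASTIC_HNSW_SPACE: Dict[str, Dict[str, Dict[str, Any]]] = {
    "build": {
        "m": {"type": "int", "min": 8, "max": 64},
        "ef_construction": {"type": "int", "min": 50, "max": 500},
    },
    "search": {
        "num_candidates": {"type": "int", "min": 50, "max": 500},
    },
}


def sample_phase(space: Dict[str, Dict[str, Any]], rng: random.Random) -> Dict[str, int]:
    """Draw one value per parameter, treating min and max as inclusive (assumed)."""
    sampled = {}
    for param, spec in space.items():
        if spec["type"] != "int":
            raise ValueError(f"Unsupported parameter type: {spec['type']}")
        sampled[param] = rng.randint(spec["min"], spec["max"])
    return sampled


def in_bounds(space: Dict[str, Dict[str, Any]], params: Dict[str, int]) -> bool:
    """Check that every parameter falls inside its declared range."""
    return all(space[k]["min"] <= v <= space[k]["max"] for k, v in params.items())


rng = random.Random(0)  # fixed seed so repeated runs draw the same values
build_params = sample_phase(ELASTIC_HNSW_SPACE["build"], rng)
search_params = sample_phase(ELASTIC_HNSW_SPACE["search"], rng)
print(build_params, search_params)
```

Since the four `elastic_*` entries carry identical ranges, they could also be generated from one template; the diff spells them out, which keeps each index type independently adjustable.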


23 changes: 23 additions & 0 deletions python/cuvs_bench/cuvs_bench/tests/conftest.py
@@ -0,0 +1,23 @@
#
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0
#
"""Pytest configuration for cuvs_bench tests."""


def pytest_configure(config):
    """Register elastic plugin when elasticsearch is available.

    Ensures elastic tests run when elasticsearch is installed, even if
    cuvs-bench-elastic was not installed via pip (e.g. using PYTHONPATH).
    """
    try:
        import elasticsearch  # noqa: F401
    except ImportError:
        return

    try:
        from cuvs_bench_elastic import register
        register()
    except ImportError:
        pass
Comment on lines +19 to +23


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== conftest registration block ==="
sed -n '8,24p' python/cuvs_bench/cuvs_bench/tests/conftest.py

echo
echo "=== in-tree elastic register symbol ==="
rg -n "def register\\(" python/cuvs_bench/cuvs_bench/backends/elasticsearch.py -C2 || true

echo
echo "=== legacy cuvs_bench_elastic module paths in repo ==="
fd -HI "cuvs_bench_elastic" python || true

Repository: rapidsai/cuvs

Length of output: 809


conftest elastic plugin registration ignores the in-tree backend and silently drops failures

pytest_configure in python/cuvs_bench/cuvs_bench/tests/conftest.py only tries from cuvs_bench_elastic import register and swallows ImportError. The in-tree entry point python/cuvs_bench/cuvs_bench/backends/elasticsearch.py defines register(), and the legacy cuvs_bench_elastic module does not exist in this repo, so source-tree runs may not register the elastic backend as the docstring claims.

Suggested fix
-    try:
-        from cuvs_bench_elastic import register
-        register()
-    except ImportError:
-        pass
+    try:
+        from cuvs_bench.backends.elasticsearch import register
+    except ImportError:
+        try:
+            from cuvs_bench_elastic import register  # legacy fallback
+        except ImportError:
+            return
+    register()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuvs_bench/cuvs_bench/tests/conftest.py` around lines 19 - 23,
pytest_configure currently only attempts "from cuvs_bench_elastic import
register" and swallows ImportError, which prevents the in-tree backend's
register() from being used; update pytest_configure in conftest.py to try the
external import first, and if that raises ImportError, import the in-tree
backend's register (e.g. from cuvs_bench.backends.elasticsearch import register)
as a fallback, and ensure any remaining ImportError is not silently suppressed
(log or re-raise) so registration failures are visible; reference the
pytest_configure function and the register() symbol when making the change.

Source: Coding guidelines
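
The ordered-fallback import discussed in this comment can be written generically. The sketch below is an illustration, not the fix adopted in the PR: `resolve_callable` is a hypothetical helper, and standard-library modules stand in for the plugin modules so the example runs anywhere.

```python
import importlib
from typing import Callable, Optional, Sequence


def resolve_callable(module_names: Sequence[str], attr: str) -> Optional[Callable]:
    """Return `attr` from the first importable module that defines it, else None."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # try the next candidate
        func = getattr(module, attr, None)
        if callable(func):
            return func
    return None


# Standard-library modules stand in for the plugin modules here. For the
# conftest under review, the candidates would presumably be
# "cuvs_bench.backends.elasticsearch" and "cuvs_bench_elastic", with attr "register".
dumps = resolve_callable(["module_that_does_not_exist_xyz", "json"], "dumps")
nothing = resolve_callable(["module_that_does_not_exist_xyz"], "register")
print(dumps([1, 2]), nothing)
```

Returning `None` leaves the decision to the caller, who can skip quietly, log a warning, or raise, depending on whether a missing registration should be visible.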

34 changes: 34 additions & 0 deletions python/cuvs_bench/cuvs_bench/tests/integration/README.md
@@ -0,0 +1,34 @@
# Integration Tests

Integration tests run against real services (e.g. Elasticsearch in Docker).

**Note:** The Elasticsearch integration test is currently disabled. It targets the
Elasticsearch GPU backend (cuVS-accelerated), which requires an ES GPU image,
cuVS libs from this repo, and a GPU-enabled runner. See the module docstring of
`test_elastic_integration.py` for details. The test can be re-enabled once CI
provides these dependencies.

## Requirements

- **Docker** running locally
- Optional dependencies:

```bash
pip install cuvs-bench[elastic,integration]
```

Or install separately:

```bash
pip install cuvs-bench[elastic]
pip install testcontainers[elasticsearch]
```

## Running

```bash
# From repo root
PYTHONPATH="python/cuvs_bench:python/cuvs_bench_elastic:$PYTHONPATH" \
pytest python/cuvs_bench/cuvs_bench/tests/integration/ -v
```

Integration tests are skipped automatically if `testcontainers` or `elasticsearch` is not installed.
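
The skip condition stated above can be checked before running pytest. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and it accepts top-level module names only.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately bogus.
print(missing_modules(["json", "module_that_does_not_exist_xyz"]))
```

Calling `missing_modules(["testcontainers", "elasticsearch"])` in the target environment would list whichever of the two optional packages still needs installing.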
9 changes: 9 additions & 0 deletions python/cuvs_bench/cuvs_bench/tests/integration/__init__.py
@@ -0,0 +1,9 @@
#
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0
#
"""Integration tests for cuvs-bench backends.

These tests require Docker and optional dependencies:
pip install cuvs-bench[elastic,integration]
"""
73 changes: 73 additions & 0 deletions python/cuvs_bench/cuvs_bench/tests/integration/conftest.py
@@ -0,0 +1,73 @@
#
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0
#
"""Pytest fixtures for integration tests.

Requires: pip install cuvs-bench[elastic,integration]
Requires: Docker running locally
"""

import pytest


def _testcontainers_available():
    try:
        from testcontainers.elasticsearch import ElasticSearchContainer
        return True
    except ImportError:
        return False


def _elasticsearch_installed():
    try:
        import elasticsearch  # noqa: F401
        return True
    except ImportError:
        return False


@pytest.fixture(scope="module")
def elasticsearch_container():
    """Start an Elasticsearch container for the duration of the test module.

    Yields a dict with host, port, and get_url() for connecting.
    Skips the test module if testcontainers or elasticsearch is not installed.
    """
    if not _testcontainers_available():
        pytest.skip(
            "Requires testcontainers[elasticsearch]. "
            "Install with: pip install cuvs-bench[integration]"
        )
    if not _elasticsearch_installed():
        pytest.skip(
            "Requires elasticsearch. Install with: pip install cuvs-bench[elastic]"
        )

    from testcontainers.elasticsearch import ElasticSearchContainer

    # Use standard ES OSS image (no GPU in OSS; use use_gpu=False in tests)
    with ElasticSearchContainer(
        image="elasticsearch:8.15.0",
        mem_limit="2g",
    ) as container:
        url = container.get_url()
        # Parse host:port from URL (e.g. http://localhost:32768)
        if url.startswith("http://"):
            rest = url[7:]
        elif url.startswith("https://"):
            rest = url[8:]
        else:
            rest = url
        if "/" in rest:
            host_port = rest.split("/")[0]
        else:
            host_port = rest
        if ":" in host_port:
            host, port_str = host_port.rsplit(":", 1)
            port = int(port_str)
        else:
            host = host_port
            port = 9200

        yield {"host": host, "port": port, "url": url}
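
The manual host and port parsing in this fixture could also be expressed with `urllib.parse.urlsplit`. The sketch below is an alternative under assumptions, not the PR's code: `split_host_port` and `DEFAULT_ES_PORT` are hypothetical names, and the 9200 fallback is taken from the fixture.

```python
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_ES_PORT = 9200  # same fallback value the fixture uses


def split_host_port(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a URL, with or without a scheme."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit treat "host:port" as a network location
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port if parts.port is not None else DEFAULT_ES_PORT
    return host, port


print(split_host_port("http://localhost:32768"))
print(split_host_port("https://es.example.test/path"))
print(split_host_port("localhost:9201"))
```

Compared with string slicing, `urlsplit` also copes with credentials and bracketed IPv6 hosts in the URL, at the cost of lowercasing the host name.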