[DRAFT] Converting rst to md #2056

Closed
cjnolet wants to merge 2 commits into NVIDIA:main from cjnolet:docs-2606_rst_to_md

Conversation

cjnolet (Contributor) commented May 5, 2026

Initial attempt at converting the rst files in the codebase to Markdown. We need this to migrate to docs.nvidia.com, which requires moving to Fern, and Fern ultimately requires Markdown.

cjnolet self-assigned this May 5, 2026
cjnolet added the doc (Improvements or additions to documentation) label May 5, 2026
cjnolet requested a review from a team as a code owner May 5, 2026 16:42
cjnolet added the non-breaking (Introduces a non-breaking change) label May 5, 2026
cjnolet requested review from a team as code owners May 5, 2026 16:42
cjnolet requested a review from KyleFromNVIDIA May 5, 2026 16:42
coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Documentation
    • Migrated most docs to Markdown/MyST.
    • Added many user guides: Getting Started, Build, Advanced Topics (JIT), API Basics, Interoperability, and Working with ANN Indexes (C/C++/Python/Rust).
    • Expanded cuVS Bench docs: index, parameter tuning, pluggable backends, and benchmarks.
    • Reworked API reference and algorithm guides across languages (C/C++/Python/Rust), plus neighbor/index algorithm pages and filtering guidance.

Walkthrough

Migrates the Sphinx docs from reStructuredText/recommonmark to MyST/Markdown: adds myst-parser, converts 100+ RST pages to .md, updates conf.py and the conda/dependency manifests, creates new Markdown API/guide pages, removes the old .rst files, and adjusts a release script so that it updates the Docker tag in the new cuvs_bench index.md.

Changes

Docs migration + tooling + CI

| Layer | File(s) | Summary |
|---|---|---|
| Dependency / Build config | `conda/environments/*cuda-*_*.yaml`, `dependencies.yaml`, `docs/source/conf.py` | Add myst-parser; remove recommonmark and sphinx-markdown-tables; set Sphinx to treat .md files as Markdown and enable MyST features. |
| Documentation content (new Markdown) | `docs/source/*.md`, `docs/source/*/*.md`, `docs/source/c_api/*.md`, `docs/source/cpp_api/*.md`, `docs/source/python_api/*.md`, `docs/source/neighbors/*.md`, `docs/source/cuvs_bench/*.md`, `docs/source/working_with_ann_indexes_*.md`, `docs/source/rust_api/index.md` | Add/replace many documentation pages in MyST/Markdown form (API landing pages, language-specific API pages, guides, algorithm deep-dives, cuvs_bench guides, JIT/granular topics) with updated headings, fenced code blocks, MyST directives ({toctree}, {doxygengroup}, {autoclass}, {autofunction}, {image}, {note}, {warning}), and inline math via dollar notation. |
| TOC / wiring updates | `docs/source/api_docs.md`, `docs/source/index.md`, `docs/source/*/index.md`, `docs/source/*/*.md` | Rewire toctrees and cross-links to reference .md pages; reorganize API/index landing pages to point at new Markdown files. |
| CI / release script | `ci/release/update-version.sh` | Update sed target to replace Docker image tag in docs/source/cuvs_bench/index.md (was .rst). |
| Cleanup: remove old RST | `docs/source/*.rst`, `docs/source/*/*.rst` (removed files) | Delete legacy reStructuredText files corresponding to pages migrated to Markdown to avoid duplication. |
| Changelog / minor docs edits | `CHANGELOG.md`, `docs/source/developer_guide.md` | Small formatting fixes in changelog and developer guide to match Markdown style. |
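
As a rough illustration of the "Dependency / Build config" row, the conf.py side of such a migration might look like the sketch below. This is not taken from the PR: the variable names are standard Sphinx/MyST settings, but the exact extension list and the choice of MyST syntax extensions are assumptions.

```python
# Hypothetical excerpt of docs/source/conf.py; the PR's real settings may differ.

# myst_parser takes over Markdown parsing from recommonmark and
# sphinx-markdown-tables, which are dropped from the list.
extensions = [
    "myst_parser",
]

# Map file suffixes to parsers so that .md sources are read as Markdown.
# The .rst entry only matters while some pages are still unconverted.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# Optional MyST syntax extensions (an assumed selection):
#   dollarmath  -> math written with dollar signs, e.g. $k$
#   colon_fence -> ::: fences as an alternative way to write directives
myst_enable_extensions = [
    "dollarmath",
    "colon_fence",
]
```

Here `dollarmath` is the MyST option that corresponds to the "inline math via dollar notation" mentioned in the table.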

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • divyegala

coderabbitai Bot left a comment

Actionable comments posted: 19

Note

Due to the large number of review comments, comments of Critical and Major severity were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/source/cuvs_bench/build.md (1)

57-57: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in datatype name.

"unit8_t" should be "uint8_t".

📝 Proposed fix
-By default, the `*_ANN_BENCH` executables program infer the dataset's datatype from the filename's extension. For example, an extension of `fbin` uses a `float` datatype, `f16bin` uses a `float16` datatype, extension of `i8bin` uses `int8_t` datatype, and `u8bin` uses `uint8_t` type. Currently, only `float`, `float16`, `int8_t`, and `unit8_t` are supported.
+By default, the `*_ANN_BENCH` executables program infer the dataset's datatype from the filename's extension. For example, an extension of `fbin` uses a `float` datatype, `f16bin` uses a `float16` datatype, extension of `i8bin` uses `int8_t` datatype, and `u8bin` uses `uint8_t` type. Currently, only `float`, `float16`, `int8_t`, and `uint8_t` are supported.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/build.md` at line 57, Typo in the documentation:
replace the incorrect datatype token "unit8_t" with the correct "uint8_t" in the
sentence listing supported datatypes (the sentence that currently reads "float,
float16, int8_t, and unit8_t"); update that occurrence in
docs/source/cuvs_bench/build.md so the supported types list reads "float,
float16, int8_t, and uint8_t".
♻️ Duplicate comments (2)
docs/source/c_api.md (1)

3-3: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Duplicate label will cause Sphinx build errors.

The label (api)= is also defined in docs/source/python_api.md and docs/source/cpp_api.md. This duplicate label will cause Sphinx build failures.

🔧 Proposed fix: Use unique label
-(api)=
+(c_api)=
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api.md` at line 3, The label `(api)=` is duplicated and must be
made unique: rename the label in this file from `(api)=` to a unique identifier
such as `(c_api)=` (or another clear C API label) and update any local
cross-reference targets in this document that point to `(api)` to use the new
`(c_api)` label; ensure the new label does not collide with existing
`(python_api)=` or `(cpp_api)=` labels elsewhere.
docs/source/cpp_api.md (1)

3-3: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Duplicate label will cause Sphinx build errors.

The label (api)= is also defined in docs/source/python_api.md and docs/source/c_api.md. This duplicate label will cause Sphinx build failures.

🔧 Proposed fix: Use unique label
-(api)=
+(cpp_api)=
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api.md` at line 3, The duplicate Sphinx label `(api)=` must
be made unique in this file to avoid build errors: replace the `(api)=` label
with a distinct identifier (e.g. `(cpp_api)=`) and update any local references
in this file that point to that label (and if present, ensure other files do not
reference the old `(api)` target); verify cross-references such as :ref: or .. _
links that previously used `(api)` are updated to the new `(cpp_api)` label and
keep the label name consistent with the language-specific docs.
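
One way to catch this class of problem before running Sphinx is to scan the Markdown sources for MyST targets that are defined more than once. The helper below is a minimal sketch, not part of the PR: the function name `find_duplicate_labels` is hypothetical, and it only recognizes targets written as `(label)=` on a line of their own.

```python
import re
from collections import defaultdict
from pathlib import Path

# A MyST target is a line of the form "(label)=" with nothing else on it.
LABEL_RE = re.compile(r"^\(([^()\s]+)\)=\s*$")


def find_duplicate_labels(docs_root):
    """Map each label defined more than once to its 'file:line' locations.

    Limitation: targets inside fenced code blocks are counted too.
    """
    root = Path(docs_root)
    locations = defaultdict(list)
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, 1):
            match = LABEL_RE.match(line)
            if match:
                where = f"{path.relative_to(root).as_posix()}:{lineno}"
                locations[match.group(1)].append(where)
    return {label: where for label, where in locations.items() if len(where) > 1}
```

Calling `find_duplicate_labels("docs/source")` on a tree where `c_api.md`, `cpp_api.md`, and `python_api.md` all define `(api)=` would list all three locations under the key `"api"`.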
🟡 Minor comments (26)
docs/source/c_api/neighbors_hnsw_c.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify mutability: “immutable” contradicts documented extend APIs.

Line 3 says the loaded HNSW index is immutable, but Lines 24–37 document hnsw_c_extend_params and hnsw_c_index_extend. Please reconcile this wording (e.g., scope the immutability statement to a specific load path or remove it).

As per coding guidelines, "For documentation changes, focus on: Accuracy ... Clarity ... Consistency."

Also applies to: 24-37

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api/neighbors_hnsw_c.md` at line 3, The doc text erroneously
calls the loaded HNSW index "immutable" while later sections document extend
APIs; update the wording to be consistent by either scoping immutability to a
specific load path or removing the blanket "immutable" claim and adding a note
about which load methods are mutable/immutable; specifically edit the opening
sentence and the descriptions around hnsw_c_extend_params and
hnsw_c_index_extend so they consistently state whether that load path supports
extension and any constraints (e.g., "immutable when loaded via X, mutable via
Y" or simply remove "immutable" and document extension behavior).
docs/source/integrations/kinetica.md-3-5 (1)

3-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten wording and use descriptive link text.

Line 3 has a grammatical glitch (“graph-based the CAGRA algorithm”), and Line 5 uses non-descriptive link text (“here”). Both reduce clarity and scanability.

Suggested wording update
-Starting with release 7.2, Kinetica supports the graph-based the CAGRA algorithm from RAFT.
+Starting with release 7.2, Kinetica supports the graph-based CAGRA algorithm from RAFT.
@@
-Kinetica currently offers the ability to create a CAGRA index in a SQL `CREATE_TABLE` statement, as outlined in their [vector search indexing docs](https://docs.kinetica.com/7.2/concepts/indexes/#cagra-index). Kinetica is not open source, but the RAFT indexes can be enabled in the developer edition, which can be installed [here](https://www.kinetica.com/try/#download_instructions).
+Kinetica currently offers the ability to create a CAGRA index in a SQL `CREATE_TABLE` statement, as outlined in their [vector search indexing documentation](https://docs.kinetica.com/7.2/concepts/indexes/#cagra-index). Kinetica is not open source, but the RAFT indexes can be enabled in the developer edition, which can be installed via the [Kinetica developer edition download instructions](https://www.kinetica.com/try/#download_instructions).

As per coding guidelines, "docs/**/*: For documentation changes, focus on: Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/integrations/kinetica.md` around lines 3 - 5, Fix the grammatical
glitch and replace the non-descriptive link text: change the phrase "graph-based
the CAGRA algorithm from RAFT" to a cleaner form such as "graph-based CAGRA
algorithm in RAFT" (or "from RAFT" if that is the intended relation) and replace
the link anchor "here" with descriptive text like "Kinetica developer edition
download instructions" or "Kinetica developer edition download page" so the link
is meaningful; update the sentences in docs/source/integrations/kinetica.md
around the phrases "graph-based the CAGRA algorithm" and the "here" link
accordingly to preserve the existing links and references.
docs/source/cpp_api/neighbors_cagra.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix hyphenation typo in algorithm description

Please change “state-of-the art” → “state-of-the-art” and “large-batch sized” → “large-batch-sized” for polished user-facing docs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/neighbors_cagra.md` at line 3, Update the phrase typos in
the CAGRA description by replacing “state-of-the art” with “state-of-the-art”
and “large-batch sized” with “large-batch-sized” in the text content (the
sentence containing "CAGRA is a graph-based nearest neighbors algorithm...
demonstrates state-of-the art index build and query performance for both small-
and large-batch sized search.").
docs/source/c_api/neighbors_mg.md-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a single H1 and demote section titles to H2

This page has multiple # headings, which makes hierarchy inconsistent. Keep Line 1 as H1, and change later section headers to ## for stable navigation/TOC structure.

Suggested doc fix
-# Common Types and Enums
+## Common Types and Enums
...
-# Multi-GPU IVF-Flat
+## Multi-GPU IVF-Flat
...
-# Multi-GPU IVF-PQ
+## Multi-GPU IVF-PQ
...
-# Multi-GPU CAGRA
+## Multi-GPU CAGRA

Also applies to: 17-17, 95-95, 173-173

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api/neighbors_mg.md` at line 5, The document uses multiple H1
headings; keep the page title on line 1 as the single H1 and demote every later
top-level heading from "#" to "##" (that is, "Common Types and Enums",
"Multi-GPU IVF-Flat", "Multi-GPU IVF-PQ", and "Multi-GPU CAGRA" at lines 5, 17,
95, and 173); ensure all section titles after the first use "##" so the TOC and
navigation remain consistent.
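
The same check can be applied to every converted page. The function below is a small sketch under simple assumptions: it treats any line starting with `# ` as an H1, skips backtick-fenced code blocks, and ignores setext-style (`===` underlined) titles. The name `find_extra_h1` is hypothetical.

```python
def find_extra_h1(markdown_text):
    """Return 1-based line numbers of every H1 after the first one."""
    h1_lines = []
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), 1):
        if line.lstrip().startswith("```"):
            # Entering or leaving a fenced code block; "#" inside is a comment.
            in_fence = not in_fence
        elif not in_fence and line.startswith("# "):
            h1_lines.append(lineno)
    # The first H1 is the page title; anything after it should be demoted.
    return h1_lines[1:]
```

An empty result means the page already has at most one H1.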
docs/source/neighbors/ivfpq.md-48-48 (1)

48-48: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in conservative_memory_allocation description.

“imtentionally” should be “intentionally”.

As per coding guidelines, “For documentation changes, focus on: Clarity: Flag confusing explanations...”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/ivfpq.md` at line 48, Fix the typo in the
conservative_memory_allocation documentation: change "imtentionally" to
"intentionally" in the sentence describing IVF list overallocation for dynamic
indexes (the description referencing conservative_memory_allocation); verify the
corrected sentence reads smoothly and update any nearby wording for clarity if
necessary.
docs/source/python_api/neighbors_cagra.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix hyphenation typo in the CAGRA description.

“state-of-the art” should be “state-of-the-art” for correct phrasing.

As per coding guidelines, “For documentation changes, focus on: Clarity: Flag confusing explanations... and Consistency: ... terminology match code.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/python_api/neighbors_cagra.md` at line 3, Update the CAGRA
description sentence that currently reads "state-of-the art" to use the correct
hyphenation "state-of-the-art" in the markdown text (the line starting with
"CAGRA is a graph-based nearest neighbors...") so the phrasing is consistent and
clear across the docs.
docs/source/rust_api/index.md-4-4 (1)

4-4: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an iframe title attribute for accessibility.

Line 4 should include a descriptive title so assistive tech can identify embedded content.

Suggested patch
-<iframe src="../_static/rust/cuvs/index.html" height="720px" width="100%"></iframe>
+<iframe
+  src="../_static/rust/cuvs/index.html"
+  title="cuVS Rust API Reference"
+  height="720px"
+  width="100%"
+></iframe>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/rust_api/index.md` at line 4, The iframe in
docs/source/rust_api/index.md currently lacks an accessibility title; update the
iframe element (the <iframe ...> tag embedding ../_static/rust/cuvs/index.html)
to include a descriptive title attribute (e.g., title="Rust API documentation
viewer" or similar) so assistive technologies can identify the embedded content.
docs/source/working_with_ann_indexes_c.md-42-42 (1)

42-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing index build code.

The comment // ... build index ... suggests the index should be built before searching, but no code or reference is provided. Consider either:

  1. Adding actual build code (similar to lines 20-24), or
  2. Adding a cross-reference to the "Building an index" section above

As per coding guidelines: "Completeness: Check if API changes (parameters, return values, errors) are documented".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/working_with_ann_indexes_c.md` at line 42, The example is missing
the index build step referenced by the "// ... build index ..." comment; update
the example around DLManagedTensor *queries to either insert the actual
index-building code (similar to the earlier index creation lines used around
lines 20-24) so the search operates on a built index, or replace the placeholder
comment with a clear cross-reference to the "Building an index" section above;
ensure the instructions reference the index creation routine used elsewhere in
the doc so readers can follow a complete build-then-search flow.
docs/source/cuvs_bench/datasets.md-47-47 (1)

47-47: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling error.

"ground trunth" should be "ground truth".

📝 Proposed fix
-If you have a dataset, but no corresponding ground truth file, then you can generate ground trunth using the `generate_groundtruth` utility. Example usage:
+If you have a dataset, but no corresponding ground truth file, then you can generate ground truth using the `generate_groundtruth` utility. Example usage:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/datasets.md` at line 47, Fix the spelling mistake in
the documentation sentence that reads "ground trunth" by updating it to "ground
truth"; specifically edit the line mentioning the `generate_groundtruth` utility
so the sentence becomes "If you have a dataset, but no corresponding ground
truth file, then you can generate ground truth using the `generate_groundtruth`
utility." and keep the `generate_groundtruth` identifier unchanged.
docs/source/choosing_and_configuring_indexes.md-44-44 (1)

44-44: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace informal wording in a technical sentence.

Use “anyway” instead of “anyways” to keep tone consistent across docs.

As per coding guidelines, documentation changes should prioritize clarity and consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/choosing_and_configuring_indexes.md` at line 44, In the IVF-PQ
paragraph (mentions IVF-PQ, n_lists, n_probes, k*2) replace the informal word
"anyways" with the grammatically correct "anyway" to maintain consistent
technical tone in docs/source/choosing_and_configuring_indexes.md; no other
changes to the surrounding explanation about product quantization, refinement
(k*2), or keeping raw vectors are needed.
docs/source/cuvs_bench/wiki_all_dataset.md-18-18 (1)

18-18: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in download instructions.

“all 10 the parts” should be “all 10 parts”.

As per coding guidelines, documentation changes should prioritize clarity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/wiki_all_dataset.md` at line 18, Update the sentence
in the docs that currently reads "The following will download all 10 the parts
and untar them to a `wiki_all_88M` directory:" by removing the extra "the" so it
reads "The following will download all 10 parts and untar them to a
`wiki_all_88M` directory:"; ensure the change is applied in the
wiki_all_dataset.md content where that sentence appears for clarity.
docs/source/tuning_guide.md-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix two sentence-level clarity errors.

Line 13 ends with a fragment (...Optuna(...). to verify this.), and Line 43 is missing a verb (Optionally, repeat the above steps...). These read as editing artifacts and should be corrected before merge.

As per coding guidelines, documentation changes should prioritize clarity.

Also applies to: 43-43

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/tuning_guide.md` at line 13, Fix two sentence-level clarity
issues in the tuning guide: (1) the sentence containing the Optuna link that
currently ends with the fragment "Optuna(...). to verify this." — merge or
rephrase so the clause is complete (e.g., remove the stray period and attach the
purpose phrase so it reads as one sentence offering HPO tools to verify
results). (2) the sentence starting "Optionally, repeat the above steps..." is
missing a verb — add an appropriate verb phrase (for example "repeat the above
steps to confirm results" or "repeat the above steps to refine parameters") so
the sentence is grammatically complete. Target the sentence with the Optuna link
and the sentence beginning "Optionally, repeat the above steps..." in
tuning_guide.md.
docs/source/neighbors/vamana.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling errors in key technical descriptions.

There are several typos in changed content (accelreate, idnexes, repositoriy, degre) that should be corrected for readability and credibility.

As per coding guidelines, documentation changes should prioritize clarity and consistency.

Also applies to: 17-17, 32-32

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/vamana.md` at line 3, The documentation contains
multiple typos that reduce clarity; update the VAMANA/cuVS text to correct
spelling and maintain consistency: change "accelreate" to "accelerate",
"idnexes" to "indexes" (or "indices" if you prefer consistent terminology across
the doc), "repositoriy" to "repository", and "degre" to "degree", and apply the
same corrections at the other affected locations referenced (lines 17 and 32) so
VAMANA, DiskANN and cuVS references remain consistent and readable.
docs/source/api_basics.md-14-14 (1)

14-14: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use descriptive link text instead of “here”.

Line 14 and Line 28 use generic link text, which hurts readability/accessibility in long docs.

✏️ Proposed fix
-Here's an example of configuring RMM to use a pool allocator in C++ (derived from the RMM example [here](https://github.com/rapidsai/rmm?tab=readme-ov-file#example)):
+Here's an example of configuring RMM to use a pool allocator in C++ (derived from the [RMM C++ pool allocator example](https://github.com/rapidsai/rmm?tab=readme-ov-file#example)):

-And the corresponding code in Python (derived from the RMM example [here](https://github.com/rapidsai/rmm?tab=readme-ov-file#memoryresource-objects)):
+And the corresponding code in Python (derived from the [RMM Python memory resource example](https://github.com/rapidsai/rmm?tab=readme-ov-file#memoryresource-objects)):

As per coding guidelines, "For documentation changes, focus on: Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

Also applies to: 28-28

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/api_basics.md` at line 14, Replace the generic link text "here"
in the two spots that reference the RMM example with descriptive link text that
explains the target (e.g., "RMM README example for pool allocator" or "RMM
example: pool allocator usage") so readers know what the link points to; update
both occurrences that currently read 'here' in the sentence mentioning the RMM
example and ensure the anchor text is clear and accessible while keeping the
existing URL unchanged.
docs/source/api_interoperability.md-102-102 (1)

102-102: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove or populate the empty “CUDA array interface (Python)” section.

Line 102 introduces a section header with no content, which leaves a dead end in the docs.

As per coding guidelines, "For documentation changes, focus on: Completeness: Check if API changes (parameters, return values, errors) are documented."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/api_interoperability.md` at line 102, The empty "CUDA array
interface (Python)" section should be removed or populated; update the "CUDA
array interface (Python)" header by either deleting it if unsupported, or add
concise documentation describing support for the __cuda_array_interface__
protocol including expected object shape/dtype semantics, how to convert to/from
library arrays (mention any functions or constructors that accept
__cuda_array_interface__ objects), example usage, and any errors/limitations to
expect so the API changes are fully documented.
docs/source/cuvs_bench/index.md-90-90 (1)

90-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicate Python version entry in the support matrix.

Line 90 lists Python 3.11 twice, which looks like an accidental duplication.

As per coding guidelines, "For documentation changes, focus on: Consistency: Version numbers, parameter types, and terminology match code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` at line 90, The "Supported Python versions"
line contains a duplicated entry; edit the line that currently reads "Supported
Python versions: 3.11, 3.11, 3.13, and 3.14" to remove the extra "3.11" so it
reads "Supported Python versions: 3.11, 3.13, and 3.14", and verify the version
list under the "Supported Python versions" header matches the project's actual
supported versions elsewhere in the docs.
docs/source/cuvs_bench/param_tuning.md-419-419 (1)

419-419: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix parameter description typo (“Intermadiate” → “Intermediate”).

Line 419 has a spelling error in a parameter description, which makes the table look inconsistent.

✏️ Proposed fix
-  - Intermadiate graph degree during nn-descent iterations
+  - Intermediate graph degree during nn-descent iterations

As per coding guidelines, "For documentation changes, focus on: Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/param_tuning.md` at line 419, Fix the spelling typo in
the parameter description that currently reads "Intermadiate graph degree during
nn-descent iterations" by changing "Intermadiate" to "Intermediate" so the
description reads "Intermediate graph degree during nn-descent iterations";
locate the exact string in the param_tuning documentation (the description text
for the nn-descent degree parameter) and update it accordingly.
docs/source/api_basics.md-45-46 (1)

45-46: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed Markdown heading for the C section.

Line 45–46 uses RST-style heading markup (C + ^), which won’t render as a Markdown section header.

✏️ Proposed fix
-C
-^
+### C

As per coding guidelines, "For documentation changes, focus on: Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/api_basics.md` around lines 45 - 46, The docs use RST-style
heading markup (“C” followed by “^”) instead of a Markdown header; replace the
title line and its caret underline with a single Markdown heading, "### C" as in
the proposed fix (or whichever level matches the sibling language sections), so
the "C" section renders correctly.
docs/source/cuvs_bench/index.md-27-29 (1)

27-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace leftover RST link syntax in the Markdown TOC.

Line 27 and Line 29 still use `...`_ references, which do not work as standard Markdown links.

✏️ Proposed fix
-  * `End-to-end: smaller-scale benchmarks (<1M to 10M)`_
-  * `End-to-end: large-scale benchmarks (>10M vectors)`_
+  * [End-to-end: smaller-scale benchmarks (<1M to 10M)](#end-to-end-smaller-scale-benchmarks-1m-to-10m)
+  * [End-to-end: large-scale benchmarks (>10M vectors)](#end-to-end-large-scale-benchmarks-10m-vectors)

As per coding guidelines, "For documentation changes, focus on: Clarity: Flag confusing explanations, missing prerequisites, or unclear examples."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` around lines 27 - 29, Replace the leftover
reStructuredText link syntax in the Markdown TOC by changing the two list items
"`End-to-end: smaller-scale benchmarks (<1M to 10M)`_" and "`End-to-end:
large-scale benchmarks (>10M vectors)`_" to proper Markdown (e.g., plain text or
standard Markdown links like [End-to-end: smaller-scale benchmarks (<1M to
10M)](...) and [End-to-end: large-scale benchmarks (>10M vectors)](...)); update
the two occurrences so they render correctly in Markdown and point to the
intended anchors or files.
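
Leftovers like these are easy to miss in a 100+ page conversion, so a quick scan can help. The sketch below is illustrative only and the function name is hypothetical: it flags reST hyperlink references (`` `text`_ ``) and caret or double-quote title underlines, and it deliberately ignores `=`, `-`, and `~` underlines because those characters also have meanings in Markdown.

```python
import re

# `text`_ or `text`__ : reST named / anonymous hyperlink references.
RST_LINK_REF = re.compile(r"`[^`\n]+`__?(?!\w)")
# A line made only of carets or only of double quotes (reST title underline).
RST_UNDERLINE = re.compile(r'^([\^"])\1*$')


def find_rst_leftovers(markdown_text):
    """Return (line_number, kind, line) tuples for suspected reST remnants."""
    findings = []
    in_fence = False
    previous = ""
    for lineno, line in enumerate(markdown_text.splitlines(), 1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # Toggle on fence open/close; fenced content is never inspected.
            in_fence = not in_fence
        elif not in_fence:
            if RST_LINK_REF.search(line):
                findings.append((lineno, "link-reference", line))
            elif previous and RST_UNDERLINE.match(stripped):
                # An underline only counts when it follows a non-empty line.
                findings.append((lineno, "heading-underline", line))
        previous = stripped
    return findings
```

Each finding still needs a human decision, since the right replacement (plain text, an in-page anchor, or a `{doc}` role) depends on where the original reference pointed.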
docs/source/neighbors/cagra.md-11-13 (1)

11-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix algorithm name typo in CAGRA build description.

“I-force” appears to be a typo for “brute-force”, which is confusing in a core algorithm explanation.

As per coding guidelines, documentation changes should prioritize Clarity and Consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/cagra.md` around lines 11 - 13, Replace the typo
"I-force" with "brute-force" in the CAGRA build description so the sentence
reads that brute-force can be used to construct the initial kNN graph; update
the surrounding phrasing in the same paragraph (the one mentioning "cuVS
provides IVF-PQ and NN-Descent strategies for building the initial kNN graph")
to ensure consistency and clarity with the corrected term.
docs/source/getting_started.md-64-76 (1)

64-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix capitalization and sentence/brand formatting in user-facing copy.

There are multiple clarity issues: lowercase sentence start (“our”), missing period after “Apache 2.0”, and “Github” should be “GitHub”. Please clean these in one pass to keep tone consistent across the landing page.

As per coding guidelines, documentation changes should prioritize Clarity and Consistency (“Flag confusing explanations… and terminology match code”).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/getting_started.md` around lines 64 - 76, Fix capitalization and
punctuation in the user-facing copy: capitalize the sentence-start "our
{doc}`API basics <api_basics>`" to "Our {doc}`API basics <api_basics>`", add a
period after "licensed under Apache 2.0" so it reads "licensed under Apache
2.0.", and change "Github" to the correct brand casing "GitHub" in the sentence
that points readers to the project repository; ensure these edits keep the
existing doc roles/links intact and maintain consistent sentence capitalization
across the paragraph.
docs/source/neighbors/ivfflat.md-20-20 (1)

20-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct filtering explanation typos that change meaning.

This paragraph has several errors (e.g., “applid”, “proved”) and punctuation breaks that make the filtering behavior hard to follow; “proved” appears to be “probed” in this context.

As per coding guidelines, documentation changes should prioritize Clarity and Accuracy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/ivfflat.md` at line 20, Fix typos and clarify the IVF
filtering explanation in docs/source/neighbors/ivfflat.md: replace “applid” with
“applied”, change “proved” to “probed”, fix punctuation and sentence flow so the
example reads clearly (e.g., “For example, imagine you have 3 IVF lists each
containing 2 vectors; if you probe only the closest 2 lists but filter out all
but 1 element, a remaining element located in an unprobed list will not be
considered”), and generally reword the paragraph for clear, accurate description
of how IVF methods apply filters only to probed lists.
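
The 3-list example in this finding can be made concrete with a toy simulation. This is a pure-Python sketch and not the cuVS implementation: the one-dimensional vectors, the centroids, the list layout, and the function name `ivf_search` are all made up for illustration.

```python
# Toy IVF index: 3 lists, each holding 2 one-dimensional vectors (id -> value).
# All values here are made up for illustration.
centroids = [0.0, 10.0, 20.0]
lists = [
    {0: -0.5, 1: 0.5},    # list 0, centroid 0.0
    {2: 9.5, 3: 10.5},    # list 1, centroid 10.0
    {4: 19.5, 5: 20.5},   # list 2, centroid 20.0
]

def ivf_search(query, n_probes, k, allowed_ids):
    """Probe the n_probes closest lists, then keep only allowed ids."""
    # Rank lists by distance from the query to each centroid.
    order = sorted(range(len(centroids)), key=lambda i: abs(centroids[i] - query))
    candidates = []
    for list_id in order[:n_probes]:
        for vec_id, value in lists[list_id].items():
            # The filter is applied only to vectors in probed lists.
            if vec_id in allowed_ids:
                candidates.append((abs(value - query), vec_id))
    return [vec_id for _, vec_id in sorted(candidates)[:k]]

# Filter out everything except id 5, which lives in the farthest list.
print(ivf_search(query=1.0, n_probes=2, k=1, allowed_ids={5}))  # []
print(ivf_search(query=1.0, n_probes=3, k=1, allowed_ids={5}))  # [5]
```

With two probes the only surviving vector sits in an unprobed list, so the search returns nothing even though a valid match exists; probing all three lists finds it.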
docs/source/integrations/lucene.md-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the capitalization of GitHub.

The official name is "GitHub" (capital H), not "Github".

📝 Proposed fix
-Install and evaluate the `lucene-cuvs` connector on [Github](https://github.com/SearchScale/lucene-cuvs).
+Install and evaluate the `lucene-cuvs` connector on [GitHub](https://github.com/SearchScale/lucene-cuvs).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/integrations/lucene.md` at line 5, Update the capitalization of
the GitHub brand name in the markdown line that mentions the lucene-cuvs
connector (the text containing "lucene-cuvs" and the link
"https://github.com/SearchScale/lucene-cuvs") by changing "Github" to "GitHub"
so the sentence reads "Install and evaluate the `lucene-cuvs` connector on
GitHub (https://github.com/SearchScale/lucene-cuvs)."
docs/source/index.md-63-63 (1)

63-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hyphenate the compound adjective for readability.

Line 63 should use “high-performance” before “machine learning primitives.”

✏️ Suggested edit
-cuVS is built on top of the RAPIDS RAFT library of high performance machine learning primitives and provides all the necessary routines for vector search and clustering on the GPU.
+cuVS is built on top of the RAPIDS RAFT library of high-performance machine learning primitives and provides all the necessary routines for vector search and clustering on the GPU.

As per coding guidelines, “For documentation changes, focus on: Clarity: Flag confusing explanations …”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/index.md` at line 63, Update the sentence that currently reads
"high performance machine learning primitives" to use the hyphenated compound
adjective "high-performance" so it becomes "high-performance machine learning
primitives"; locate the sentence containing "cuVS is built on top of the RAPIDS
RAFT library of high performance machine learning primitives" and replace "high
performance" with "high-performance" to improve readability.
docs/source/index.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the unused link reference definition.

Line 7 defines cuvs_reference but nothing references it, which can trigger markdown lint noise.

🧹 Suggested edit
-[cuvs_reference]: https://docs.rapids.ai/api/cuvs/stable/

As per coding guidelines, “For documentation changes, focus on: Clarity … and Consistency: Version numbers, parameter types, and terminology match code.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/index.md` at line 7, Remove the unused markdown link reference
`cuvs_reference` from the file by deleting the definition line
“[cuvs_reference]: https://docs.rapids.ai/api/cuvs/stable/” since nothing
references it and it triggers lint noise; ensure no other references to
`cuvs_reference` remain in the doc so the change won’t break cross-links.
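
A check like the following can confirm that a definition is unused before deleting it. It is a rough sketch under stated assumptions, not a replacement for a real Markdown linter: the function name `unused_link_definitions` is hypothetical, the sample text apart from the `cuvs_reference` line is invented, and shortcut references (a bare `[label]`) are deliberately not handled.

```python
import re

# Reference definition at the start of a line, e.g. "[label]: https://example.com"
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s+\S+", re.MULTILINE)
# Reference-style uses: "[text][label]" (full) or "[label][]" (collapsed).
# Shortcut references ("[label]" alone) are not detected by this sketch.
FULL_USE = re.compile(r"\[[^\]]*\]\[([^\]]+)\]")
COLLAPSED_USE = re.compile(r"\[([^\]]+)\]\[\]")

def unused_link_definitions(markdown_text):
    """Return labels that are defined but never referenced (case-insensitive)."""
    defined = {m.group(1).lower() for m in DEFINITION.finditer(markdown_text)}
    used = {m.group(1).lower() for m in FULL_USE.finditer(markdown_text)}
    used |= {m.group(1).lower() for m in COLLAPSED_USE.finditer(markdown_text)}
    return sorted(defined - used)

# Invented sample: only the cuvs_reference line comes from the finding above.
sample = (
    "# cuVS\n"
    "\n"
    "[cuvs_reference]: https://docs.rapids.ai/api/cuvs/stable/\n"
    "[raft]: https://github.com/rapidsai/raft\n"
    "\n"
    "cuVS is built on top of [RAFT][raft].\n"
)

print(unused_link_definitions(sample))
```

In the sample, `raft` is referenced once and `cuvs_reference` never, so only the latter is reported.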
docs/source/integrations.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Polish integration intro wording and acronym casing.

Line 3 should use SDK (not sdk) and read more clearly; current phrasing is slightly awkward.

✏️ Suggested edit
-Aside from using cuVS standalone, it can be consumed through a number of sdk and vector database integrations.
+Besides using cuVS as a standalone library, it can also be consumed through a number of SDK and vector database integrations.

As per coding guidelines, “For documentation changes, focus on: Clarity … and Consistency: … terminology match code.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/integrations.md` at line 3, Update the intro sentence that
currently reads "Aside from using cuVS standalone, it can be consumed through a
number of sdk and vector database integrations." to use proper acronym casing
and clearer phrasing: replace "sdk" with "SDK" and reword to something like
"Aside from using cuVS as a standalone product, it can be consumed via a number
of SDKs and vector database integrations." Locate the sentence by searching for
the phrase "Aside from using cuVS standalone" and update it to the new wording
to improve clarity and consistency.
🧹 Nitpick comments (5)
docs/source/cpp_api/neighbors_bruteforce.md (1)

3-3: ⚡ Quick win

Tighten wording for clarity in the intro sentence.

“The bruteforce method is running the KNN algorithm” reads awkwardly. Consider “The bruteforce method performs exhaustive KNN search and returns exact results.”

As per coding guidelines, "Clarity: Flag confusing explanations, missing prerequisites, or unclear examples".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/neighbors_bruteforce.md` at line 3, Replace the awkward
intro sentence "The bruteforce method is running the KNN algorithm." with a
clearer phrasing—e.g., change it to "The bruteforce method performs exhaustive
KNN search and returns exact results."—so the document (neighbors_bruteforce.md)
uses concise, unambiguous wording describing that bruteforce does exhaustive
search and yields exact neighbors.
docs/source/integrations/faiss.md (1)

5-5: ⚡ Quick win

Use a version-pinned Faiss install link to avoid doc drift.

Line 5 points to main/INSTALL.md, which can diverge from the stated version context. Prefer a tagged link.

Suggested patch
-The cuVS backend can be enabled by setting the appropriate cmake flag while building Faiss from source. A pre-compiled conda package can also be installed. Refer to [Faiss installation guidelines](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md) for more information.
+The cuVS backend can be enabled by setting the appropriate cmake flag while building Faiss from source. A pre-compiled conda package can also be installed. Refer to [Faiss installation guidelines](https://github.com/facebookresearch/faiss/blob/v1.10.0/INSTALL.md) for more information.

As per coding guidelines, "Consistency: Version numbers, parameter types, and terminology match code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/integrations/faiss.md` at line 5, Update the Faiss installation
link in docs/source/integrations/faiss.md by replacing the floating main branch
URL (https://github.com/facebookresearch/faiss/blob/main/INSTALL.md) with a
version-pinned tag URL that matches the Faiss version used by the project (use
the corresponding released tag/commit for that version from your dependency
manifest or lockfile); ensure the new link points to the tagged INSTALL.md for
that release so the docs won't drift as Faiss main changes.
docs/source/cpp_api/cluster_agglomerative.md (1)

17-19: 💤 Low value

Consider removing redundant header/namespace declarations.

The header include and namespace are already declared at lines 5-7. Since both doxygengroup blocks refer to the same header and namespace, the repetition at lines 17-19 is unnecessary and could be streamlined for clarity.

♻️ Proposed refactor
 ## Agglomerative
 
-`#include <cuvs/cluster/agglomerative.hpp>`
-
-namespace *cuvs::cluster::agglomerative*
-
 ```{doxygengroup} single_linkage
 :project: cuvs
 :members:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/cluster_agglomerative.md` around lines 17 - 19, Remove
the redundant header include `<cuvs/cluster/agglomerative.hpp>` and duplicate
namespace declaration `namespace cuvs::cluster::agglomerative` at lines 17-19
and consolidate the doxygengroup blocks so both refer to the already-declared
header/namespace (e.g., merge into a single `{doxygengroup} single_linkage
:project: cuvs :members:` block) to eliminate repetition and streamline the
documentation.
docs/source/neighbors/cagra.md (1)

95-95: ⚡ Quick win

Use a level-2 heading for “Memory footprint” to preserve document hierarchy.

This page already has an H1 at Line 1; making this section ## Memory footprint keeps structure consistent for navigation and rendering.

As per coding guidelines, documentation changes should prioritize Clarity and Consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/cagra.md` at line 95, Change the "Memory footprint"
heading from an H1 to an H2 (i.e., replace "# Memory footprint" with "## Memory
footprint") so the section preserves the document hierarchy and remains
consistent with the existing top-level H1; update the heading string "Memory
footprint" accordingly.
docs/source/neighbors/neighbors.md (1)

15-19: ⚡ Quick win

Avoid a second H1 in the same page.

Please change # Indices and tables to ## Indices and tables for consistent document structure.

As per coding guidelines, documentation changes should prioritize Consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/neighbors.md` around lines 15 - 19, Demote the second
H1 on the page to an H2 by replacing the heading string "# Indices and tables"
with "## Indices and tables" so the page does not contain two H1s; keep the
genindex/modindex/search references beneath the heading intact so the document
structure stays consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 80eb4684-dd4a-40e6-b0e2-849a23ac4d2f

📥 Commits

Reviewing files that changed from the base of the PR and between 7cf69cb and 35c698e.

📒 Files selected for processing (179)
  • ci/release/update-version.sh
  • conda/environments/all_cuda-129_arch-aarch64.yaml
  • conda/environments/all_cuda-129_arch-x86_64.yaml
  • conda/environments/all_cuda-131_arch-aarch64.yaml
  • conda/environments/all_cuda-131_arch-x86_64.yaml
  • dependencies.yaml
  • docs/source/advanced_topics.md
  • docs/source/advanced_topics.rst
  • docs/source/api_basics.md
  • docs/source/api_basics.rst
  • docs/source/api_docs.md
  • docs/source/api_docs.rst
  • docs/source/api_interoperability.md
  • docs/source/api_interoperability.rst
  • docs/source/build.md
  • docs/source/build.rst
  • docs/source/c_api.md
  • docs/source/c_api.rst
  • docs/source/c_api/cluster.md
  • docs/source/c_api/cluster.rst
  • docs/source/c_api/cluster_kmeans_c.md
  • docs/source/c_api/cluster_kmeans_c.rst
  • docs/source/c_api/core_c_api.md
  • docs/source/c_api/core_c_api.rst
  • docs/source/c_api/distance.md
  • docs/source/c_api/distance.rst
  • docs/source/c_api/neighbors.md
  • docs/source/c_api/neighbors.rst
  • docs/source/c_api/neighbors_all_neighbors_c.md
  • docs/source/c_api/neighbors_all_neighbors_c.rst
  • docs/source/c_api/neighbors_bruteforce_c.md
  • docs/source/c_api/neighbors_bruteforce_c.rst
  • docs/source/c_api/neighbors_cagra_c.md
  • docs/source/c_api/neighbors_cagra_c.rst
  • docs/source/c_api/neighbors_hnsw_c.md
  • docs/source/c_api/neighbors_hnsw_c.rst
  • docs/source/c_api/neighbors_ivf_flat_c.md
  • docs/source/c_api/neighbors_ivf_flat_c.rst
  • docs/source/c_api/neighbors_ivf_pq_c.md
  • docs/source/c_api/neighbors_ivf_pq_c.rst
  • docs/source/c_api/neighbors_mg.md
  • docs/source/c_api/neighbors_mg.rst
  • docs/source/c_api/neighbors_vamana_c.md
  • docs/source/c_api/neighbors_vamana_c.rst
  • docs/source/c_api/preprocessing.md
  • docs/source/c_api/preprocessing.rst
  • docs/source/choosing_and_configuring_indexes.md
  • docs/source/comparing_indexes.md
  • docs/source/conf.py
  • docs/source/cpp_api.md
  • docs/source/cpp_api.rst
  • docs/source/cpp_api/cluster.md
  • docs/source/cpp_api/cluster.rst
  • docs/source/cpp_api/cluster_agglomerative.md
  • docs/source/cpp_api/cluster_agglomerative.rst
  • docs/source/cpp_api/cluster_kmeans.md
  • docs/source/cpp_api/cluster_kmeans.rst
  • docs/source/cpp_api/cluster_spectral.md
  • docs/source/cpp_api/cluster_spectral.rst
  • docs/source/cpp_api/distance.md
  • docs/source/cpp_api/distance.rst
  • docs/source/cpp_api/neighbors.md
  • docs/source/cpp_api/neighbors.rst
  • docs/source/cpp_api/neighbors_all_neighbors.md
  • docs/source/cpp_api/neighbors_all_neighbors.rst
  • docs/source/cpp_api/neighbors_bruteforce.md
  • docs/source/cpp_api/neighbors_bruteforce.rst
  • docs/source/cpp_api/neighbors_cagra.md
  • docs/source/cpp_api/neighbors_cagra.rst
  • docs/source/cpp_api/neighbors_dynamic_batching.md
  • docs/source/cpp_api/neighbors_dynamic_batching.rst
  • docs/source/cpp_api/neighbors_epsilon_neighborhood.md
  • docs/source/cpp_api/neighbors_filter.md
  • docs/source/cpp_api/neighbors_filter.rst
  • docs/source/cpp_api/neighbors_hnsw.md
  • docs/source/cpp_api/neighbors_hnsw.rst
  • docs/source/cpp_api/neighbors_ivf_flat.md
  • docs/source/cpp_api/neighbors_ivf_flat.rst
  • docs/source/cpp_api/neighbors_ivf_pq.md
  • docs/source/cpp_api/neighbors_ivf_pq.rst
  • docs/source/cpp_api/neighbors_mg.md
  • docs/source/cpp_api/neighbors_mg.rst
  • docs/source/cpp_api/neighbors_nn_descent.md
  • docs/source/cpp_api/neighbors_nn_descent.rst
  • docs/source/cpp_api/neighbors_refine.md
  • docs/source/cpp_api/neighbors_refine.rst
  • docs/source/cpp_api/neighbors_vamana.md
  • docs/source/cpp_api/neighbors_vamana.rst
  • docs/source/cpp_api/preprocessing.md
  • docs/source/cpp_api/preprocessing.rst
  • docs/source/cpp_api/preprocessing_pca.md
  • docs/source/cpp_api/preprocessing_pca.rst
  • docs/source/cpp_api/preprocessing_quantize.md
  • docs/source/cpp_api/preprocessing_quantize.rst
  • docs/source/cpp_api/preprocessing_spectral_embedding.md
  • docs/source/cpp_api/preprocessing_spectral_embedding.rst
  • docs/source/cpp_api/selection.md
  • docs/source/cpp_api/selection.rst
  • docs/source/cpp_api/stats.md
  • docs/source/cpp_api/stats.rst
  • docs/source/cuvs_bench/build.md
  • docs/source/cuvs_bench/datasets.md
  • docs/source/cuvs_bench/index.md
  • docs/source/cuvs_bench/index.rst
  • docs/source/cuvs_bench/param_tuning.md
  • docs/source/cuvs_bench/param_tuning.rst
  • docs/source/cuvs_bench/pluggable_backend.md
  • docs/source/cuvs_bench/pluggable_backend.rst
  • docs/source/cuvs_bench/wiki_all_dataset.md
  • docs/source/filtering.md
  • docs/source/filtering.rst
  • docs/source/getting_started.md
  • docs/source/getting_started.rst
  • docs/source/index.md
  • docs/source/integrations.md
  • docs/source/integrations.rst
  • docs/source/integrations/faiss.md
  • docs/source/integrations/kinetica.md
  • docs/source/integrations/kinetica.rst
  • docs/source/integrations/lucene.md
  • docs/source/integrations/milvus.md
  • docs/source/neighbors/all_neighbors.md
  • docs/source/neighbors/bruteforce.md
  • docs/source/neighbors/cagra.md
  • docs/source/neighbors/cagra.rst
  • docs/source/neighbors/ivfflat.md
  • docs/source/neighbors/ivfflat.rst
  • docs/source/neighbors/ivfpq.md
  • docs/source/neighbors/ivfpq.rst
  • docs/source/neighbors/neighbors.md
  • docs/source/neighbors/neighbors.rst
  • docs/source/neighbors/vamana.md
  • docs/source/python_api.md
  • docs/source/python_api.rst
  • docs/source/python_api/cluster.md
  • docs/source/python_api/cluster.rst
  • docs/source/python_api/cluster_kmeans.md
  • docs/source/python_api/cluster_kmeans.rst
  • docs/source/python_api/distance.md
  • docs/source/python_api/distance.rst
  • docs/source/python_api/neighbors.md
  • docs/source/python_api/neighbors.rst
  • docs/source/python_api/neighbors_all_neighbors.md
  • docs/source/python_api/neighbors_all_neighbors.rst
  • docs/source/python_api/neighbors_brute_force.md
  • docs/source/python_api/neighbors_brute_force.rst
  • docs/source/python_api/neighbors_cagra.md
  • docs/source/python_api/neighbors_cagra.rst
  • docs/source/python_api/neighbors_hnsw.md
  • docs/source/python_api/neighbors_hnsw.rst
  • docs/source/python_api/neighbors_ivf_flat.md
  • docs/source/python_api/neighbors_ivf_flat.rst
  • docs/source/python_api/neighbors_ivf_pq.md
  • docs/source/python_api/neighbors_ivf_pq.rst
  • docs/source/python_api/neighbors_mg_cagra.md
  • docs/source/python_api/neighbors_mg_cagra.rst
  • docs/source/python_api/neighbors_mg_ivf_flat.md
  • docs/source/python_api/neighbors_mg_ivf_flat.rst
  • docs/source/python_api/neighbors_mg_ivf_pq.md
  • docs/source/python_api/neighbors_mg_ivf_pq.rst
  • docs/source/python_api/neighbors_multi_gpu.md
  • docs/source/python_api/neighbors_nn_decent.md
  • docs/source/python_api/neighbors_nn_decent.rst
  • docs/source/python_api/preprocessing.md
  • docs/source/python_api/preprocessing.rst
  • docs/source/rust_api/index.md
  • docs/source/rust_api/index.rst
  • docs/source/tuning_guide.md
  • docs/source/vector_databases_vs_vector_search.md
  • docs/source/working_with_ann_indexes.md
  • docs/source/working_with_ann_indexes.rst
  • docs/source/working_with_ann_indexes_c.md
  • docs/source/working_with_ann_indexes_c.rst
  • docs/source/working_with_ann_indexes_cpp.md
  • docs/source/working_with_ann_indexes_cpp.rst
  • docs/source/working_with_ann_indexes_python.md
  • docs/source/working_with_ann_indexes_python.rst
  • docs/source/working_with_ann_indexes_rust.md
  • docs/source/working_with_ann_indexes_rust.rst
💤 Files with no reviewable changes (78)
  • docs/source/c_api/distance.rst
  • docs/source/c_api/core_c_api.rst
  • docs/source/python_api.rst
  • docs/source/cpp_api/neighbors_filter.rst
  • docs/source/working_with_ann_indexes_python.rst
  • docs/source/python_api/neighbors_ivf_pq.rst
  • docs/source/python_api/distance.rst
  • docs/source/neighbors/ivfpq.rst
  • docs/source/python_api/neighbors_all_neighbors.rst
  • docs/source/cpp_api/preprocessing_spectral_embedding.rst
  • docs/source/neighbors/ivfflat.rst
  • docs/source/cpp_api/cluster_spectral.rst
  • docs/source/cpp_api/stats.rst
  • docs/source/c_api.rst
  • docs/source/working_with_ann_indexes_cpp.rst
  • docs/source/cpp_api/preprocessing.rst
  • docs/source/c_api/neighbors_all_neighbors_c.rst
  • docs/source/c_api/preprocessing.rst
  • docs/source/cpp_api/cluster.rst
  • docs/source/working_with_ann_indexes_rust.rst
  • docs/source/python_api/neighbors_mg_cagra.rst
  • docs/source/python_api/cluster.rst
  • docs/source/cpp_api/preprocessing_pca.rst
  • docs/source/cpp_api/neighbors_mg.rst
  • docs/source/working_with_ann_indexes.rst
  • docs/source/c_api/neighbors_cagra_c.rst
  • docs/source/python_api/neighbors_nn_decent.rst
  • docs/source/cpp_api/selection.rst
  • docs/source/working_with_ann_indexes_c.rst
  • docs/source/build.rst
  • docs/source/cpp_api.rst
  • docs/source/integrations/kinetica.rst
  • docs/source/python_api/neighbors_cagra.rst
  • docs/source/c_api/neighbors_bruteforce_c.rst
  • docs/source/cpp_api/neighbors_dynamic_batching.rst
  • docs/source/python_api/neighbors.rst
  • docs/source/c_api/neighbors_ivf_pq_c.rst
  • docs/source/python_api/neighbors_brute_force.rst
  • docs/source/python_api/cluster_kmeans.rst
  • docs/source/cuvs_bench/pluggable_backend.rst
  • docs/source/cpp_api/cluster_agglomerative.rst
  • docs/source/cpp_api/neighbors_cagra.rst
  • docs/source/neighbors/neighbors.rst
  • docs/source/c_api/cluster_kmeans_c.rst
  • docs/source/c_api/neighbors_mg.rst
  • docs/source/cpp_api/neighbors_nn_descent.rst
  • docs/source/cpp_api/neighbors_ivf_flat.rst
  • docs/source/api_interoperability.rst
  • docs/source/cpp_api/neighbors_all_neighbors.rst
  • docs/source/api_docs.rst
  • docs/source/advanced_topics.rst
  • docs/source/c_api/cluster.rst
  • docs/source/cpp_api/preprocessing_quantize.rst
  • docs/source/cpp_api/neighbors.rst
  • docs/source/cuvs_bench/index.rst
  • docs/source/neighbors/cagra.rst
  • docs/source/rust_api/index.rst
  • docs/source/integrations.rst
  • docs/source/cpp_api/neighbors_bruteforce.rst
  • docs/source/cuvs_bench/param_tuning.rst
  • docs/source/c_api/neighbors.rst
  • docs/source/python_api/neighbors_mg_ivf_flat.rst
  • docs/source/cpp_api/neighbors_refine.rst
  • docs/source/cpp_api/distance.rst
  • docs/source/c_api/neighbors_ivf_flat_c.rst
  • docs/source/c_api/neighbors_vamana_c.rst
  • docs/source/python_api/neighbors_ivf_flat.rst
  • docs/source/python_api/preprocessing.rst
  • docs/source/cpp_api/cluster_kmeans.rst
  • docs/source/cpp_api/neighbors_hnsw.rst
  • docs/source/python_api/neighbors_mg_ivf_pq.rst
  • docs/source/cpp_api/neighbors_ivf_pq.rst
  • docs/source/c_api/neighbors_hnsw_c.rst
  • docs/source/getting_started.rst
  • docs/source/api_basics.rst
  • docs/source/cpp_api/neighbors_vamana.rst
  • docs/source/python_api/neighbors_hnsw.rst
  • docs/source/filtering.rst

Comment on lines +12 to +24
#include <dlpack/dlpack.h>

// Create data representation in host memory
float dataset[2][1] = {{0.2, 0.1}};
// copy data to device memory
float *dataset_dev;
cuvsRMMAlloc(&dataset_dev, sizeof(float) * 2 * 1);
cudaMemcpy(dataset_dev, dataset, sizeof(float) * 2 * 1, cudaMemcpyDefault);

// Use DLPack for representing the data as a tensor
DLManagedTensor dataset_tensor;
dataset_tensor.dl_tensor.data = dataset;
dataset_tensor.dl_tensor.device.device_type = kDLCUDA;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

For DLPack `DLManagedTensor`, when `dl_tensor.device.device_type` is `kDLCUDA`, should `dl_tensor.data` point to device memory? Also, what headers are required for `cudaMemcpy` and cuVS C API allocation calls?

💡 Result:

  1. For DLPack DLManagedTensor, when dl_tensor.device.device_type == kDLCUDA, should dl_tensor.data point to device memory? Yes. The DLPack specification says the DLTensor “data pointer points to the allocated data” and that for CUDA it “will be a CUDA device pointer” (and includes byte_offset for pointing at the beginning of the data). [1][2]
  2. What headers are required for cudaMemcpy and cuVS C API allocation calls?
     - For cudaMemcpy (CUDA Runtime API), include the CUDA runtime header: #include <cuda_runtime.h>. [3]
     - For cuVS C API calls (including allocation/resources APIs), include cuVS’s C API header alongside the CUDA runtime header. cuVS examples show: #include <cuda_runtime.h> and #include <cuvs/core/c_api.h>. [3] (Those cuVS docs show both headers together in a minimal snippet, which implies cuvs C API functions are declared by <cuvs/core/c_api.h>, while cudaMemcpy-related declarations come from <cuda_runtime.h>.) [3]

Citations:


Fix the DLPack code example: incorrect device pointer and missing headers.

The example sets device_type to kDLCUDA but assigns dl_tensor.data to the host array (dataset) instead of the device pointer (dataset_dev). Per the DLPack specification, when the device type is CUDA, the data pointer must reference device memory. Additionally, the code uses cudaMemcpy and cuvsRMMAlloc without the required headers.

Proposed changes
 #include <dlpack/dlpack.h>
+#include <cuda_runtime.h>
+#include <cuvs/core/c_api.h>
 
 // Create data representation in host memory
 float dataset[2][1] = {{0.2, 0.1}};
@@ -23,7 +25,7 @@
 // Use DLPack for representing the data as a tensor
 DLManagedTensor dataset_tensor;
-dataset_tensor.dl_tensor.data               = dataset;
+dataset_tensor.dl_tensor.data               = dataset_dev;
 dataset_tensor.dl_tensor.device.device_type = kDLCUDA;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/api_interoperability.md` around lines 12 - 24, The DLPack example
currently sets DLManagedTensor.dl_tensor.data to the host buffer while marking
device_type = kDLCUDA and is missing required headers; change dl_tensor.data to
point to the device pointer dataset_dev (not dataset), set
dl_tensor.device.device_id appropriately (e.g., 0) and ensure
dl_tensor.device.device_type remains kDLCUDA, and add the missing includes that
declare cudaMemcpy and cuvsRMMAlloc (e.g., include <cuda_runtime.h> and the
header that provides cuvsRMMAlloc) so the calls compile and the pointer matches
the CUDA device memory.

Comment thread docs/source/build.md
Comment on lines +162 to +163
./build.sh libcuvs tests -n --limit-tests=NEIGHBORS_TEST;CAGRA_C_TEST
```

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n -- '--limit-tests=.*;.*' docs/source/build.md

Repository: rapidsai/cuvs

Length of output: 131


🏁 Script executed:

cat -n docs/source/build.md | sed -n '155,170p'

Repository: rapidsai/cuvs

Length of output: 734


🏁 Script executed:

fd -name "build.sh" -type f

Repository: rapidsai/cuvs

Length of output: 288


🏁 Script executed:

rg --context=5 --no-heading -n -- '--limit-tests=' docs/

Repository: rapidsai/cuvs

Length of output: 865


🏁 Script executed:

find . -name "build.sh" -type f | head -5

Repository: rapidsai/cuvs

Length of output: 104


🏁 Script executed:

rg -n "limit-tests" --type bash

Repository: rapidsai/cuvs

Length of output: 86


🏁 Script executed:

wc -l ./build.sh

Repository: rapidsai/cuvs

Length of output: 72


🏁 Script executed:

rg -n "limit-tests" ./build.sh -A 3 -B 3

Repository: rapidsai/cuvs

Length of output: 2170


Quote the --limit-tests value to prevent shell command splitting.

The unquoted semicolon is a shell command separator, causing the example to fail. The second test name will be treated as a separate command instead of part of the parameter value.

Proposed fix
-./build.sh libcuvs tests -n --limit-tests=NEIGHBORS_TEST;CAGRA_C_TEST
+./build.sh libcuvs tests -n --limit-tests="NEIGHBORS_TEST;CAGRA_C_TEST"
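
The effect of the missing quotes can be reproduced without running the build. The sketch below uses Python's standard `shlex` module in its shell-compatible mode (Python 3.8 or later) as an approximation of how a POSIX shell splits the line; it does not invoke `build.sh`, and the helper name `shell_tokens` is an assumption made for this illustration.

```python
import shlex

def shell_tokens(command):
    """Split a command line roughly the way a POSIX shell would."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)

unquoted = "./build.sh libcuvs tests -n --limit-tests=NEIGHBORS_TEST;CAGRA_C_TEST"
quoted = './build.sh libcuvs tests -n --limit-tests="NEIGHBORS_TEST;CAGRA_C_TEST"'

# Unquoted: ';' becomes its own token, so the shell would end the build.sh
# command there and try to run CAGRA_C_TEST as a second command.
print(shell_tokens(unquoted)[-3:])
# ['--limit-tests=NEIGHBORS_TEST', ';', 'CAGRA_C_TEST']

# Quoted: the whole value stays inside a single argument.
print(shell_tokens(quoted)[-1])
# --limit-tests=NEIGHBORS_TEST;CAGRA_C_TEST
```

Escaping the semicolon with a backslash would keep the value in one argument as well; quoting is simply easier to read in documentation.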
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-./build.sh libcuvs tests -n --limit-tests=NEIGHBORS_TEST;CAGRA_C_TEST
-```
+./build.sh libcuvs tests -n --limit-tests="NEIGHBORS_TEST;CAGRA_C_TEST"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/build.md` around lines 162 - 163, The example command uses an
unquoted semicolon in the --limit-tests argument which causes shell splitting;
update the example (the ./build.sh invocation and the --limit-tests argument) to
quote the value (e.g., --limit-tests="NEIGHBORS_TEST;CAGRA_C_TEST") or escape
the semicolon so the entire string is passed as a single parameter.


The bruteforce method is running the KNN algorithm. It performs an extensive search, and in contrast to ANN methods produces an exact result.

`#include <cuvs/neighbors/bruteforce.h>`

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix incorrect C header include path.

Line 5 uses #include <cuvs/neighbors/bruteforce.h>, but the C API header is cuvs/neighbors/brute_force.h. This will break copied examples.

Suggested fix
-`#include <cuvs/neighbors/bruteforce.h>`
+`#include <cuvs/neighbors/brute_force.h>`

As per coding guidelines, "docs/**/*: For documentation changes, focus on: Accuracy: Verify code examples compile and run correctly."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-`#include <cuvs/neighbors/bruteforce.h>`
+`#include <cuvs/neighbors/brute_force.h>`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/c_api/neighbors_bruteforce_c.md` at line 5, The docs example uses
the wrong C header include; replace the incorrect include directive `#include
<cuvs/neighbors/bruteforce.h>` with the correct header `#include
<cuvs/neighbors/brute_force.h>` so examples compile—update the include line in
the file to reference the proper `brute_force.h` header.

Comment on lines +75 to +95
```cpp
#include <raft/core/resources.hpp>
#include <cuvs/preprocessing/spectral_embedding.hpp>

raft::resources handle;

// Configure parameters (n_neighbors is ignored with precomputed graph)
cuvs::preprocessing::spectral_embedding::params params;
params.n_components = 3;
params.norm_laplacian = true;
params.drop_first = true;
params.seed = 42;

// Assume we have a precomputed connectivity graph
// This could be from custom similarity computation or k-NN search
raft::device_coo_matrix<float, int, int, int> connectivity_graph(...);

// Allocate output embedding
auto embedding = raft::make_device_matrix<float, int, raft::col_major>(
handle, n_samples, params.n_components);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix non-compiling C++ example (undefined symbol and likely missing include).

n_samples is used in Line 94 but never declared in the “Using Precomputed Graph” snippet, so the example won’t compile as written. Also, both examples use raft::make_device_matrix/raft::col_major without including the RAFT mdarray header that typically defines them.

Proposed doc fix
 ### Basic Usage with Dataset

 ```cpp
 #include <raft/core/resources.hpp>
+#include <raft/core/device_mdarray.hpp>
 #include <cuvs/preprocessing/spectral_embedding.hpp>
@@
 ### Using Precomputed Graph

 ```cpp
 #include <raft/core/resources.hpp>
+#include <raft/core/device_mdarray.hpp>
 #include <cuvs/preprocessing/spectral_embedding.hpp>

 raft::resources handle;
+int n_samples = 1000;

As per coding guidelines, "Accuracy: Verify code examples compile and run correctly".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cpp_api/preprocessing_spectral_embedding.md` around lines 75 -
95, The snippet fails to compile because n_samples is never defined and the RAFT
mdarray symbols it uses (raft::make_device_matrix, raft::col_major) require the
device_mdarray header; fix by adding an include for
raft/core/device_mdarray.hpp alongside raft/core/resources.hpp and declaring a
sample count (e.g., int n_samples = 1000) before creating the embedding with
raft::make_device_matrix<float, int, raft::col_major>(handle, n_samples,
params.n_components).

=================
## Using the dataset

After the dataset is downloaded and extracted to the `wiki_all_88M` directory (or `wiki_all_1M`/`wiki_all_10M` depending on whether the subsets are used), the files can be used in the benchmarking tool. The dataset name is `wiki_all` (or `wiki_all_1M`/`wiki_all_10M`), and the benchmarking tool can be used by specifying the appropriate name `--dataset wiki_all_88M` in the scripts.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unify dataset-name guidance to avoid command misuse.

This line mixes wiki_all naming with --dataset wiki_all_88M, which can confuse users about the actual CLI value. Please state one canonical mapping (dataset id vs extracted directory name) explicitly.

As per coding guidelines, documentation changes should prioritize accuracy and consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/wiki_all_dataset.md` at line 45, The documentation
mixes the dataset name (wiki_all, wiki_all_1M, wiki_all_10M) with the extracted
directory name (wiki_all_88M for the full dump), which can confuse CLI usage;
confirm which value the benchmarking tool expects for --dataset, state that
single mapping between dataset name and directory explicitly, and give one
example command that uses exactly that value so readers know which name to pass.
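
One way to make a naming mismatch fail loudly is to resolve the name against the directories that actually exist. The sketch below is hypothetical and is not how the benchmarking tool works: it assumes the value passed to `--dataset` must equal the extracted directory name, which is exactly what this comment asks the docs to confirm. The helper `resolve_dataset_dir` is an illustrative name.

```python
from pathlib import Path
import tempfile


def resolve_dataset_dir(datasets_root, dataset_name):
    """Return the directory for dataset_name, or raise with the available names."""
    root = Path(datasets_root)
    candidate = root / dataset_name
    if candidate.is_dir():
        return candidate
    available = sorted(p.name for p in root.iterdir() if p.is_dir())
    raise FileNotFoundError(
        f"no directory named {dataset_name!r} under {root}; available: {available}"
    )


# Demonstration with a temporary directory standing in for the datasets root.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "wiki_all_1M").mkdir()
    print(resolve_dataset_dir(tmp, "wiki_all_1M").name)
    try:
        resolve_dataset_dir(tmp, "wiki_all")
    except FileNotFoundError as err:
        print(err)
```

Listing the available directory names in the error message gives the reader the valid values directly instead of leaving them to guess between the two naming schemes.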

Comment on lines +55 to +56
cuvsCagraIndexDestroy(index);
cuvsCagraIndexParamsDestroy(index_params);

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Destroying wrong variables.

The code creates search_params (line 51) but attempts to destroy index_params (line 56), which was never created in this example. As written, the snippet refers to a variable that does not exist and never frees search_params.

🐛 Proposed fix
 cuvsCagraIndexDestroy(index);
-cuvsCagraIndexParamsDestroy(index_params);
+cuvsCagraSearchParamsDestroy(search_params);
 cuvsResourcesDestroy(res);

As per coding guidelines: "Accuracy: Verify code examples compile and run correctly".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/working_with_ann_indexes_c.md` around lines 55 - 56, The example
destroys the wrong object: it calls cuvsCagraIndexParamsDestroy(index_params)
even though the code created search_params and never created index_params;
replace that call with cuvsCagraSearchParamsDestroy(search_params) so the
parameters that were actually created are freed with the matching destroy
function, and keep cuvsCagraIndexDestroy(index) as-is.

Comment on lines +13 to +39
raft::device_matrix_view<float> dataset = load_dataset();
raft::device_resources res;

cagra::index_params index_params;

auto index = cagra::build(res, index_params, dataset);
```

## Searching an index

```c++
#include <cuvs/neighbors/cagra.hpp>

using namespace cuvs::neighbors;
cagra::index index;

// ... build index ...

raft::device_matrix_view<float> queries = load_queries();
raft::device_matrix_view<uint32_t> neighbors = make_device_matrix_view<uint32_t>(n_queries, k);
raft::device_matrix_view<float> distances = make_device_matrix_view<float>(n_queries, k);
raft::device_resources res;

cagra::search_params search_params;

cagra::search(res, search_params, index, queries, neighbors, distances);
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how similar C++ ANN docs present runnable vs pseudocode samples.
rg -n -C2 'load_dataset|load_queries|make_device_matrix_view|n_queries|\\bk\\b' docs/source --iglob '*working_with_ann_indexes*.md'
rg -n -C2 'cagra::build\\(|cagra::search\\(' docs/source --iglob '*.md'

Repository: rapidsai/cuvs

Length of output: 3434


🏁 Script executed:

#!/bin/bash
# Get full content of the C++ working with ANN indexes doc
echo "=== C++ Documentation (full context) ==="
cat -n docs/source/working_with_ann_indexes_cpp.md

echo -e "\n=== Checking doc headers/intro for pseudocode markers ==="
head -50 docs/source/working_with_ann_indexes_cpp.md

Repository: rapidsai/cuvs

Length of output: 2313


🏁 Script executed:

#!/bin/bash
# Check how Python and Rust docs handle similar scenarios
echo "=== Python doc around load_queries ==="
sed -n '15,30p' docs/source/working_with_ann_indexes_python.md

echo -e "\n=== Rust doc showing n_queries/k definitions ==="
sed -n '25,45p' docs/source/working_with_ann_indexes_rust.md

echo -e "\n=== C doc handling ==="
sed -n '10,25p' docs/source/working_with_ann_indexes_c.md

Repository: rapidsai/cuvs

Length of output: 1664


🏁 Script executed:

#!/bin/bash
# Search for make_device_matrix_view in the codebase (to verify if real API)
rg 'make_device_matrix_view' --type cpp --type hpp

Repository: rapidsai/cuvs

Length of output: 85


🏁 Script executed:

#!/bin/bash
# Search for make_device_matrix_view across all file types
rg 'make_device_matrix_view' -t c -t cpp --iglob '*'

Repository: rapidsai/cuvs

Length of output: 50369


🏁 Script executed:

#!/bin/bash
# Also check in header/implementation files differently
fd 'make_device_matrix_view' 
rg -l 'def make_device_matrix_view|make_device_matrix_view.*=' --type cpp --type c

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Look for working C++ examples to understand the proper pattern
head -50 examples/cpp/src/cagra_persistent_example.cu | grep -A 5 "make_device_matrix_view"

Repository: rapidsai/cuvs

Length of output: 349


🏁 Script executed:

#!/bin/bash
# Check what the doc file should look like by comparing with examples
rg -A 5 'load_dataset|load_queries' examples/cpp/src --type cpp

Repository: rapidsai/cuvs

Length of output: 39


Add missing definitions or mark code snippet as pseudocode

The C++ example code cannot compile or run as-is. It uses undefined placeholders (load_dataset(), load_queries(), n_queries, k) and has a namespace qualification issue with make_device_matrix_view() (should be raft::make_device_matrix_view()).

The Rust documentation for the same workflow includes actual values (n_queries = 4, k = 10). For consistency with other language examples and to meet the documentation accuracy guideline, either provide complete working code or explicitly label this as pseudocode with explanatory comments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/working_with_ann_indexes_cpp.md` around lines 13 - 39, The
snippet uses undefined placeholders and a wrong symbol: replace or define
load_dataset(), load_queries(), n_queries and k with concrete example values
(e.g., set n_queries and k as integers and implement simple dataset/query
creation) and call raft::make_device_matrix_view() instead of
make_device_matrix_view(); ensure types match raft::device_matrix_view<float>
and that cagra::index, cagra::index_params, cagra::search_params are properly
constructed before calling cagra::build and cagra::search — alternatively,
explicitly mark the block as pseudocode with a comment stating the placeholders
(load_dataset(), load_queries(), n_queries, k) are illustrative and the correct
symbol is raft::make_device_matrix_view().

Comment on lines +12 to +15
index_params = cagra.IndexParams()

index = cagra.build(build_params, dataset)
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'index_params|build_params|cagra\.build\(' docs/source/working_with_ann_indexes_python.md

Repository: rapidsai/cuvs

Length of output: 141


🏁 Script executed:

cat -n docs/source/working_with_ann_indexes_python.md | sed -n '10,30p'

Repository: rapidsai/cuvs

Length of output: 551


Fix variable name mismatch and syntax errors in code examples.

The first snippet defines index_params but passes undefined build_params to cagra.build(). The second snippet has C++ syntax (// ...) instead of a valid Python comment and uses undefined variable k.

Proposed fixes
 index_params = cagra.IndexParams()
 
-index = cagra.build(build_params, dataset)
+index = cagra.build(index_params, dataset)
 search_params = cagra.SearchParams()
 
-index = // ... build index ...
+index = cagra.build(...)  # Load or build index
 
-neighbors, distances = cagra.search(search_params, index, queries, k)
+neighbors, distances = cagra.search(search_params, index, queries, k=10)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
index_params = cagra.IndexParams()
index = cagra.build(build_params, dataset)
```
index_params = cagra.IndexParams()
index = cagra.build(index_params, dataset)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/working_with_ann_indexes_python.md` around lines 12 - 15, The
code examples use undefined names and wrong comment syntax: replace the call to
cagra.build(build_params, dataset) with cagra.build(index_params, dataset) so
the declared IndexParams instance is passed; change any C++ comments like "//
..." to Python "# ..."; and ensure variable k is defined (e.g., set k =
<desired_int> before use) or replace k with a concrete integer in the example so
no undefined variable is referenced.
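
A mismatch like `index_params` versus `build_params` can be detected mechanically. The following is a minimal sketch using only the standard library `ast` module; it treats the snippet as a single flat scope and ignores statement order, so it is a rough lint and not a full name-resolution pass. The helper name `undefined_names` is an assumption made for illustration.

```python
import ast
import builtins


def undefined_names(source):
    """Return names that are read but never assigned or imported in source.

    Simplified: treats the snippet as one flat scope and ignores ordering.
    """
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.append(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return sorted({name for name in used if name not in defined})


snippet = """
from cuvs.neighbors import cagra
index_params = cagra.IndexParams()
index = cagra.build(build_params, dataset)
"""
print(undefined_names(snippet))
```

On this snippet the check reports both `build_params` and `dataset`. The first is the bug described above. The second is a placeholder the page presumably expects the reader to supply, so a real check would need an allow-list for such names.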

Comment on lines +26 to +29
index = // ... build index ...

neighbors, distances = cagra.search(search_params, index, queries, k)
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n '//|cagra\.search\(' docs/source/working_with_ann_indexes_python.md

Repository: rapidsai/cuvs

Length of output: 164


🏁 Script executed:

# Check the file context around lines 26-29 to understand the example scope
sed -n '20,35p' docs/source/working_with_ann_indexes_python.md

Repository: rapidsai/cuvs

Length of output: 264


🏁 Script executed:

# Check if there are other similar placeholder patterns in the file
rg -n '^\s*(//|\.\.\.|\.\.\.)' docs/source/working_with_ann_indexes_python.md

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

# Search for other examples of cagra usage in the file to understand proper API patterns
rg -n 'cagra\.(build|search|IndexParams)' docs/source/working_with_ann_indexes_python.md -A 2 -B 2

Repository: rapidsai/cuvs

Length of output: 312


🏁 Script executed:

# Check if there's a complete example elsewhere in the file
grep -n "^import\|^from\|^# \|^queries\|^index\|^k =\|^neighbors" docs/source/working_with_ann_indexes_python.md | head -30

Repository: rapidsai/cuvs

Length of output: 386


🏁 Script executed:

# Search for cagra API usage patterns and documentation to validate the proposed fix
rg -n 'cagra\.build|cagra\.search' --type py -B 2 -A 2 | head -50

Repository: rapidsai/cuvs

Length of output: 3529


🏁 Script executed:

# Check for any Python test files or examples that show proper cagra usage
find . -type f -name "*.py" -path "*/test*" -o -name "*example*.py" | xargs grep -l "cagra\." 2>/dev/null | head -5

Repository: rapidsai/cuvs

Length of output: 219


🏁 Script executed:

# Check the actual cagra API signature and test patterns
sed -n '70,100p' python/cuvs/cuvs/tests/test_cagra.py

Repository: rapidsai/cuvs

Length of output: 1063


🏁 Script executed:

# Look at more complete examples in test_cagra.py
grep -n "def test\|cagra\.build\|cagra\.search" python/cuvs/cuvs/tests/test_cagra.py | head -20

Repository: rapidsai/cuvs

Length of output: 664


Fix invalid Python placeholder syntax and define k in the search example.

The // ... build index ... placeholder uses C/C++ comment syntax instead of valid Python (#), and the variable k is used without being defined, preventing the code example from running.

Suggested fix
-index = // ... build index ...
+# Build or load an index first
+index = cagra.build(cagra.IndexParams(), dataset)
+k = 10
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/working_with_ann_indexes_python.md` around lines 26 - 29, Replace
the invalid C/C++ comment placeholder with a proper Python comment and define
the search count variable before use: change the placeholder for building the
index to a Python-style comment (e.g., using #) where `index` is constructed,
and add a definition for `k` (an integer) prior to calling
`cagra.search(search_params, index, queries, k)` so `k` is defined when passed
to `cagra.search`; ensure references to `index`, `queries`, `k`, and the
`cagra.search` call are updated accordingly.

Comment on lines +8 to +47
use cuvs::cagra::{Index, IndexParams};
use cuvs::{Resources, Result};

use ndarray_rand::rand_distr::Uniform;
use ndarray_rand::RandomExt;

/// Example showing how to index and search data with CAGRA
fn cagra_example() -> Result<()> {
    let res = Resources::new()?;

    // Create a new random dataset to index
    let n_datapoints = 65536;
    let n_features = 512;
    let dataset =
        ndarray::Array::<f32, _>::random((n_datapoints, n_features), Uniform::new(0., 1.0));

    // build the cagra index
    let build_params = IndexParams::new()?;
    let index = Index::build(&res, &build_params, &dataset)?;

    // use the first 4 points from the dataset as queries : will test that we get them back
    // as their own nearest neighbor
    let n_queries = 4;
    let queries = dataset.slice(s![0..n_queries, ..]);

    let k = 10;

    // CAGRA search API requires queries and outputs to be on device memory
    // copy query data over, and allocate new device memory for the distances/ neighbors
    // outputs
    let queries = ManagedTensor::from(&queries).to_device(&res)?;
    let mut neighbors_host = ndarray::Array::<u32, _>::zeros((n_queries, k));
    let neighbors = ManagedTensor::from(&neighbors_host).to_device(&res)?;

    let mut distances_host = ndarray::Array::<f32, _>::zeros((n_queries, k));
    let distances = ManagedTensor::from(&distances_host).to_device(&res)?;

    let search_params = SearchParams::new()?;

    index.search(&res, &search_params, &queries, &neighbors, &distances)?;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify how these symbols are imported in existing Rust examples/docs.
rg -n -C2 'ManagedTensor|SearchParams|ndarray::s|s!\[' docs --iglob '*.md'
rg -n -C2 'ManagedTensor|SearchParams|ndarray::s|s!\[' cpp python rust --iglob '*.rs' 2>/dev/null || true

Repository: rapidsai/cuvs

Length of output: 50371


Add missing imports to Rust CAGRA example

The code snippet uses SearchParams, ManagedTensor, and s! without imports. The suggested imports are correct and necessary for the example to compile.

Suggested doc fix
-use cuvs::cagra::{Index, IndexParams};
-use cuvs::{Resources, Result};
+use cuvs::cagra::{Index, IndexParams, SearchParams};
+use cuvs::{ManagedTensor, Resources, Result};
+use ndarray::s;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/working_with_ann_indexes_rust.md` around lines 8 - 47, The
example is missing imports for SearchParams, ManagedTensor, and the s! macro
used in cagra_example; add the use statements from the suggested fix (import
SearchParams from cuvs::cagra, ManagedTensor from cuvs, and the s! macro with
`use ndarray::s;`) so that all three symbols resolve and the example compiles.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (18)
docs/source/index.md-63-63 (1)

63-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hyphenate compound modifier for readability.

On Line 63, use “high-performance machine learning primitives” instead of “high performance machine learning primitives”.

Suggested wording fix
-cuVS is built on top of the RAPIDS RAFT library of high performance machine learning primitives and provides all the necessary routines for vector search and clustering on the GPU.
+cuVS is built on top of the RAPIDS RAFT library of high-performance machine learning primitives and provides all the necessary routines for vector search and clustering on the GPU.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/index.md` at line 63, Replace the phrase "high performance
machine learning primitives" with the hyphenated compound modifier
"high-performance machine learning primitives" in the sentence that reads "cuVS
is built on top of the RAPIDS RAFT library of high performance machine learning
primitives and provides all the necessary routines for vector search and
clustering on the GPU." to improve readability.
docs/source/index.md-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused Markdown reference definition.

Line 7 defines cuvs_reference, but it isn’t referenced anywhere in this page. This triggers markdownlint MD053 and adds dead content.

Suggested cleanup
-[cuvs_reference]: https://docs.rapids.ai/api/cuvs/stable/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/index.md` at line 7, Remove the unused markdown reference
definition "[cuvs_reference]: https://docs.rapids.ai/api/cuvs/stable/" from the
file (the symbol cuvs_reference) since it is not referenced anywhere on the
page; simply delete that reference line to eliminate the dead content and
satisfy markdownlint MD053.
docs/source/cuvs_bench/param_tuning.md-421-421 (1)

421-421: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep parameter name formatting consistent in table rows.

nn_descent_termination_threshold is the only parameter in this table not wrapped in backticks, which breaks visual consistency and makes scanning/copying harder.

As per coding guidelines, documentation under docs/**/* should maintain consistency in terminology and presentation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/param_tuning.md` at line 421, The table row contains
nn_descent_termination_threshold without inline code formatting; update that
table entry to wrap nn_descent_termination_threshold in backticks so it matches
the other parameter names and preserves consistent formatting in the docs (look
for the table row containing the text "nn_descent_termination_threshold" and
replace it with the backticked form `nn_descent_termination_threshold`).
docs/source/tuning_guide.md-43-43 (1)

43-43: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Line 43 is missing a verb and changes the step meaning.

Optionally, the above steps multiple times... should explicitly say to repeat the steps.

✏️ Suggested edit
-1. Optionally, the above steps multiple times on different uniform sub-samplings. Optimal parameters can then be combined over the multiple monte-carlo optimization iterations. For example, many hyper-parameters can simply be averaged but care might need to be taken for other parameters.
+1. Optionally, repeat the above steps multiple times on different uniform sub-samplings. Optimal parameters can then be combined over multiple Monte Carlo optimization iterations. For example, many hyper-parameters can be averaged, but care might be needed for others.

As per coding guidelines: "Clarity: Flag confusing explanations, missing prerequisites, or unclear examples".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/tuning_guide.md` at line 43, The sentence "Optionally, the above
steps multiple times on different uniform sub-samplings." is missing a verb and
should explicitly instruct repeating; update that sentence (the one starting
"Optionally, the above steps...") to a clear imperative such as "Optionally,
repeat the above steps multiple times on different uniform sub-samplings..." and
keep the rest of the sentence about combining optimal parameters over
Monte‑Carlo iterations unchanged.
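
The repeated sub-sampling step described in that sentence can be sketched as follows. This is a toy illustration under stated assumptions: `toy_score` stands in for "build an index on the sub-sample and measure its quality", the candidate values are arbitrary, and averaging is only meaningful for numeric parameters that combine that way, as the guide itself cautions.

```python
import random
import statistics


def best_param(sample, candidates, score):
    """Return the candidate with the highest score on this sample."""
    return max(candidates, key=lambda c: score(sample, c))


def tune_over_subsamples(data, candidates, score, n_rounds=5, frac=0.1, seed=0):
    """Repeat the tuning on uniform sub-samples and average the winners."""
    rng = random.Random(seed)
    size = max(1, int(len(data) * frac))
    winners = []
    for _ in range(n_rounds):
        sample = rng.sample(data, size)  # uniform sub-sampling without replacement
        winners.append(best_param(sample, candidates, score))
    return statistics.mean(winners), winners


# Toy stand-in for a real tuning objective: the score peaks when the
# parameter is close to the square root of the sample size.
def toy_score(sample, param):
    return -abs(param - len(sample) ** 0.5)


data = list(range(10_000))
mean_param, winners = tune_over_subsamples(data, [8, 16, 32, 64, 128], toy_score)
print(mean_param, winners)
```

With a real objective the winners would vary between rounds, and the spread of `winners` is a useful signal of how stable the tuned value is.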
docs/source/tuning_guide.md-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the broken sentence after the Optuna link.

The sentence ends with ). to verify this., which reads as an accidental fragment and makes the recommendation unclear.

✏️ Suggested edit
-Since vector search indexes are more closely related to machine learning models than traditional databases indexes, one option for easing the parameter tuning burden is to use hyper-parameter optimization tools like [Ray Tune](https://medium.com/rapids-ai/30x-faster-hyperparameter-search-with-raytune-and-rapids-403013fbefc5) and [Optuna](https://docs.rapids.ai/deployment/stable/examples/rapids-optuna-hpo/notebook/). to verify this.
+Since vector search indexes are more closely related to machine learning models than traditional database indexes, one option for easing the parameter-tuning burden is to use hyper-parameter optimization tools like [Ray Tune](https://medium.com/rapids-ai/30x-faster-hyperparameter-search-with-raytune-and-rapids-403013fbefc5) and [Optuna](https://docs.rapids.ai/deployment/stable/examples/rapids-optuna-hpo/notebook/).

As per coding guidelines: "Clarity: Flag confusing explanations, missing prerequisites, or unclear examples".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/tuning_guide.md` at line 13, Fix the broken sentence in
tuning_guide.md after the Optuna link: remove the stray period and the fragment
"to verify this" and rephrase the sentence so it reads smoothly (e.g., "use
hyper-parameter optimization tools like Ray Tune and Optuna to automate
parameter tuning" or similar), ensuring the sentence follows the Optuna link and
completes the recommendation clearly.
docs/source/neighbors/ivfflat.md-20-20 (1)

20-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling/wording errors in technical descriptions.

These typos reduce clarity and can confuse behavior details (probed vs proved).

✍️ Proposed doc fixes
- IVF methods only apply filters to the lists which are probed for each query point. As a result, the results of a filtered query will likely differ significantly from the results of a filtering applid to an exact method like brute-force. For example. imagine you have 3 IVF lists each containing 2 vectors and you perform a query against only the closest 2 lists but you filter out all but 1 element. If that remaining element happens to be in one of the lists which was not proved, it will not be considered at all in the search results. It's important to consider this when using any of the IVF methods in your applications.
+ IVF methods only apply filters to the lists that are probed for each query point. As a result, a filtered query will likely differ significantly from applying the same filter to an exact method like brute-force. For example, imagine you have 3 IVF lists with 2 vectors each, and you query only the closest 2 lists while filtering out all but 1 element. If that remaining element is in a list that was not probed, it will not be considered in the search results. It's important to consider this when using IVF methods in your applications.
@@
-  - To support dynamic indexes, where points are expected to be added later, the individual IVF lists can be imtentionally overallocated up front to reduce the amount and impact of increasing list sizes, which requires allocating more memory and copying the old list to the new, larger, list.
+  - To support dynamic indexes, where points are expected to be added later, the individual IVF lists can be intentionally overallocated up front to reduce the frequency and impact of list growth, which otherwise requires allocating more memory and copying the old list into a new, larger list.
@@
- Note that each cluster is allocated as a separate allocation. If we use a `cuda_memory_resource`, that would grab memory in 1 MiB chunks, so on average we might have 0.5 MiB overhead per cluster. If we us 10s of thousands of clusters, it becomes essential to use pool allocator to avoid this overhead.
+ Note that each cluster is allocated separately. If we use a `cuda_memory_resource`, it grabs memory in 1 MiB chunks, so on average we might have 0.5 MiB overhead per cluster. If we use tens of thousands of clusters, it becomes essential to use a pool allocator to avoid this overhead.

As per coding guidelines, for docs/**/* changes, prioritize Clarity and Accuracy.

Also applies to: 51-51, 85-85

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/ivfflat.md` at line 20, The paragraph describing IVF
filtering contains spelling and wording errors, one of which changes the meaning
("proved" instead of "probed"); replace "applid" with "applied", change "proved"
to "probed", fix the punctuation in "For example. imagine" (it should read "For
example, imagine"), and reword the sentence about lists and filtering so it
clearly states that a vector in an unprobed list is never considered; also
correct "imtentionally" to "intentionally" at line 51 and "us" to "use" at line
85 to keep the documentation consistent.
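
The behavior that paragraph describes (3 lists of 2 vectors, 2 lists probed, a filter that keeps a single element) can be reproduced with a toy model. The sketch below is a 1-D pure-Python illustration and not the cuVS implementation; the function name, the data values, and the use of absolute difference as the distance are assumptions chosen to keep the example small.

```python
def ivf_filtered_search(query, lists, centroids, n_probes, allowed, k):
    """Toy 1-D IVF search: probe the n_probes closest lists, then filter."""
    # Rank lists by the distance from the query to each list's centroid.
    order = sorted(range(len(lists)), key=lambda i: abs(centroids[i] - query))
    probed = order[:n_probes]
    # Only vectors in probed lists are candidates; the filter is applied to them.
    candidates = [
        (abs(value - query), vec_id)
        for i in probed
        for vec_id, value in lists[i]
        if vec_id in allowed
    ]
    return [vec_id for _, vec_id in sorted(candidates)[:k]]


# Three lists with two (id, value) vectors each.
lists = [[(0, 0.9), (1, 1.1)], [(2, 4.9), (3, 5.1)], [(4, 8.9), (5, 9.1)]]
centroids = [1.0, 5.0, 9.0]
allowed = {5}  # the filter removes everything except vector 5

two_probes = ivf_filtered_search(1.2, lists, centroids, n_probes=2, allowed=allowed, k=1)
all_probes = ivf_filtered_search(1.2, lists, centroids, n_probes=3, allowed=allowed, k=1)
print(two_probes, all_probes)
```

With two probes the only allowed vector sits in the unprobed list, so the search returns nothing. Probing all three lists, which is equivalent to an exact search here, returns vector 5.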
docs/source/comparing_indexes.md-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten user-facing wording for clarity and consistency.

There are a few small copy issues in changed text: “overview methodology” phrasing, missing hyphen in “well-tuned”, double space before “hyperparameter”, and “sub sample” consistency with “subsampling”.

✏️ Suggested doc copy edits
-This document provides a brief overview methodology for comparing vector search indexes and models. For guidance on how to choose and configure an index type, please refer to [this](vector_databases_vs_vector_search.md) guide.
+This document provides a brief overview of a methodology for comparing vector search indexes and models. For guidance on how to choose and configure an index type, please refer to [this](vector_databases_vs_vector_search.md) guide.

-For this reason, it’s important to consider the parameters that an index is built upon, both for its potential quality and throughput/latency, when comparing two trained indexes. While easier to build an index on its default parameters than having to tune them, a well tuned index can have a significantly better search quality AND perform within search perf constraints like maximal throughput and minimal latency.
+For this reason, it’s important to consider the parameters that an index is built upon, both for its potential quality and throughput/latency, when comparing two trained indexes. While it is easier to build an index with its default parameters than to tune them, a well-tuned index can have significantly better search quality and still satisfy search performance constraints like maximal throughput and minimal latency.

-The resulting data points will construct a curve known as a Pareto optimum. Please note that this process is specifically for showing best-case across recall and throughput/latency, but when we care about finding the parameters that yield the best recall and search performance, we are essentially performing a  hyperparameter optimization, which is common in machine learning.
+The resulting data points will construct a curve known as a Pareto optimum. Please note that this process is specifically for showing best-case performance across recall and throughput/latency, but when we care about finding parameters that yield the best recall and search performance, we are essentially performing a hyperparameter optimization, which is common in machine learning.

-Please note, however, that there are often caps on the size of each of these smaller indexes, and that needs to be taken into consideration when choosing the size of the sub sample to tune.
+Please note, however, that there are often caps on the size of each of these smaller indexes, and that must be considered when choosing the size of the subsample to tune.

As per coding guidelines: “For documentation changes, focus on … Clarity … [and] Consistency.”

Also applies to: 9-9, 46-46, 53-53

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/comparing_indexes.md` at line 5, Replace awkward phrasing and fix
minor copy issues: change "overview methodology" to "overview of the
methodology" (or "methodology overview") in the sentence containing that phrase,
add the missing hyphen in "well-tuned", remove the double space before
"hyperparameter", and normalize "sub sample" to "subsample" consistently;
apply these same wording fixes to the other occurrences noted in the file (the
lines containing the phrases "overview methodology", "well tuned"/"well-tuned",
the double-space before "hyperparameter", and "sub sample") so all instances are
consistent.
docs/source/neighbors/vamana.md-17-17 (1)

17-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling error.

"repositoriy" should be "repository"

📝 Proposed fix
-The 'vamana::serialize' API calls writes the index to a file with a format that is compatible with the [open-source DiskANN repositoriy](https://github.com/microsoft/DiskANN). This allows cuVS to be used to accelerate index construction while leveraging the efficient CPU-based search currently available.
+The 'vamana::serialize' API calls writes the index to a file with a format that is compatible with the [open-source DiskANN repository](https://github.com/microsoft/DiskANN). This allows cuVS to be used to accelerate index construction while leveraging the efficient CPU-based search currently available.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/vamana.md` at line 17, Fix the spelling mistake in the
docs text referring to the DiskANN repository: update the word "repositoriy" to
"repository" in the sentence that mentions the 'vamana::serialize' API so the
sentence reads "...compatible with the open-source DiskANN repository..." and
retains the rest of the wording unchanged.
docs/source/neighbors/vamana.md-32-32 (1)

32-32: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling error in parameter description.

"degre" should be "degree"

📝 Proposed fix
-  - The maximum degre of the final Vamana graph. The internal representation of the graph includes this many edges for every node, but serialize will compress the graph into a 'CSR' format with, potentially, fewer edges.
+  - The maximum degree of the final Vamana graph. The internal representation of the graph includes this many edges for every node, but serialize will compress the graph into a 'CSR' format with, potentially, fewer edges.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/vamana.md` at line 32, Fix the spelling in the Vamana
parameter description: locate the sentence "The maximum degre of the final
Vamana graph." in the docs (neighbors/vamana.md) and change "degre" to "degree"
so it reads "The maximum degree of the final Vamana graph." Ensure any identical
occurrences in the same paragraph are updated too.
docs/source/neighbors/vamana.md-38-38 (1)

38-38: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix verb tense in parameter description.

"ran" should be "run" for grammatical correctness.

📝 Proposed fix
-  - Number of iterations ran to improve the graph. Each iteration involves inserting every vector in the dataset.
+  - Number of iterations run to improve the graph. Each iteration involves inserting every vector in the dataset.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/vamana.md` at line 38, Update the parameter description
sentence that currently reads "Number of iterations ran to improve the graph.
Each iteration involves inserting every vector in the dataset." to use the
correct verb tense by replacing "ran" with "run" so it reads "Number of
iterations run to improve the graph..." — locate that exact sentence in
docs/source/neighbors/vamana.md and make the one-word change.
docs/source/neighbors/vamana.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix spelling errors in introduction.

Two typos need correction:

  • "accelreate" → "accelerate"
  • "idnexes" → "indexes"
📝 Proposed fix
-VAMANA is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the [published paper](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf), and a highly optimized [open-source repository](https://github.com/microsoft/DiskANN) includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelreate graph construction to build DiskANN idnexes. At a high level, the Vamana algorithm operates as follows:
+VAMANA is the underlying graph construction algorithm used to construct indexes for the DiskANN vector search solution. DiskANN and the Vamana algorithm are described in detail in the [published paper](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf), and a highly optimized [open-source repository](https://github.com/microsoft/DiskANN) includes many features for index construction and search. In cuVS, we provide a version of the Vamana algorithm optimized for GPU architectures to accelerate graph construction to build DiskANN indexes. At a high level, the Vamana algorithm operates as follows:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/vamana.md` at line 3, Fix two typos in the VAMANA
introduction sentence: change "accelreate" to "accelerate" and "idnexes" to
"indexes" in the paragraph that starts "VAMANA is the underlying graph
construction algorithm..." so the sentence reads that cuVS provides a version
optimized for GPU architectures to accelerate graph construction to build
DiskANN indexes.
docs/source/cuvs_bench/index.md-106-116 (1)

106-116: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a consistent dataset identifier across the step-by-step example.

Step (1) downloads deep-image-96-angular, but step (2) benchmarks deep-image-96-inner. If this switch is intentional, call it out explicitly; otherwise this breaks reproducibility of the example.

As per coding guidelines, documentation examples should be accurate and runnable end-to-end.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` around lines 106 - 116, The example uses two
different dataset IDs between step (1) and step (2) which breaks
reproducibility; update the call to BenchmarkOrchestrator.run_benchmark (or the
earlier python -m cuvs_bench.get_dataset step) so both use the same identifier
(e.g., change the run_benchmark dataset argument from "deep-image-96-inner" to
"deep-image-96-angular") or add an explicit comment explaining the intentional
switch; locate the dataset string in the orchestrator instantiation and the
python -m invocation to make the change.
docs/source/cuvs_bench/index.md-242-242 (1)

242-242: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid hardcoding an outdated branch in the dataset descriptor link.

The link points to branch-25.04, which can become stale relative to this doc version and confuse users about the canonical config source.

Suggested edit
-Descriptors for several popular datasets are already available in [datasets.yaml](https://github.com/rapidsai/cuvs/blob/branch-25.04/python/cuvs_bench/cuvs_bench/config/datasets/datasets.yaml).
+Descriptors for several popular datasets are already available in [datasets.yaml](https://github.com/rapidsai/cuvs/blob/main/python/cuvs_bench/cuvs_bench/config/datasets/datasets.yaml).

As per coding guidelines, documentation changes should maintain consistency of versioned references.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` at line 242, The docs line currently
hardcodes the branch name "branch-25.04" in the dataset descriptor link; update
that URL to a branch-agnostic or canonical reference (for example replace
"branch-25.04" with "main" or use the repository's canonical branch) so the link
stays up-to-date; specifically edit the link text in index.md that points to
https://github.com/rapidsai/cuvs/blob/branch-25.04/python/cuvs_bench/cuvs_bench/config/datasets/datasets.yaml
to use the canonical branch name (e.g., "main") or a branchless reference.
docs/source/cuvs_bench/index.md-90-90 (1)

90-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicated Python version in the supported matrix.

3.11 is listed twice, which makes the compatibility matrix ambiguous and may hide a missing version entry.

Suggested edit
-- Supported Python versions: 3.11, 3.11, 3.13, and 3.14
+- Supported Python versions: 3.11, 3.12, 3.13, and 3.14

As per coding guidelines, documentation changes should keep version numbers and terminology consistent with actual support.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` at line 90, The supported Python versions
line contains a duplicated entry ("Supported Python versions: 3.11, 3.11, 3.13,
and 3.14"); update that line to remove the duplicate and ensure the list
reflects the real supported versions (e.g., replace the duplicate 3.11 with the
correct missing version or remove it so it reads "3.11, 3.13, and 3.14"),
editing the exact string "Supported Python versions: 3.11, 3.11, 3.13, and 3.14"
in the docs to keep the compatibility matrix accurate and consistent.
docs/source/cuvs_bench/index.md-371-371 (1)

371-371: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct metric label typo from GU to GPU.

This is user-facing benchmark terminology and should be precise.

Suggested edit
-* - GPU
-  - GU time spent building
+* - GPU
+  - GPU time spent building

As per coding guidelines, documentation should prioritize clarity and accurate terminology.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` at line 371, Replace the typo "GU time spent
building" with the correct label "GPU time spent building" in the documentation
(search for the exact string "GU time spent building" in
docs/source/cuvs_bench/index.md) so the user-facing benchmark metric uses
accurate GPU terminology; update any identical occurrences to maintain
consistency.
docs/source/cuvs_bench/index.md-584-584 (1)

584-584: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix malformed inline code span and unclear path reference.

The phrase around `cpp/bench/ann/) and `create_search_param()` is broken and hard to parse, which can mislead contributors updating extension points.

Suggested edit
-2. Next, add corresponding `if` case to functions `create_algo()` (in `cpp/bench/ann/) and `create_search_param()` by calling parsing functions. The string literal in `if` condition statement must be the same as the value of `algo` in configuration file. For example,
+2. Next, add a corresponding `if` case to `create_algo()` and `create_search_param()` (in `cpp/bench/ann/`) by calling the parsing functions. The string literal in the `if` condition must match the `algo` value in the configuration file. For example,

As per coding guidelines, documentation should be clear and consistent for implementation guidance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/index.md` at line 584, The inline code span around
create_algo() and create_search_param() is malformed and the location reference
is unclear; fix the sentence to use correct backticks for both symbols (e.g.,
"`create_algo()` and `create_search_param()`") and replace the broken path
fragment with a clear, short location note such as "(in the ann implementation
module)" so the instruction reads cleanly and preserves the requirement that the
string literal in the if-condition must match the algo value in the config.
docs/source/neighbors/ivfpq.md-48-48 (1)

48-48: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in user-facing parameter description (imtentionally → intentionally).

Small wording fix, but this is in a primary configuration table.

Proposed doc fix
-  - To support dynamic indexes, where points are expected to be added later, the individual IVF lists can be imtentionally overallocated up front to reduce the amount and impact of increasing list sizes, which requires allocating more memory and copying the old list to the new, larger, list.
+  - To support dynamic indexes, where points are expected to be added later, the individual IVF lists can be intentionally overallocated up front to reduce the amount and impact of increasing list sizes, which requires allocating more memory and copying the old list to the new, larger, list.
As per coding guidelines, “For documentation changes, focus on: Clarity: Flag confusing explanations, missing prerequisites, or unclear examples”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/ivfpq.md` at line 48, Fix the typo in the IVFPQ docs
sentence: replace the misspelled word "imtentionally" with "intentionally" in
the sentence describing overallocating individual IVF lists for dynamic indexes
(the sentence starting "To support dynamic indexes, where points are expected to
be added later...").
docs/source/getting_started.md-64-76 (1)

64-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix visible copy issues in the landing page text.

There are a few user-facing text errors:

  • Line 64: “our [API basics]” → “Our [API basics]”.
  • Line 71: missing punctuation after “Apache 2.0”.
  • Line 71: “Github” → “GitHub”.
  • Line 75: remove extra space before comma after Slack link.

As per coding guidelines, documentation changes should prioritize Clarity and Consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/getting_started.md` around lines 64 - 76, Fix the copy issues in
the getting_started.md landing text: change "our [API basics]" to "Our [API
basics]"; add missing punctuation after "licensed under Apache 2.0" (e.g., a
period); change "Github" to "GitHub"; and remove the extra space before the
comma after the Slack link so the phrase becomes
"[Slack](https://rapids.ai/slack-invite),". Update these specific text
occurrences in the block under the cuVS overview and "Where to next?" headings.
🧹 Nitpick comments (1)
docs/source/filtering.md (1)

5-7: ⚡ Quick win

Tighten the opening paragraph wording for readability.

The intro is a bit awkward/wordy (“different type”, “take into account the filtering…”). A concise rewrite will read more clearly.

As per coding guidelines, documentation changes should prioritize Clarity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/filtering.md` around lines 5 - 7, The opening paragraph in
filtering.md is wordy and awkward; rewrite it for clarity by removing vague
phrases like "different type" and "take into account the filtering" and
producing a concise, direct intro that names the supported filtering modes and
defines pre-filtering clearly (e.g., "cuVS supports multiple filtering modes
depending on the vector index. The primary method, pre-filtering, applies
filters to vectors before nearest-neighbor search to reduce distance
computations."). Ensure the term "pre-filtering" remains and that the paragraph
succinctly states its purpose and benefit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/cuvs_bench/pluggable_backend.md`:
- Line 43: The docs line for backend_config contains a malformed constructor
reference `config[in](#in)_init__`; update the sentence to clearly reference the
instance attribute in the constructor by saying that the backend receives this
as `self.config` in `__init__` (mentioning both `backend_config` and `__init__`
so reviewers can find the text), replacing the broken Markdown with a plain,
accurate phrase like "the backend receives this as self.config in __init__".

In `@docs/source/cuvs_bench/wiki_all_dataset.md`:
- Around line 20-22: Add a prerequisite step to create the target directory
before extraction: insert a mkdir -p wiki_all_88M line immediately before the
tar extraction commands that use -C wiki_all_88M/ (the curl | tar -xf - -C
wiki_all_88M/ lines) so the commands work from a clean workspace and are
reproducible; apply the same mkdir -p insertion to any other occurrences that
extract into wiki_all_88M/.
- Around line 26-31: Replace the per-part curl invocations that stream to stdout
with commands that save each part to disk so the subsequent cat wiki_all.tar.* |
tar -xf - -C wiki_all_88M/ works; specifically update each "curl -s
https://.../wiki_all.tar.0X" to use a saving flag (e.g., "-sO" or "-s -o
wiki_all.tar.0X") for the series of parts (wiki_all.tar.00 through
wiki_all.tar.09) so the files exist on disk before running the final cat/tar
extraction.

In `@docs/source/filtering.md`:
- Around line 38-40: The examples call make_device_matrix_view without the
required data pointer; replace those non-compiling calls by allocating owning
matrices with raft::make_device_matrix (e.g., create raft::device_matrix<T,
index_t> res = raft::make_device_matrix<...>(...)) and then obtain a view via
res.view(); update the variables referenced (neighbors, distances) and all
search function calls to use the .view() of the owning matrices instead of
calling make_device_matrix_view with only dimensions.

In `@docs/source/neighbors/ivfpq.md`:
- Around line 118-123: The peak-memory formula in the ivfpq doc is inverted:
anywhere you divide by trainset_ratio you should instead multiply by it so
memory scales with the sampled training set size; update the three terms
containing trainset_ratio (the term with n_vectors/(trainset_ratio * dims *
sizeof_{float}), the term with n_vectors/(trainset_ratio * sizeof_{uint32_t}),
and ensure n_clusters * dim * sizeof_{float} remains as-is) to use n_vectors *
trainset_ratio * dims * sizeof_{float} and n_vectors * trainset_ratio *
sizeof_{uint32_t} respectively, keeping symbols n_vectors, trainset_ratio,
dims/dim, sizeof_{float}, sizeof_{uint32_t}, and n_clusters to locate the
changes.

---

Minor comments:
In `@docs/source/comparing_indexes.md`:
- Line 5: Replace awkward phrasing and fix minor copy issues: change "overview
methodology" to "overview of the methodology" (or "methodology overview") in the
sentence containing that phrase, add the missing hyphen in "well-tuned", remove
the double space before "hyperparameter", and normalize "sub sample" to
"subsample" consistently; apply these same wording fixes to the other
occurrences noted in the file (the lines containing the phrases "overview
methodology", "well tuned"/"well-tuned", the double-space before
"hyperparameter", and "sub sample") so all instances are consistent.

In `@docs/source/cuvs_bench/index.md`:
- Around line 106-116: The example uses two different dataset IDs between step
(1) and step (2) which breaks reproducibility; update the call to
BenchmarkOrchestrator.run_benchmark (or the earlier python -m
cuvs_bench.get_dataset step) so both use the same identifier (e.g., change the
run_benchmark dataset argument from "deep-image-96-inner" to
"deep-image-96-angular") or add an explicit comment explaining the intentional
switch; locate the dataset string in the orchestrator instantiation and the
python -m invocation to make the change.
- Line 242: The docs line currently hardcodes the branch name "branch-25.04" in
the dataset descriptor link; update that URL to a branch-agnostic or canonical
reference (for example replace "branch-25.04" with "main" or use the
repository's canonical branch) so the link stays up-to-date; specifically edit
the link text in index.md that points to
https://github.com/rapidsai/cuvs/blob/branch-25.04/python/cuvs_bench/cuvs_bench/config/datasets/datasets.yaml
to use the canonical branch name (e.g., "main") or a branchless reference.
- Line 90: The supported Python versions line contains a duplicated entry
("Supported Python versions: 3.11, 3.11, 3.13, and 3.14"); update that line to
remove the duplicate and ensure the list reflects the real supported versions
(e.g., replace the duplicate 3.11 with the correct missing version or remove it
so it reads "3.11, 3.13, and 3.14"), editing the exact string "Supported Python
versions: 3.11, 3.11, 3.13, and 3.14" in the docs to keep the compatibility
matrix accurate and consistent.
- Line 371: Replace the typo "GU time spent building" with the correct label
"GPU time spent building" in the documentation (search for the exact string "GU
time spent building" in docs/source/cuvs_bench/index.md) so the user-facing
benchmark metric uses accurate GPU terminology; update any identical occurrences
to maintain consistency.
- Line 584: The inline code span around create_algo() and create_search_param()
is malformed and the location reference is unclear; fix the sentence to use
correct backticks for both symbols (e.g., "`create_algo()` and
`create_search_param()`") and replace the broken path fragment with a clear,
short location note such as "(in the ann implementation module)" so the
instruction reads cleanly and preserves the requirement that the string literal
in the if-condition must match the algo value in the config.

In `@docs/source/cuvs_bench/param_tuning.md`:
- Line 421: The table row contains nn_descent_termination_threshold without
inline code formatting; update that table entry to wrap
nn_descent_termination_threshold in backticks so it matches the other parameter
names and preserves consistent formatting in the docs (look for the table row
containing the text "nn_descent_termination_threshold" and replace it with the
backticked form `nn_descent_termination_threshold`).

In `@docs/source/getting_started.md`:
- Around line 64-76: Fix the copy issues in the getting_started.md landing text:
change "our [API basics]" to "Our [API basics]"; add missing punctuation after
"licensed under Apache 2.0" (e.g., a period); change "Github" to "GitHub"; and
remove the extra space before the comma after the Slack link so the phrase
becomes "[Slack](https://rapids.ai/slack-invite),". Update these specific text
occurrences in the block under the cuVS overview and "Where to next?" headings.

In `@docs/source/index.md`:
- Line 63: Replace the phrase "high performance machine learning primitives"
with the hyphenated compound modifier "high-performance machine learning
primitives" in the sentence that reads "cuVS is built on top of the RAPIDS RAFT
library of high performance machine learning primitives and provides all the
necessary routines for vector search and clustering on the GPU." to improve
readability.
- Line 7: Remove the unused markdown reference definition "[cuvs_reference]:
https://docs.rapids.ai/api/cuvs/stable/" from the file (the symbol
cuvs_reference) since it is not referenced anywhere on the page; simply delete
that reference line to eliminate the dead content and satisfy markdownlint
MD053.

In `@docs/source/neighbors/ivfflat.md`:
- Line 20: The paragraph in docs/source/neighbors/ivfflat.md contains spelling
and wording errors that change meaning (e.g., "proved" vs "probed", "applid" vs
"applied"); edit the text describing IVF methods to correct typos and improve
clarity: replace "applid" with "applied", change every incorrect "proved" to
"probed", fix punctuation/grammar (e.g., "For example. imagine" → "For example,
imagine"), and reword the sentence about lists and filtering so it clearly
states that if a vector is in an unprobed list it won't be considered; apply the
same corrections to the other occurrences noted (lines 51 and 85) to keep
documentation consistent.

In `@docs/source/neighbors/ivfpq.md`:
- Line 48: Fix the typo in the IVFPQ docs sentence: replace the misspelled word
"imtentionally" with "intentionally" in the sentence describing overallocating
individual IVF lists for dynamic indexes (the sentence starting "To support
dynamic indexes, where points are expected to be added later...").

In `@docs/source/neighbors/vamana.md`:
- Line 17: Fix the spelling mistake in the docs text referring to the DiskANN
repository: update the word "repositoriy" to "repository" in the sentence that
mentions the 'vamana::serialize' API so the sentence reads "...compatible with
the open-source DiskANN repository..." and retains the rest of the wording
unchanged.
- Line 32: Fix the spelling in the Vamana parameter description: locate the
sentence "The maximum degre of the final Vamana graph." in the docs
(neighbors/vamana.md) and change "degre" to "degree" so it reads "The maximum
degree of the final Vamana graph." Ensure any identical occurrences in the same
paragraph are updated too.
- Line 38: Update the parameter description sentence that currently reads
"Number of iterations ran to improve the graph. Each iteration involves
inserting every vector in the dataset." to use the correct verb tense by
replacing "ran" with "run" so it reads "Number of iterations run to improve the
graph..." — locate that exact sentence in docs/source/neighbors/vamana.md and
make the one-word change.
- Line 3: Fix two typos in the VAMANA introduction sentence: change "accelreate"
to "accelerate" and "idnexes" to "indexes" in the paragraph that starts "VAMANA
is the underlying graph construction algorithm..." so the sentence reads that
cuVS provides a version optimized for GPU architectures to accelerate graph
construction to build DiskANN indexes.

In `@docs/source/tuning_guide.md`:
- Line 43: The sentence "Optionally, the above steps multiple times on different
uniform sub-samplings." is missing a verb and should explicitly instruct
repeating; update that sentence (the one starting "Optionally, the above
steps...") to a clear imperative such as "Optionally, repeat the above steps
multiple times on different uniform sub-samplings..." and keep the rest of the
sentence about combining optimal parameters over Monte‑Carlo iterations
unchanged.
- Line 13: Fix the broken sentence in tuning_guide.md after the Optuna link:
remove the stray period and the fragment "to verify this" and rephrase the
sentence so it reads smoothly (e.g., "use hyper-parameter optimization tools
like Ray Tune and Optuna to automate parameter tuning" or similar), ensuring the
sentence follows the Optuna link and completes the recommendation clearly.

---

Nitpick comments:
In `@docs/source/filtering.md`:
- Around line 5-7: The opening paragraph in filtering.md is wordy and awkward;
rewrite it for clarity by removing vague phrases like "different type" and "take
into account the filtering" and producing a concise, direct intro that names the
supported filtering modes and defines pre-filtering clearly (e.g., "cuVS
supports multiple filtering modes depending on the vector index. The primary
method, pre-filtering, applies filters to vectors before nearest-neighbor search
to reduce distance computations."). Ensure the term "pre-filtering" remains and
that the paragraph succinctly states its purpose and benefit.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e59876e2-ff87-4d18-8e35-b8fecd552af2

📥 Commits

Reviewing files that changed from the base of the PR and between 35c698e and e040350.

📒 Files selected for processing (22)
  • CHANGELOG.md
  • docs/source/advanced_topics.md
  • docs/source/api_docs.md
  • docs/source/comparing_indexes.md
  • docs/source/cuvs_bench/build.md
  • docs/source/cuvs_bench/index.md
  • docs/source/cuvs_bench/param_tuning.md
  • docs/source/cuvs_bench/pluggable_backend.md
  • docs/source/cuvs_bench/wiki_all_dataset.md
  • docs/source/developer_guide.md
  • docs/source/filtering.md
  • docs/source/getting_started.md
  • docs/source/index.md
  • docs/source/neighbors/all_neighbors.md
  • docs/source/neighbors/bruteforce.md
  • docs/source/neighbors/cagra.md
  • docs/source/neighbors/ivfflat.md
  • docs/source/neighbors/ivfpq.md
  • docs/source/neighbors/neighbors.md
  • docs/source/neighbors/vamana.md
  • docs/source/tuning_guide.md
  • docs/source/vector_databases_vs_vector_search.md
✅ Files skipped from review due to trivial changes (8)
  • docs/source/neighbors/neighbors.md
  • CHANGELOG.md
  • docs/source/api_docs.md
  • docs/source/neighbors/all_neighbors.md
  • docs/source/advanced_topics.md
  • docs/source/neighbors/bruteforce.md
  • docs/source/cuvs_bench/build.md
  • docs/source/neighbors/cagra.md


- **List[BenchmarkConfig]** – Each **BenchmarkConfig** has:
- **indexes**: a list of **IndexConfig**. Each **IndexConfig** has `name` (e.g. `"my_algo.param1value"`), `algo` (algorithm name), `build_param` (dict of build parameters), `search_params` (list of dicts, one per search parameter combination to benchmark), and `file` (path or identifier where the index is stored).
- **backend_config**: a dict passed to the backend constructor (e.g. `executable_path` for C++, or `host`, `port`, `index_name` for a network backend). The backend receives this as its `config[in](#in)_init__`.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix malformed constructor reference (__init__) in backend config description.

config[in](#in)_init__ is broken Markdown and obscures the intended constructor reference. This should explicitly point to self.config in __init__.

Proposed fix
-  - **backend_config**: a dict passed to the backend constructor (e.g. `executable_path` for C++, or `host`, `port`, `index_name` for a network backend). The backend receives this as its `config[in](#in)_init__`.
+  - **backend_config**: a dict passed to the backend constructor (e.g. `executable_path` for C++, or `host`, `port`, `index_name` for a network backend). The backend receives this as `self.config` in `__init__`.

As per coding guidelines, documentation under docs/**/* should prioritize clarity and accuracy of API explanations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/pluggable_backend.md` at line 43, The docs line for
backend_config contains a malformed constructor reference
`config[in](#in)_init__`; update the sentence to clearly reference the instance
attribute in the constructor by saying that the backend receives this as
`self.config` in `__init__` (mentioning both `backend_config` and `__init__` so
reviewers can find the text), replacing the broken Markdown with a plain,
accurate phrase like "the backend receives this as self.config in __init__".
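
To make the proposed wording concrete, here is a minimal sketch of the structure described above. The dataclasses are illustrative stand-ins built only from the field names in the quoted documentation, and `MyBackend` with its `endpoint` helper is hypothetical; none of this is the actual cuvs_bench implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Illustrative stand-ins: field names follow the quoted docs,
# but the real cuvs_bench classes may be defined differently.
@dataclass
class IndexConfig:
    name: str
    algo: str
    build_param: Dict[str, Any]
    search_params: List[Dict[str, Any]]
    file: str


@dataclass
class BenchmarkConfig:
    indexes: List[IndexConfig]
    backend_config: Dict[str, Any] = field(default_factory=dict)


class MyBackend:
    """Hypothetical backend: keeps the dict it was constructed with."""

    def __init__(self, config: Dict[str, Any]):
        # The backend_config dict arrives here and is stored as self.config.
        self.config = config

    def endpoint(self) -> str:
        # Hypothetical helper showing how a network backend might read its config.
        return f"{self.config['host']}:{self.config['port']}/{self.config['index_name']}"


bench = BenchmarkConfig(
    indexes=[
        IndexConfig(
            name="my_algo.param1value",
            algo="my_algo",
            build_param={"param1": "value"},
            search_params=[{"k": 10}, {"k": 100}],  # one dict per search combination
            file="indexes/my_algo.param1value",
        )
    ],
    backend_config={"host": "localhost", "port": 9200, "index_name": "demo"},
)

backend = MyBackend(bench.backend_config)
print(backend.endpoint())
```

The point of the sketch is only the hand-off: the same dict that sits in `backend_config` is what the constructor stores as `self.config`.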

Comment on lines +20 to +22
```bash
curl -s https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.{00..9} | tar -xf - -C wiki_all_88M/
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add directory creation before extraction with tar -C

Both extraction commands assume wiki_all_88M/ already exists. Add mkdir -p wiki_all_88M before extraction to make the steps reproducible from a clean workspace.

Proposed doc fix
```diff
+mkdir -p wiki_all_88M
 curl -s https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.{00..9} | tar -xf - -C wiki_all_88M/
```

```diff
-cat wiki_all.tar.* | tar -xf - -C wiki_all_88M/
+mkdir -p wiki_all_88M
+cat wiki_all.tar.* | tar -xf - -C wiki_all_88M/
```

As per coding guidelines, documentation changes should prioritize Clarity and avoid missing prerequisites.

Also applies to: 31-31

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/wiki_all_dataset.md` around lines 20 - 22, Add a
prerequisite step to create the target directory before extraction: insert a
mkdir -p wiki_all_88M line immediately before the tar extraction commands that
use -C wiki_all_88M/ (the curl | tar -xf - -C wiki_all_88M/ lines) so the
commands work from a clean workspace and are reproducible; apply the same mkdir
-p insertion to any other occurrences that extract into wiki_all_88M/.

Comment on lines +26 to +31
```bash
curl -s https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.00
...
curl -s https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.09

cat wiki_all.tar.* | tar -xf - -C wiki_all_88M/
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-part download commands won’t produce files for cat

curl -s <url> writes to stdout, so cat wiki_all.tar.* won’t find files unless outputs are saved. Use -O (or explicit -o) for each part.

Proposed doc fix
```diff
-curl -s https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.00
+curl -sO https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.00
 ...
-curl -s https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.09
+curl -sO https://data.rapids.ai/raft/datasets/wiki_all/wiki_all.tar.09
 
 cat wiki_all.tar.* | tar -xf - -C wiki_all_88M/
```

As per coding guidelines, documentation changes should prioritize **Accuracy** and examples should run correctly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/cuvs_bench/wiki_all_dataset.md` around lines 26 - 31, Replace the
per-part curl invocations that stream to stdout with commands that save each
part to disk so the subsequent cat wiki_all.tar.* | tar -xf - -C wiki_all_88M/
works; specifically update each "curl -s https://.../wiki_all.tar.0X" to use a
saving flag (e.g., "-sO" or "-s -o wiki_all.tar.0X") for the series of parts
(wiki_all.tar.00 through wiki_all.tar.09) so the files exist on disk before
running the final cat/tar extraction.
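
The two wiki_all findings (create the target directory first, and make sure each part exists on disk before concatenating) can be checked offline. The following is a rough sketch that uses a tiny synthetic archive in place of the real download; the helper names, the `sample.txt` payload, and the three-way split are assumptions made for illustration, not part of the dataset instructions.

```python
import io
import os
import tarfile
import tempfile


def make_split_archive(workdir, n_parts=3):
    """Build a small tar archive and split it into wiki_all.tar.00, .01, ... on disk."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello wiki_all\n"
        info = tarfile.TarInfo(name="sample.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    data = buf.getvalue()
    chunk = -(-len(data) // n_parts)  # ceiling division
    paths = []
    for i in range(n_parts):
        path = os.path.join(workdir, f"wiki_all.tar.{i:02d}")
        with open(path, "wb") as f:  # each part is saved to disk, like `curl -O`
            f.write(data[i * chunk:(i + 1) * chunk])
        paths.append(path)
    return paths


def extract_parts(part_paths, target_dir):
    """Concatenate the parts in name order and extract them into target_dir."""
    os.makedirs(target_dir, exist_ok=True)  # same role as `mkdir -p wiki_all_88M`
    joined = io.BytesIO()
    for path in sorted(part_paths):  # same role as `cat wiki_all.tar.*`
        with open(path, "rb") as f:
            joined.write(f.read())
    joined.seek(0)
    with tarfile.open(fileobj=joined, mode="r:") as tar:
        tar.extractall(target_dir)


workdir = tempfile.mkdtemp()
parts = make_split_archive(workdir)
target = os.path.join(workdir, "wiki_all_88M")
extract_parts(parts, target)

with open(os.path.join(target, "sample.txt"), "rb") as f:
    extracted = f.read()
print(extracted)
```

No single part is a valid archive on its own, which is why the parts have to be saved and then joined in order before extraction.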


Comment thread docs/source/filtering.md
Comment on lines +38 to +40
raft::device_matrix_view<uint32_t> neighbors = make_device_matrix_view<uint32_t>(n_queries, k);
raft::device_matrix_view<float> distances = make_device_matrix_view<float>(n_queries, k);

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In the RAFT C++ API, what are the exact signatures/usages of `raft::make_device_matrix_view` and `raft::make_device_matrix`? Please include examples showing when a data pointer is required versus when memory is allocated.

💡 Result:

In the RAFT C++ API, `raft::make_device_matrix_view` creates a non-owning 2D device mdspan (a view) over an existing device pointer, so the data pointer is required. It is a function template of the form `auto constexpr raft::make_device_matrix_view(ElementType *ptr, IndexType n_rows, IndexType n_cols)` and creates a c-contiguous (row-major by default) device matrix view. Example [1][2]: include `<raft/core/device_mdspan.hpp>`, then write `float *d_ptr = /* allocated device pointer */;` followed by `auto matrix_view = raft::make_device_matrix_view(d_ptr, 10, 20);`, which uses the existing `d_ptr` and allocates nothing [1].

`raft::make_device_matrix` creates an owning device mdarray (a matrix) and allocates memory automatically, so no data pointer is required. Usage: `auto matrix = raft::make_device_matrix<float>(handle, n_rows, n_cols);`. Example from the quick start [3][4]: include `<raft/core/device_mdarray.hpp>`, declare `raft::handle_t handle;` (or resources) and `int n_rows = 10, n_cols = 10;`, then `auto matrix = raft::make_device_matrix<float>(handle, n_rows, n_cols);` allocates the memory [3][4].

To get a view from the mdarray, use `auto matrix_view = matrix.view();` or `raft::make_device_matrix_view(matrix.data_handle(), n_rows, n_cols);` [3][4].

In short, a data pointer is required for the view (non-owning, no allocation) but not for the matrix (owning, allocates).


🏁 Script executed:

cat -n docs/source/filtering.md | head -120

Repository: rapidsai/cuvs

Length of output: 5523


Fix matrix view construction in code examples—missing required data pointer argument.

The examples at lines 38–40 and 99–100 call make_device_matrix_view with only shape arguments (n_queries, k), but this function requires a data pointer as the first argument per its signature: make_device_matrix_view(ElementType *ptr, IndexType n_rows, IndexType n_cols). As written, these examples will not compile.

Use owning matrix allocations instead: raft::make_device_matrix<type, index_dtype>(res, n_rows, n_cols), then pass .view() to the search functions.

Proposed doc fix
-raft::device_matrix_view<uint32_t> neighbors = make_device_matrix_view<uint32_t>(n_queries, k);
-raft::device_matrix_view<float> distances = make_device_matrix_view<float>(n_queries, k);
+auto neighbors = raft::make_device_matrix<uint32_t, int64_t>(res, n_queries, k);
+auto distances = raft::make_device_matrix<float, int64_t>(res, n_queries, k);

-auto neighbors = raft::make_device_matrix_view<uint32_t, indexing_dtype>(n_queries, k);
-auto distances = raft::make_device_matrix_view<float, indexing_dtype>(n_queries, k);
+auto neighbors = raft::make_device_matrix<uint32_t, indexing_dtype>(res, n_queries, k);
+auto distances = raft::make_device_matrix<float, indexing_dtype>(res, n_queries, k);

Then update the search calls to pass .view():

-              neighbors,
-              distances,
+              neighbors.view(),
+              distances.view(),

Applies to: lines 38–40, 99–100. Impacts: lines 55–61, 101–107.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/filtering.md` around lines 38 - 40, The examples call
make_device_matrix_view without the required data pointer; replace those
non-compiling calls by allocating owning matrices with raft::make_device_matrix
(e.g., create raft::device_matrix<T, index_t> res =
raft::make_device_matrix<...>(...)) and then obtain a view via res.view();
update the variables referenced (neighbors, distances) and all search function
calls to use the .view() of the owning matrices instead of calling
make_device_matrix_view with only dimensions.

Comment on lines +118 to +123
$$
\frac{n\_vectors}{trainset\_ratio * dims * sizeof_{float}}
+ \frac{n\_vectors}{trainset\_ratio * sizeof_{uint32\_t}}
+ n\_clusters * dim * sizeof_{float}
$$

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Build peak-memory formula appears mathematically incorrect.

These terms currently divide by trainset_ratio; for memory usage they should scale with sampled training set size (multiply by ratio), otherwise the estimate is inverted and misleading.

Proposed doc fix
- \frac{n\_vectors}{trainset\_ratio * dims * sizeof_{float}}
+ n\_vectors * trainset\_ratio * dims * sizeof_{float}

- + \frac{n\_vectors}{trainset\_ratio * sizeof_{uint32\_t}}
+ + n\_vectors * trainset\_ratio * sizeof_{uint32\_t}
As per coding guidelines, “For documentation changes, focus on: Accuracy: Verify code examples compile and run correctly” and “Consistency: Version numbers, parameter types, and terminology match code”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/source/neighbors/ivfpq.md` around lines 118 - 123, The peak-memory
formula in the ivfpq doc is inverted: anywhere you divide by trainset_ratio you
should instead multiply by it so memory scales with the sampled training set
size; update the three terms containing trainset_ratio (the term with
n_vectors/(trainset_ratio * dims * sizeof_{float}), the term with
n_vectors/(trainset_ratio * sizeof_{uint32_t}), and ensure n_clusters * dim *
sizeof_{float} remains as-is) to use n_vectors * trainset_ratio * dims *
sizeof_{float} and n_vectors * trainset_ratio * sizeof_{uint32_t} respectively,
keeping symbols n_vectors, trainset_ratio, dims/dim, sizeof_{float},
sizeof_{uint32_t}, and n_clusters to locate the changes.
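
To see how far apart the two readings of the formula are, here is a small numeric sketch. It assumes 4-byte `float` and `uint32_t`, and it treats `trainset_ratio` as a fraction in (0, 1], which the quoted hunk does not state; if the ratio is instead meant as a divisor (for example 10 for a one-in-ten sample), the comparison changes. The sample sizes are made up.

```python
FLOAT_BYTES = 4   # assumed sizeof(float)
UINT32_BYTES = 4  # assumed sizeof(uint32_t)


def peak_bytes_as_written(n_vectors, trainset_ratio, dims, n_clusters):
    """Formula as it appears in the hunk: the first two terms are fractions."""
    return (
        n_vectors / (trainset_ratio * dims * FLOAT_BYTES)
        + n_vectors / (trainset_ratio * UINT32_BYTES)
        + n_clusters * dims * FLOAT_BYTES
    )


def peak_bytes_proposed(n_vectors, trainset_ratio, dims, n_clusters):
    """Formula as proposed in the review: the first two terms are products."""
    return (
        n_vectors * trainset_ratio * dims * FLOAT_BYTES
        + n_vectors * trainset_ratio * UINT32_BYTES
        + n_clusters * dims * FLOAT_BYTES
    )


# Made-up sample sizes, with trainset_ratio read as a fraction.
sample = dict(n_vectors=1_000_000, trainset_ratio=0.1, dims=128, n_clusters=1024)

as_written = peak_bytes_as_written(**sample)
proposed = peak_bytes_proposed(**sample)

print(f"as written: {as_written / 2**20:.1f} MiB")
print(f"proposed:   {proposed / 2**20:.1f} MiB")
```

In the as-written form the training-data term shrinks as `dims` grows, which is the behavior the review flags; the definition of `trainset_ratio` in the source page still needs to be confirmed before choosing either form.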

Labels

doc Improvements or additions to documentation non-breaking Introduces a non-breaking change

Projects

Status: Done

2 participants