[RELEASE] cuvs v25.08 by AyodeAwe · Pull Request #1205 · NVIDIA/cuvs

AyodeAwe · 2025-07-31T15:25:34Z

❄️ Code freeze for `branch-25.08` and v25.08 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.08 until release (merging of this PR).

What is the purpose of this PR?

* Update documentation
* Allow testing for the new release
* Enable a means to merge branch-25.08 into main for the release

Forward-merge branch-25.06 into branch-25.08

Contributes to rapidsai/build-planning#181

* removes all uploads of conda packages and wheels to `downloads.rapids.ai`

## Notes for Reviewers

### How I identified changes

Looked for uses of the relevant `gha-tools` tools, as well as documentation about `downloads.rapids.ai`, being on the NVIDIA VPN, using S3, etc. like this:

```shell
git grep -i -E 's3|upload|downloads\.rapids|vpn'
```

### How I tested this

See "How I tested this" on rapidsai/shared-workflows#364

Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) URL: #940

Forward-merge branch-25.06 into branch-25.08

This PR removes CUDA 11 devcontainers and updates CI scripts. xref: rapidsai/build-planning#184 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #960

Issue: rapidsai/build-planning#184 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) URL: #962

xref rapidsai/build-planning#184 Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #961

_Edit by @jakirkham_ - Fixes #1118 Authors: - Corey J. Nolet (https://github.com/cjnolet) - https://github.com/LizYou Approvers: - https://github.com/jakirkham - Ben Frederickson (https://github.com/benfred) URL: #1150

In #902 and #1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for CAGRA and brute-force index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. #1105 / #1102 or #1104). This PR extends `Dataset` so that it can be used as an output, wrapping native (off-heap) memory in a convenient and efficient way and providing common utilities to convert to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2D only, just 3 data types, etc.).

The PR keeps the current implementation simple and minimal on purpose, but structures it so that it is easy to extend. By itself, the PR is just a refactoring that extends the `Dataset` implementation and reorganizes the implementation classes; its real usefulness will come from using it in the PRs mentioned above (in fact, this PR was extracted from #1105).

The implementation class hierarchy is designed with future extensions in mind: at the moment we have one `HostMemoryDatasetImpl`, but we are already considering a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage (views of) GPU memory to avoid, in some cases, extra copies of data from GPU memory to CPU memory only to process them or forward them to another algorithm (e.g. quantization followed by indexing). Future work will also include adding support for, or refactoring, the allocation and management of GPU memory and DLTensors (e.g. working better with, or refactoring, `prepareTensor`).

Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) Approvers: - MithunR (https://github.com/mythrocks) URL: #1111

Authors: - Tarang Jain (https://github.com/tarang-jain) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1164

…dIndexParamsCreate (#1147) Uses `cuvsTieredIndexParamsCreate` and `cuvsTieredIndexParamsDestroy` instead of allocating an arena in Java. Uses `CloseableHandle` and `CuvsParamsHelper`, as in #1109 and #1110, for consistency. This fixes #1138 Authors: - Puneet Ahuja (https://github.com/punAhuja) Approvers: - MithunR (https://github.com/mythrocks) URL: #1147

Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1193

Add helper functions to construct reasonable default CAGRA build parameters based on HNSW parameters. The goal is to build a CAGRA graph that can be converted to an HNSW graph with search performance very similar to that of the corresponding HNSW-built graph. Additionally, this PR refactors the CAGRA benchmark wrapper to store build parameters in a single place and flexibly set defaults based on the dataset shape. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #1125

The refine functions that work with GPU data use IVF-Flat under the hood to perform the refinement operation. This PR adds extern template declarations for `ivfflat_interleaved_scan` and uses these in the refine functions. This way we avoid recompiling the IVF-Flat search kernels and save binary size. Before this PR, `ivfflat_interleaved_scan` was compiled through the `ivf_flat::search()` function instantiations, but the function symbols were not available due to inlining. This PR also adds explicit instantiations for `ivfflat_interleaved_scan`, so both `ivf_flat::search` and `refine` can now use the same interleaved scan function. Authors: - Tamas Bela Feher (https://github.com/tfeher) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Divye Gala (https://github.com/divyegala) URL: #1095

This PR gives a proof-of-concept implementation of GPU-based index build for the ScaNN index. The ScaNN index defined here is similar to IVF-PQ index in structure (a tree structure coming from kmeans, plus product quantization of vectors assigned to leaf nodes), together with “AVQ update” of the kmeans centroids and a spilled cluster assignment from the “SOAR” loss. Other features, optimizations, and customizability options to appear in subsequent PRs.

* scann_build.cuh

  This file contains the implementation for build(..). The general pipeline looks like:

  1. Train kmeans centers on sampled data
  2. Assign all dataset vectors to kmeans clusters by minimizing L2 loss
  3. Update kmeans centers with AVQ
  4. Train PQ codebook on sampled residual vectors (here we use VPQ, slightly modified to perform product quantization on individual subspaces, e.g. each subspace has its own codebook)
  5. Quantization loop (batched):
     - Compute spilled SOAR labels (performed here to minimize HtoD copies)
     - Compute and quantize residuals/soar residuals using trained pq codebook
     - If enabled, compute bf16 quantization of dataset vectors (performed here to minimize HtoD copies).

* scann_avq.cuh

  This file contains apply_avq(..), which recomputes cluster centers using AVQ. The main technique is a single application of Theorem 4.2 in https://arxiv.org/pdf/1908.10396 to each cluster, using parameters:

  - `h_i_parallel = eta * ||x_i||^(eta - 1)`
  - `h_i_orthogonal = ||x_i||^(eta - 1)`

  The implementation of Theorem 4.2 is in compute_avq_centroid(..)

  The overall pipeline for apply_avq(..) is:

  1. Build clusters from kmeans cluster assignments
  2. For each cluster:
     - Gather cluster vectors into single matrix
     - Update kmeans center via compute_avq_centroid
  3. Rescale updated centroids (I need to add more details about this step).

* scann_quantize.cuh

  This file contains helpers for PQ. Codebooks are created from residual vectors using train_pq from vpq_dataset.cuh (using a single vq center which is set to zero). Unlike in VPQ, codebooks are generated separately for each subspace, rather than collapsing all subspaces into a single space and computing a global codebook.

* scann_soar.cuh

  The main function is compute_soar_labels(..), which computes a second, spilled cluster assignment by minimizing the SOAR loss function (Theorem 3.1 in https://arxiv.org/pdf/2404.00774)

* scann_serialize.cuh

  Contains the implementation of serialize(..). The goal is to serialize ScaNN artifacts in a way that is usable with open-source ScaNN search with minimal additional post-processing. The cluster assignments, quantized vectors (for both the primary and spilled SOAR assignments), and bf16 dataset are all stored in separate .npy files for direct consumption by open-source ScaNN. The codebook and cluster centers are also serialized separately, but require additional post-processing into the correct Protobuf structs (not included in this PR).

Test Plan: This code is mostly tested via CPU search with open-source ScaNN. Additional protobuf artifacts are created from the cuVS serialized index via an external tool.

[Figure: Pareto plot for OpenAI 5M]

Authors: - https://github.com/rmaschal Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Artem M. Chirkin (https://github.com/achirkin) URL: #1120
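To make the two AVQ weight formulas quoted for `scann_avq.cuh` concrete, here is a minimal pure-Python sketch. It is not the CUDA implementation from this PR: the function name `avq_weights`, the list-of-lists input, and the rejection of zero vectors are assumptions made only for illustration. Only the two formulas themselves come from the description above.

```python
import math


def avq_weights(vectors, eta):
    """Return a list of (h_parallel, h_orthogonal) pairs, one per vector.

    Uses the formulas quoted in the PR description:
        h_parallel   = eta * ||x||^(eta - 1)
        h_orthogonal = ||x||^(eta - 1)
    `avq_weights` is a hypothetical helper, not a cuVS function.
    """
    weights = []
    for x in vectors:
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0.0:
            # Assumption for this sketch: reject zero vectors, since
            # ||x||^(eta - 1) is undefined there when eta < 1.
            raise ValueError("zero vector has no well-defined AVQ weight")
        h_orthogonal = norm ** (eta - 1)
        weights.append((eta * h_orthogonal, h_orthogonal))
    return weights


if __name__ == "__main__":
    # A vector of norm 5 with eta = 2 gives weights (2 * 5, 5).
    print(avq_weights([[3.0, 4.0]], 2.0))  # [(10.0, 5.0)]
```

With `eta = 1` both weights equal 1 for every vector, so the parallel and orthogonal components of the error are weighted the same; larger `eta` puts more weight on the parallel component.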

copy-pr-bot · 2025-07-31T15:25:38Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

review-notebook-app · 2025-07-31T15:25:39Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

raydouglass and others added 30 commits April 30, 2025 15:12

DOC v25.08 Updates [skip ci]

17ce0a9

Merge branch 'branch-25.06' into branch-25.08-merge-25.06

a3a318a

Merge pull request #897 from gforsyth/branch-25.08-merge-25.06

98fae4d

Forward-merge branch-25.06 into branch-25.08

Merge branch 'branch-25.06' into branch-25.08-merge-25.06

2bdbbec

Merge pull request #909 from gforsyth/branch-25.08-merge-25.06

ece680a

Forward-merge branch-25.06 into branch-25.08

Merge pull request #912 from rapidsai/branch-25.06

dc60910

Forward-merge branch-25.06 into branch-25.08

Merge pull request #913 from rapidsai/branch-25.06

c0d3c07

Forward-merge branch-25.06 into branch-25.08

Merge pull request #924 from rapidsai/branch-25.06

97c1259

Forward-merge branch-25.06 into branch-25.08

Merge pull request #929 from rapidsai/branch-25.06

cf50341

Forward-merge branch-25.06 into branch-25.08

Merge pull request #932 from rapidsai/branch-25.06

46665b9

Forward-merge branch-25.06 into branch-25.08

Merge pull request #937 from rapidsai/branch-25.06

9631f8b

Forward-merge branch-25.06 into branch-25.08

Merge pull request #939 from rapidsai/branch-25.06

9bb888f

Forward-merge branch-25.06 into branch-25.08

Merge pull request #942 from rapidsai/branch-25.06

19a09e4

Forward-merge branch-25.06 into branch-25.08

Merge pull request #943 from rapidsai/branch-25.06

842b47a

Forward-merge branch-25.06 into branch-25.08

Merge pull request #945 from rapidsai/branch-25.06

3835e01

Forward-merge branch-25.06 into branch-25.08

Merge pull request #947 from rapidsai/branch-25.06

439ca00

Forward-merge branch-25.06 into branch-25.08

Merge pull request #948 from rapidsai/branch-25.06

dfacd36

Forward-merge branch-25.06 into branch-25.08

Merge pull request #950 from rapidsai/branch-25.06

9895d7e

Forward-merge branch-25.06 into branch-25.08

Merge pull request #952 from rapidsai/branch-25.06

ab9c539

Forward-merge branch-25.06 into branch-25.08

Merge pull request #953 from rapidsai/branch-25.06

1e28a00

Forward-merge branch-25.06 into branch-25.08

Merge pull request #954 from rapidsai/branch-25.06

cc91c8b

Forward-merge branch-25.06 into branch-25.08

Merge pull request #955 from rapidsai/branch-25.06

6a65b20

Forward-merge branch-25.06 into branch-25.08

Merge pull request #956 from rapidsai/branch-25.06

1efca07

Forward-merge branch-25.06 into branch-25.08

Merge pull request #957 from rapidsai/branch-25.06

123b647

Forward-merge branch-25.06 into branch-25.08

Merge pull request #958 from rapidsai/branch-25.06

958be45

Forward-merge branch-25.06 into branch-25.08

Merge pull request #959 from rapidsai/branch-25.06

36c38a5

Forward-merge branch-25.06 into branch-25.08

Remove CUDA 11 devcontainers and update CI scripts (#960)

4462d92

Remove CUDA 11 from dependencies.yaml (#962)

5098d31

refactor(rattler): remove cuda11 options and general cleanup (#961)

db10359

cjnolet and others added 8 commits July 25, 2025 19:18

Removing all references to CUDA 11 from codebase (#1150)

b627c24

[ANN_BENCH] [DOCS] Add Vamana / DiskANN to cuvs-bench Docs (#1164)

921db94

Check shape is initialized in cuvsMatrixSliceRows (#1193)

2b3fc1b

AyodeAwe requested review from a team as code owners July 31, 2025 15:25

github-project-automation Bot added this to Unstructured Data Processing Jul 31, 2025

AyodeAwe requested review from msarahan and removed request for a team July 31, 2025 15:25

github-project-automation Bot moved this to Todo in Unstructured Data Processing Jul 31, 2025

github-actions Bot added ci cpp CMake Python labels Jul 31, 2025

Update Changelog [skip ci]

94c2819

AyodeAwe merged commit 8af9b84 into main Aug 6, 2025
5 of 6 checks passed

github-project-automation Bot moved this from Todo to Done in Unstructured Data Processing Aug 6, 2025

josephine-wolf-oberholtzer moved this to Done in Unstructured Data Processing Jul 1, 2026

josephine-wolf-oberholtzer added this to Unstructured Data Processing Jul 1, 2026

AyodeAwe merged 115 commits into main from branch-25.08

20 participants
