[RELEASE] cuvs v25.08 #1205
Merged
Forward-merge branch-25.06 into branch-25.08
Contributes to rapidsai/build-planning#181

* removes all uploads of conda packages and wheels to `downloads.rapids.ai`

## Notes for Reviewers

### How I identified changes

Looked for uses of the relevant `gha-tools` tools, as well as documentation about `downloads.rapids.ai`, being on the NVIDIA VPN, using S3, etc. like this:

```shell
git grep -i -E 's3|upload|downloads\.rapids|vpn'
```

### How I tested this

See "How I tested this" on rapidsai/shared-workflows#364

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Gil Forsyth (https://github.com/gforsyth)
- Bradley Dice (https://github.com/bdice)

URL: #940
This PR removes CUDA 11 devcontainers and updates CI scripts.

xref: rapidsai/build-planning#184

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #960

Issue: rapidsai/build-planning#184

Authors:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #962

xref rapidsai/build-planning#184

Authors:
- Gil Forsyth (https://github.com/gforsyth)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Bradley Dice (https://github.com/bdice)

URL: #961

_Edit by @jakirkham_

- Fixes #1118

Authors:
- Corey J. Nolet (https://github.com/cjnolet)
- https://github.com/LizYou

Approvers:
- https://github.com/jakirkham
- Ben Frederickson (https://github.com/benfred)

URL: #1150
In #902 and #1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for CAGRA and brute-force index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. #1105 / #1102 or #1104). This PR extends `Dataset` to support being used as an output, wrapping native (off-heap) memory in a convenient and efficient way, and providing common utilities to transform to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2D only, just 3 data types, etc.).

The PR intentionally keeps the current implementation simple and minimal, but structured in a way that is easy to extend. By itself, the PR is just a refactoring that extends the `Dataset` implementation and reorganizes the implementation classes; its real usefulness will come from using it in the PRs mentioned above (in fact, this PR has been extracted from #1105).

The implementation class hierarchy is designed with future extensions in mind: at the moment we have one `HostMemoryDatasetImpl`, but we are already planning a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage views on GPU memory, to avoid (in some cases) extra copies of data from GPU memory to CPU memory only to process them or forward them to another algorithm (e.g. quantization followed by indexing). Future work will also include adding support for (or refactoring) the allocation and management of GPU memory and DLTensors (e.g. working better with, or refactoring, `prepareTensor`).

Authors:
- Lorenzo Dematté (https://github.com/ldematte)
- MithunR (https://github.com/mythrocks)

Approvers:
- MithunR (https://github.com/mythrocks)

URL: #1111

Authors:
- Tarang Jain (https://github.com/tarang-jain)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1164

…dIndexParamsCreate (#1147)

Now uses `cuvsTieredIndexParamsCreate` and `cuvsTieredIndexParamsDestroy` instead of allocating an arena in Java.
- Uses `CloseableHandle` and `CuvsParamsHelper`, as in #1109 and #1110, for consistency.

This fixes #1138

Authors:
- Puneet Ahuja (https://github.com/punAhuja)

Approvers:
- MithunR (https://github.com/mythrocks)

URL: #1147

Authors:
- Ben Frederickson (https://github.com/benfred)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1193

Add helper functions to construct reasonable default CAGRA build parameters based on HNSW parameters. The goal is to build a CAGRA graph that can be converted to an HNSW graph whose search performance is very similar to that of the corresponding HNSW-built graph. Additionally, this PR refactors the CAGRA benchmark wrapper to store build parameters in a single place and to set defaults flexibly based on the dataset shape.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: #1125

The refine functions that work with GPU data use IVF-Flat under the hood to perform the refinement operation. This PR adds extern template declarations for `ivfflat_interleaved_scan` and uses them in the refine functions. This way we avoid recompiling the IVF-Flat search kernels and save binary size.

Before this PR, `ivfflat_interleaved_scan` was compiled through the `ivf_flat::search()` function instantiations, but the function symbols were not available due to inlining. This PR also adds explicit instantiations for `ivfflat_interleaved_scan`, so both `ivf_flat::search` and `refine` can now use the same interleaved scan function.

Authors:
- Tamas Bela Feher (https://github.com/tfeher)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)
- Divye Gala (https://github.com/divyegala)

URL: #1095
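The extern-template technique from #1095 can be illustrated with a small, self-contained C++ sketch. `scan_sum` is a hypothetical stand-in for `ivfflat_interleaved_scan` (the real signature is not reproduced here). In a real project the `extern template` declarations live in a header included by every caller (here, search and refine), and the explicit instantiation definitions live in exactly one source file; both parts are shown in a single listing only so that it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- header part (hypothetical "scan.hpp") ---------------------------------
// A function template standing in for an expensive kernel. Its definition is
// visible to every translation unit that includes the header.
template <typename T>
T scan_sum(const std::vector<T>& data, T scale)
{
  T acc{};
  for (const T& v : data) acc += v * scale;
  return acc;
}

// Explicit instantiation *declarations*: translation units that include the
// header do not instantiate these specializations themselves; they link
// against the single definition emitted elsewhere.
extern template float scan_sum<float>(const std::vector<float>&, float);
extern template double scan_sum<double>(const std::vector<double>&, double);

// --- exactly one source file (hypothetical "scan_inst.cpp") ----------------
// Explicit instantiation *definitions*: the specializations are compiled once
// here, and their symbols are emitted for other translation units to use.
template float scan_sum<float>(const std::vector<float>&, float);
template double scan_sum<double>(const std::vector<double>&, double);
```

Callers such as `scan_sum<float>(...)` then reuse the one compiled specialization instead of each instantiating their own copy, which is the compile-time and binary-size saving described for the refine functions.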
This PR gives a proof-of-concept implementation of GPU-based index build for the ScaNN index. The ScaNN index defined here is structurally similar to the IVF-PQ index (a tree structure coming from k-means, plus product quantization of the vectors assigned to leaf nodes), together with an "AVQ update" of the k-means centroids and a spilled cluster assignment from the "SOAR" loss. Other features, optimizations, and customizability options will appear in subsequent PRs.

* `scann_build.cuh`: This file contains the implementation of `build(..)`. The general pipeline is:
  1. Train k-means centers on sampled data.
  2. Assign all dataset vectors to k-means clusters by minimizing L2 loss.
  3. Update the k-means centers with AVQ.
  4. Train the PQ codebook on sampled residual vectors (here we use VPQ, slightly modified to perform product quantization on individual subspaces, i.e. each subspace has its own codebook).
  5. Quantization loop (batched):
     - Compute spilled SOAR labels (performed here to minimize HtoD copies).
     - Compute and quantize residuals/SOAR residuals using the trained PQ codebook.
     - If enabled, compute the bf16 quantization of the dataset vectors (performed here to minimize HtoD copies).
* `scann_avq.cuh`: This file contains `apply_avq(..)`, which recomputes cluster centers using AVQ. The main technique is a single application of Theorem 4.2 in https://arxiv.org/pdf/1908.10396 to each cluster, using the parameters $h_{i,\parallel} = \eta \, \lVert x_i \rVert^{\eta - 1}$ and $h_{i,\perp} = \lVert x_i \rVert^{\eta - 1}$. The implementation of Theorem 4.2 is in `compute_avq_centroid(..)`. The overall pipeline for `apply_avq(..)` is:
  1. Build clusters from the k-means cluster assignments.
  2. For each cluster:
     - Gather the cluster vectors into a single matrix.
     - Update the k-means center via `compute_avq_centroid`.
  3. Rescale the updated centroids (I need to add more details about this step).
* `scann_quantize.cuh`: This file contains helpers for PQ. Codebooks are created from residual vectors using `train_pq` from `vpq_dataset.cuh` (using a single VQ center, which is set to zero).
  Unlike in VPQ, codebooks are generated separately for each subspace, rather than collapsing all subspaces into a single space and computing a global codebook.
* `scann_soar.cuh`: The main function is `compute_soar_labels(..)`, which computes a second, spilled cluster assignment by minimizing the SOAR loss function (Theorem 3.1 in https://arxiv.org/pdf/2404.00774).
* `scann_serialize.cuh`: Contains the implementation of `serialize(..)`. The goal is to serialize ScaNN artifacts in a way that is usable with open-source ScaNN search with minimal additional post-processing. The cluster assignments, quantized vectors (for both the primary and the spilled SOAR assignments), and the bf16 dataset are all stored in separate `.npy` files for direct consumption by open-source ScaNN. The codebook and cluster centers are also serialized separately, but require additional post-processing into the correct Protobuf structs (not included in this PR).

Test Plan: This code is mostly tested via CPU search with open-source ScaNN. Additional protobuf artifacts are created from the cuVS serialized index via an external tool. A Pareto plot for OpenAI 5M is shown here:

[Figure: Pareto plot for OpenAI 5M]

Authors:
- https://github.com/rmaschal

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Artem M. Chirkin (https://github.com/achirkin)

URL: #1120
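The AVQ centroid update described for `scann_avq.cuh` above can be sketched on the CPU as follows. This is a minimal, non-authoritative C++ sketch, not the `compute_avq_centroid(..)` kernel from #1120: `avq_centroid` and `solve_dense` are hypothetical names, it works in double precision with a dense Gaussian-elimination solve, and it assumes nonzero input vectors. It uses the closed form $c^\ast = \big(\sum_i A_i\big)^{-1} \sum_i h_{i,\parallel} x_i$ with $A_i = h_{i,\perp} I + (h_{i,\parallel} - h_{i,\perp})\, x_i x_i^\top / \lVert x_i \rVert^2$, which follows from setting the gradient of $\sum_i (x_i - c)^\top A_i (x_i - c)$ to zero and using $A_i x_i = h_{i,\parallel} x_i$; this is our reading of the anisotropic-loss optimum, with the weights taken from the PR description.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Solve the dense d x d system M * out = rhs (row-major M) with Gaussian
// elimination and partial pivoting. M is assumed to be non-singular.
inline std::vector<double> solve_dense(std::vector<double> M, std::vector<double> rhs)
{
  const std::size_t d = rhs.size();
  for (std::size_t col = 0; col < d; ++col) {
    std::size_t pivot = col;
    for (std::size_t r = col + 1; r < d; ++r)
      if (std::fabs(M[r * d + col]) > std::fabs(M[pivot * d + col])) pivot = r;
    if (pivot != col) {
      for (std::size_t k = 0; k < d; ++k) std::swap(M[col * d + k], M[pivot * d + k]);
      std::swap(rhs[col], rhs[pivot]);
    }
    for (std::size_t r = col + 1; r < d; ++r) {
      const double f = M[r * d + col] / M[col * d + col];
      for (std::size_t k = col; k < d; ++k) M[r * d + k] -= f * M[col * d + k];
      rhs[r] -= f * rhs[col];
    }
  }
  std::vector<double> out(d);
  for (std::size_t i = d; i-- > 0;) {  // back substitution
    double s = rhs[i];
    for (std::size_t k = i + 1; k < d; ++k) s -= M[i * d + k] * out[k];
    out[i] = s / M[i * d + i];
  }
  return out;
}

// Anisotropic centroid of one cluster: c* = (sum_i A_i)^{-1} sum_i h_par_i x_i,
// A_i = h_perp_i I + (h_par_i - h_perp_i) x_i x_i^T / ||x_i||^2, with weights
// h_par_i = eta ||x_i||^(eta-1) and h_perp_i = ||x_i||^(eta-1).
inline std::vector<double> avq_centroid(const std::vector<std::vector<double>>& cluster, double eta)
{
  assert(!cluster.empty());
  const std::size_t d = cluster.front().size();
  std::vector<double> M(d * d, 0.0), rhs(d, 0.0);
  for (const auto& x : cluster) {
    double norm2 = 0.0;
    for (double v : x) norm2 += v * v;
    assert(norm2 > 0.0 && "this sketch assumes nonzero vectors");
    const double h_perp = std::pow(std::sqrt(norm2), eta - 1.0);
    const double h_par  = eta * h_perp;
    const double coef   = (h_par - h_perp) / norm2;
    for (std::size_t r = 0; r < d; ++r) {
      M[r * d + r] += h_perp;  // h_perp * I
      for (std::size_t c = 0; c < d; ++c) M[r * d + c] += coef * x[r] * x[c];
      rhs[r] += h_par * x[r];  // A_i x_i = h_par x_i
    }
  }
  return solve_dense(std::move(M), std::move(rhs));
}
```

With $\eta = 1$ both weights equal 1 and the result reduces to the ordinary k-means mean. With $\eta > 1$ the parallel residual is penalized more: for the two unit vectors (1, 0) and (0, 1) and $\eta = 2$, the sketch returns (2/3, 2/3) instead of the mean (0.5, 0.5), i.e. the centroid is pushed outward.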
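For `scann_quantize.cuh`, the key difference from a single global codebook is that each subspace has its own codebook. The hypothetical helpers below sketch only how a residual (vector minus its cluster center) is encoded and decoded with such per-subspace codebooks; codebook training (e.g. k-means per subspace) is omitted, and the names, the `std::uint8_t` code type (at most 256 codewords per subspace), and the nested-vector layout are assumptions for illustration, not the cuVS API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// One codebook per subspace: books[j][k] is codeword k of subspace j,
// and every codeword has sub_dim = dim / books.size() entries.
using Codeword = std::vector<float>;
using Codebook = std::vector<Codeword>;

// Encode the residual (x - center) subspace by subspace, choosing in each
// subspace the nearest codeword (squared L2) from that subspace's own codebook.
inline std::vector<std::uint8_t> pq_encode_residual(const std::vector<float>& x,
                                                     const std::vector<float>& center,
                                                     const std::vector<Codebook>& books)
{
  const std::size_t n_sub = books.size();
  assert(n_sub > 0 && x.size() % n_sub == 0 && x.size() == center.size());
  const std::size_t sub_dim = x.size() / n_sub;
  std::vector<std::uint8_t> codes(n_sub);
  for (std::size_t j = 0; j < n_sub; ++j) {
    assert(!books[j].empty() && books[j].size() <= 256);  // codes fit in one byte
    float best = std::numeric_limits<float>::max();
    for (std::size_t k = 0; k < books[j].size(); ++k) {
      float dist = 0.0f;
      for (std::size_t t = 0; t < sub_dim; ++t) {
        const float r    = x[j * sub_dim + t] - center[j * sub_dim + t];  // residual
        const float diff = r - books[j][k][t];
        dist += diff * diff;
      }
      if (dist < best) { best = dist; codes[j] = static_cast<std::uint8_t>(k); }
    }
  }
  return codes;
}

// Decode: the center plus the concatenation of the selected codewords.
inline std::vector<float> pq_decode(const std::vector<std::uint8_t>& codes,
                                    const std::vector<float>& center,
                                    const std::vector<Codebook>& books)
{
  std::vector<float> out(center);
  const std::size_t sub_dim = center.size() / books.size();
  for (std::size_t j = 0; j < books.size(); ++j)
    for (std::size_t t = 0; t < sub_dim; ++t)
      out[j * sub_dim + t] += books[j][codes[j]][t];
  return out;
}
```

Because each subspace searches only its own codebook, the codebooks can adapt to the different value distributions of the subspaces, at the cost of storing one codebook per subspace.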
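The spilled assignment computed by `compute_soar_labels(..)` can be sketched for a single vector as below. This is a hedged CPU sketch with a hypothetical function name, not the cuVS kernel. It assumes the SOAR loss for a candidate centroid $c'$ takes the form $\lVert r' \rVert^2 + \lambda \, \langle r', r \rangle^2 / \lVert r \rVert^2$, where $r$ is the residual to the primary centroid, $r' = x - c'$, and $\lambda$ is a tuning parameter; this form is an assumption to check against Theorem 3.1 of the SOAR paper.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Pick a secondary ("spilled") cluster for x, given its primary cluster.
// Assumed loss for candidate c: ||r'||^2 + lambda * <r', r>^2 / ||r||^2,
// with r = x - centroids[primary] and r' = x - centroids[c], i.e. plain L2
// plus a penalty on the part of r' that is parallel to r.
inline std::size_t soar_spill_label(const std::vector<float>& x,
                                    const std::vector<std::vector<float>>& centroids,
                                    std::size_t primary, float lambda)
{
  assert(centroids.size() >= 2 && primary < centroids.size());
  const std::size_t d = x.size();
  std::vector<float> r(d);
  float r_norm2 = 0.0f;
  for (std::size_t t = 0; t < d; ++t) {
    r[t] = x[t] - centroids[primary][t];
    r_norm2 += r[t] * r[t];
  }
  std::size_t best_label = primary;
  float best_loss = std::numeric_limits<float>::max();
  for (std::size_t c = 0; c < centroids.size(); ++c) {
    if (c == primary) continue;  // the spilled label must differ from the primary
    float rp_norm2 = 0.0f, dot = 0.0f;
    for (std::size_t t = 0; t < d; ++t) {
      const float rp = x[t] - centroids[c][t];
      rp_norm2 += rp * rp;
      dot += rp * r[t];
    }
    // If x coincides with its primary centroid, r = 0 and the penalty is dropped.
    const float penalty = r_norm2 > 0.0f ? lambda * dot * dot / r_norm2 : 0.0f;
    const float loss = rp_norm2 + penalty;
    if (loss < best_loss) { best_loss = loss; best_label = c; }
  }
  return best_label;
}
```

For x = (1, 0) with primary centroid (0, 0), plain L2 ($\lambda = 0$) picks the candidate (2, 0), whose residual is parallel to r; with $\lambda = 1$ the candidate (1, 1.2), whose residual is orthogonal to r, is chosen instead. Preferring residuals that are not aligned with the primary residual is the behavior the SOAR loss is meant to encourage.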
❄️ Code freeze for `branch-25.08` and v25.08 release

What does this mean?

Only critical/hotfix level issues should be merged into `branch-25.08` until release (merging of this PR).

What is the purpose of this PR?

- Merge `branch-25.08` into `main` for the release