NVIDIA · cjnolet · May 5, 2026 · May 5, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -576,7 +576,7 @@
 - Vendor RAPIDS.cmake ([#816](https://github.com/rapidsai/cuvs/pull/816)) [@bdice](https://github.com/bdice)
 - Update libcuvs libraft ver to 25.06 in conda env ([#808](https://github.com/rapidsai/cuvs/pull/808)) [@jinsolp](https://github.com/jinsolp)
 - Moving NN Descent class and struct declarations to `nn_descent_gnnd.hpp` ([#803](https://github.com/rapidsai/cuvs/pull/803)) [@jinsolp](https://github.com/jinsolp)
-- Remove `[@rapidsai/cuvs-build-codeowners` ([#783](https://github.com/rapidsai/cuvs/pull/783)) @KyleFromNVIDIA](https://github.com/rapidsai/cuvs-build-codeowners` ([#783](https://github.com/rapidsai/cuvs/pull/783)) @KyleFromNVIDIA)
+- Remove @rapidsai/cuvs-build-codeowners ([#783](https://github.com/rapidsai/cuvs/pull/783)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
 - Moving wheel builds to specified location and uploading build artifacts to Github ([#777](https://github.com/rapidsai/cuvs/pull/777)) [@VenkateshJaya](https://github.com/VenkateshJaya)
 - Remove unused raft cagra header in add_nodes.cuh ([#741](https://github.com/rapidsai/cuvs/pull/741)) [@jiangyinzuo](https://github.com/jiangyinzuo)
 - Expose kmeans to python ([#729](https://github.com/rapidsai/cuvs/pull/729)) [@benfred](https://github.com/benfred)

@@ -146,7 +146,7 @@ elif [[ "${RUN_CONTEXT}" == "release" ]]; then
 fi
 
 # Update cuvs-bench Docker image references (version-only, not branch-related)
-sed_runner "s|rapidsai/cuvs-bench:[0-9][0-9].[0-9][0-9]|rapidsai/cuvs-bench:${NEXT_SHORT_TAG}|g" docs/source/cuvs_bench/index.rst
+sed_runner "s|rapidsai/cuvs-bench:[0-9][0-9].[0-9][0-9]|rapidsai/cuvs-bench:${NEXT_SHORT_TAG}|g" docs/source/cuvs_bench/index.md
 
 # Version references (not branch-related)
 sed_runner "s|=[0-9][0-9].[0-9][0-9]|=${NEXT_SHORT_TAG}|g" README.md

@@ -35,6 +35,7 @@ dependencies:
 - libopenblas<=0.3.30
 - librmm==26.6.*,>=0.0.0a0
 - make
+- myst-parser
 - nccl>=2.19
 - ninja
 - numpy>=1.23,<3.0
@@ -45,12 +46,10 @@ dependencies:
 - pytest
 - pytest-cov
 - rapids-build-backend>=0.4.0,<0.5.0
-- recommonmark
 - rust
 - scikit-build-core>=0.11.0
 - scikit-learn>=1.5
 - sphinx-copybutton
-- sphinx-markdown-tables
 - sphinx>=8.0.0
 - sysroot_linux-aarch64==2.28
 - pip:

@@ -34,6 +34,7 @@ dependencies:
 - libnvjitlink-dev
 - librmm==26.6.*,>=0.0.0a0
 - make
+- myst-parser
 - nccl>=2.19
 - ninja
 - numpy>=1.23,<3.0
@@ -44,12 +45,10 @@ dependencies:
 - pytest
 - pytest-cov
 - rapids-build-backend>=0.4.0,<0.5.0
-- recommonmark
 - rust
 - scikit-build-core>=0.11.0
 - scikit-learn>=1.5
 - sphinx-copybutton
-- sphinx-markdown-tables
 - sphinx>=8.0.0
 - sysroot_linux-64==2.28
 - pip:

@@ -35,6 +35,7 @@ dependencies:
 - libopenblas<=0.3.30
 - librmm==26.6.*,>=0.0.0a0
 - make
+- myst-parser
 - nccl>=2.19
 - ninja
 - numpy>=1.23,<3.0
@@ -45,12 +46,10 @@ dependencies:
 - pytest
 - pytest-cov
 - rapids-build-backend>=0.4.0,<0.5.0
-- recommonmark
 - rust
 - scikit-build-core>=0.11.0
 - scikit-learn>=1.5
 - sphinx-copybutton
-- sphinx-markdown-tables
 - sphinx>=8.0.0
 - sysroot_linux-aarch64==2.28
 - pip:

@@ -34,6 +34,7 @@ dependencies:
 - libnvjitlink-dev
 - librmm==26.6.*,>=0.0.0a0
 - make
+- myst-parser
 - nccl>=2.19
 - ninja
 - numpy>=1.23,<3.0
@@ -44,12 +45,10 @@ dependencies:
 - pytest
 - pytest-cov
 - rapids-build-backend>=0.4.0,<0.5.0
-- recommonmark
 - rust
 - scikit-build-core>=0.11.0
 - scikit-learn>=1.5
 - sphinx-copybutton
-- sphinx-markdown-tables
 - sphinx>=8.0.0
 - sysroot_linux-64==2.28
 - pip:

@@ -450,11 +450,10 @@ dependencies:
           - doxygen>=1.8.20
           - graphviz
           - ipython
+          - myst-parser
           - numpydoc
-          - recommonmark
           - sphinx>=8.0.0
           - sphinx-copybutton
-          - sphinx-markdown-tables
           - pip:
               - nvidia-sphinx-theme
   rust:

@@ -0,0 +1,22 @@
+# Advanced Topics
+
+- [Just-in-Time Compilation](#just-in-time-compilation)
+
+## Just-in-Time Compilation
+cuVS uses Just-in-Time (JIT) [Link-Time Optimization (LTO)](https://developer.nvidia.com/blog/cuda-12-0-compiler-support-for-runtime-lto-using-nvjitlink-library/) to compile certain kernels at runtime. When JIT compilation is triggered, cuVS compiles the kernel for your GPU architecture and automatically caches it both in memory and on disk. Each cache remains valid as follows:
+
+1. The in-memory cache is valid for the lifetime of the process.
+2. The on-disk cache is valid until the CUDA driver is upgraded. The cache can be portably shared between machines through network or cloud storage, and we strongly recommend storing it in a persistent location. For details on configuring the on-disk cache, see the CUDA documentation on [JIT Compilation](https://docs.nvidia.com/cuda/cuda-programming-guide/05-appendices/environment-variables.html#jit-compilation). The relevant environment variables are `CUDA_CACHE_PATH` and `CUDA_CACHE_MAX_SIZE`.
+
+
+JIT compilation is therefore a one-time cost: once a kernel has been compiled and cached, later calls reuse it and you can expect no loss in performance. We recommend running a "warmup" call to trigger JIT compilation before actual usage.
+
+Currently, the following capabilities will trigger a JIT compilation:
+- IVF Flat search APIs: [cuvs::neighbors::ivf_flat::search()](cpp_api/neighbors_ivf_flat.md)
+
+```{toctree}
+:maxdepth: 2
+
+jit_lto_guide
+```
+
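The cache configuration and warmup advice in the hunk above can be sketched in Python as follows. This is a minimal illustration, not part of the PR: `configure_jit_cache` and `warm_up` are hypothetical helper names, and the sketch assumes the environment variables are set before the process makes its first CUDA call. Only `CUDA_CACHE_PATH` and `CUDA_CACHE_MAX_SIZE` come from the documentation being added.

```python
import os
import time
from pathlib import Path


def configure_jit_cache(cache_dir, max_size_bytes, env=None):
    """Point the CUDA JIT on-disk cache at a persistent directory.

    Hypothetical helper. Assumption: call this before the first CUDA call
    in the process so the settings are in place when CUDA initializes.
    """
    env = os.environ if env is None else env
    path = Path(cache_dir).expanduser()
    # Create the directory up front so a missing path is caught early.
    path.mkdir(parents=True, exist_ok=True)
    env["CUDA_CACHE_PATH"] = str(path)
    env["CUDA_CACHE_MAX_SIZE"] = str(int(max_size_bytes))
    return env


def warm_up(fn, *args, **kwargs):
    """Call fn once so any JIT compilation happens now.

    Returns the elapsed wall-clock time in seconds, which is useful for
    comparing a cold first call against later cached calls.
    """
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start
```

A typical use would be to call `configure_jit_cache` with a directory on persistent or shared storage at process start, then pass a small representative search call to `warm_up` before serving real queries.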
@@ -0,0 +1,81 @@
+# cuVS API Basics
+
+- [Memory management](#memory-management)
+- [Resource management](#resource-management)
+
+## Memory management
+
+Centralized memory management allows flexible configuration of allocation strategies, such as sharing the same CUDA memory pool across library boundaries. cuVS uses the [RMM](https://github.com/rapidsai/rmm) library, which eases the burden of configuring different allocation strategies globally across GPU-accelerated libraries.
+
+RMM currently has APIs for C++ and Python.
+
+### C++
+
+Here's an example of configuring RMM to use a pool allocator in C++ (derived from the RMM example [here](https://github.com/rapidsai/rmm?tab=readme-ov-file#example)):
+
+```c++
+rmm::mr::cuda_memory_resource cuda_mr;
+// Construct a resource that uses a coalescing best-fit pool allocator
+// With the pool initially half of available device memory
+auto initial_size = rmm::percent_of_free_device_memory(50);
+rmm::mr::pool_memory_resource pool_mr{cuda_mr, initial_size};
+rmm::mr::set_current_device_resource(pool_mr);
+auto mr = rmm::mr::get_current_device_resource_ref();
+```
+
+### Python
+
+And the corresponding code in Python (derived from the RMM example [here](https://github.com/rapidsai/rmm?tab=readme-ov-file#memoryresource-objects)):
+
+```python
+import rmm
+pool = rmm.mr.PoolMemoryResource(
+  rmm.mr.CudaMemoryResource(),
+  initial_pool_size=2**30,
+  maximum_pool_size=2**32)
+rmm.mr.set_current_device_resource(pool)
+```
+
+## Resource management
+
+cuVS uses an API from the [RAFT](https://github.com/rapidsai/raft) library of ML and data mining primitives to centralize and reuse expensive resources, such as CUDA streams, library handles, and memory resources. The code examples below demonstrate how to create these resources for use throughout this guide.
+
+See RAFT's [resource API documentation](https://docs.rapids.ai/api/raft/nightly/cpp_api/core_resources/) for more information.
+
+### C
+
+
+```c
+#include <cuda_runtime.h>
+#include <cuvs/core/c_api.h>
+
+cuvsResources_t res;
+cuvsResourcesCreate(&res);
+
+// ... do some processing ...
+
+cuvsResourcesDestroy(res);
+```
+
+### C++
+
+```c++
+#include <raft/core/device_resources.hpp>
+
+raft::device_resources res;
+```
+
+### Python
+
+```python
+import pylibraft
+
+res = pylibraft.common.DeviceResources()
+```
+
+### Rust
+
+```rust
+let res = cuvs::Resources::new()?;
+```
+
@@ -0,0 +1,13 @@
+# API Reference
+
+```{toctree}
+:maxdepth: 3
+
+c_api.md
+cpp_api.md
+python_api.md
+rust_api/index.md
+```
+
+* [Index](genindex.html)
+* [Search](search.html)