87 changes: 6 additions & 81 deletions docs/source/jit_lto_guide.md
@@ -475,87 +475,11 @@ extern "C" __global__ void search_kernel(

**Note**: The kernel uses generic function templates (`compute_distance<T>` and `apply_filter<IdxT>`) that are resolved at link time. The specific implementations (euclidean vs inner_product, filter_none vs filter_bitset) are provided by the fragments that get linked together.

### Step 5: Create `.cpp.in` Template Files for Embedding
### Step 5: Create Fragment Tags for Embedding

The `.cpp.in` files register the compiled fatbins so they can be loaded at runtime. Fragment tags are used to help the
linker find and include the relevant fatbins at build time.
Fragment tags identify the compiled fatbins so they can be registered and loaded at runtime, and they help the linker find and include the relevant fatbins at build time. When calling `generate_jit_lto_kernels()`, we pass a `FRAGMENT_TAG_FORMAT` argument, which constructs the tag type from the given placeholders, and a `FRAGMENT_TAG_HEADER_FILES` argument, which lists one or more header files that provide the fragment tags. The JIT+LTO system then automatically generates and compiles a `.cpp` file that registers the fragment using the provided tag.

**Important**: In the `.cpp.in` files (which become `.cpp` files), we use **tags** (like `tag_f`, `tag_h`) instead of real types (like `float`, `__half`) in the `StaticFatbinFragmentEntry` template parameters. This avoids including heavy headers that define the actual types, significantly improving compilation times. The tags are lightweight empty structs that serve only as compile-time identifiers.

#### `compute_distance_embedded.cpp.in`

```text
/*
* SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/

#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
#include <cuvs/detail/jit_lto/registration_tags.hpp>
#include "@embedded_header_file@"

namespace example::detail {

using _FragmentEntry = StaticFatbinFragmentEntry<fragment_tag_compute_distance<tag_@distance_name@, tag_@type_abbrev@>>;

template <>
const uint8_t* const _FragmentEntry::data = embedded_fatbin;

template <>
const size_t _FragmentEntry::length = embedded_fatbin;

}
```

#### `filter_embedded.cpp.in`

```text
/*
* SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/

#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
#include <cuvs/detail/jit_lto/registration_tags.hpp>
#include "@embedded_header_file@"

namespace example::detail {

using _FragmentEntry = StaticFatbinFragmentEntry<fragment_tag_filter<tag_@filter_name@, tag_@idx_abbrev@>>;

template <>
const uint8_t* const _FragmentEntry::data = embedded_fatbin;

template <>
const size_t _FragmentEntry::length = embedded_fatbin;

}
```

#### `search_kernel_embedded.cpp.in`

```text
/*
* SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
* SPDX-License-Identifier: Apache-2.0
*/

#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
#include <cuvs/detail/jit_lto/registration_tags.hpp>
#include "@embedded_header_file@"

namespace example::detail {

using _FragmentEntry = StaticFatbinFragmentEntry<fragment_tag_search<tag_@type_abbrev@, tag_@out_abbrev@, tag_@idx_abbrev@, @optimized@, @veclen@>>;

template <>
const uint8_t* const _FragmentEntry::data = embedded_fatbin;

template <>
const size_t _FragmentEntry::length = embedded_fatbin;

}
```
**Important**: When requesting fragments from the `AlgorithmPlanner`, we use **tags** (like `tag_f`, `tag_h`) instead of real types (like `float`, `__half`) in the `add_static_fragment` template parameters. This avoids including heavy headers that define the actual types, significantly improving compilation times. The tags are lightweight empty structs that serve only as compile-time identifiers.
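To make the idea of tags as compile-time identifiers concrete, here is a minimal sketch. The struct definitions below are illustrative assumptions written for this example, not the definitions from the project's tag headers; only the names `tag_f`, `tag_h`, and `fragment_tag_compute_distance` echo identifiers used in this guide.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the real tag types: empty structs that name a
// type or a variant without pulling in the headers that define it.
struct tag_f {};              // stands in for float
struct tag_h {};              // stands in for __half
struct tag_euclidean {};      // stands in for the euclidean distance variant
struct tag_inner_product {};  // stands in for the inner_product variant

// Hypothetical fragment tag: one distinct type per (distance, data type) pair.
template <typename DistanceTag, typename DataTag>
struct fragment_tag_compute_distance {};

using euclidean_float = fragment_tag_compute_distance<tag_euclidean, tag_f>;
using euclidean_half  = fragment_tag_compute_distance<tag_euclidean, tag_h>;

// Tags carry no data, so they cost nothing beyond being distinct types.
static_assert(std::is_empty_v<tag_f>, "tags are empty structs");
static_assert(std::is_empty_v<euclidean_float>, "fragment tags are empty too");

// Different tag combinations are different types, which is what lets a
// registry keyed on the tag type tell fragments apart.
static_assert(!std::is_same_v<euclidean_float, euclidean_half>,
              "each combination identifies a separate fragment");
```

Because none of these types depends on `__half` or any other heavy definition, a translation unit that only needs to name a fragment can include the tag header alone.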

### Step 6: Create the Planner

@@ -794,7 +718,8 @@ The `generate_jit_lto_kernels()` function (defined in `cmake/modules/generate_ji
- `NAME_FORMAT`: Format string for generated kernel names (using `@variable@` syntax)
- `MATRIX_JSON_FILE`: Path to the JSON matrix file
- `KERNEL_INPUT_FILE`: Path to the `.cu.in` template
- `EMBEDDED_INPUT_FILE`: Path to the `.cpp.in` template
- `FRAGMENT_TAG_FORMAT`: Format string for the fragment tag type (using `@variable@` syntax)
- `FRAGMENT_TAG_HEADER_FILES`: List of header files that provide the fragment tag types (each should be enclosed in `<`/`>`)
- `OUTPUT_DIRECTORY`: Where generated files are placed
- `KERNEL_LINK_LIBRARIES`: Interface library with compilation settings
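
The `@variable@` syntax used by `NAME_FORMAT` and `FRAGMENT_TAG_FORMAT` can be read as plain placeholder substitution. The sketch below only illustrates that semantics in C++; the actual expansion happens inside the CMake module, whose implementation is not shown here, and `expand_placeholders` is a hypothetical helper written for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replaces every @key@ in `fmt` with vars.at(key).
// A '@' that does not start a known key is copied through unchanged.
std::string expand_placeholders(const std::string& fmt,
                                const std::map<std::string, std::string>& vars)
{
  std::string out;
  std::size_t pos = 0;
  while (pos < fmt.size()) {
    const std::size_t open = fmt.find('@', pos);
    const std::size_t close =
      (open == std::string::npos) ? std::string::npos : fmt.find('@', open + 1);
    if (close == std::string::npos) {
      // No complete @...@ pair remains: copy the rest verbatim.
      out.append(fmt, pos, std::string::npos);
      break;
    }
    out.append(fmt, pos, open - pos);  // text before the placeholder
    const std::string key = fmt.substr(open + 1, close - open - 1);
    const auto it = vars.find(key);
    if (it != vars.end()) {
      out += it->second;  // known key: substitute its value
      pos = close + 1;
    } else {
      out += '@';  // unknown key: keep the '@' and rescan after it
      pos = open + 1;
    }
  }
  return out;
}
```

Under this reading, a format such as `fragment_tag_compute_distance<tag_@distance_name@, tag_@type_abbrev@>` combined with one matrix entry (`distance_name` = `euclidean`, `type_abbrev` = `f`) would yield `fragment_tag_compute_distance<tag_euclidean, tag_f>`.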

@@ -814,7 +739,7 @@ The process involves:
1. Separating device functions into fragment headers
2. Creating JSON matrices defining parameter combinations
3. Creating `.cu.in` templates for explicit instantiations
4. Creating `.cpp.in` templates for fatbin registration
4. Creating fragment tag types for fatbin registration
5. Creating a planner to manage fragment dependencies
6. Integrating the planner into the code path to launch kernels
7. **Adding CMake integration** to generate and compile all fragment variants