
Commit adcb0a2

Merge remote-tracking branch 'upstream' into HH-Dataset-API
2 parents d53c4a8 + 7cf69cb commit adcb0a2

1 file changed

Lines changed: 28 additions & 76 deletions


docs/source/jit_lto_guide.md

````diff
@@ -475,86 +475,37 @@ extern "C" __global__ void search_kernel(
 
 **Note**: The kernel uses generic function templates (`compute_distance<T>` and `apply_filter<IdxT>`) that are resolved at link time. The specific implementations (euclidean vs inner_product, filter_none vs filter_bitset) are provided by the fragments that get linked together.
 
-### Step 5: Create `.cpp.in` Template Files for Embedding
+### Step 5: Create Fragment Tags for Embedding
 
-The `.cpp.in` files register the compiled fatbins so they can be loaded at runtime. Fragment tags are used to help the
-linker find and include the relevant fatbins at build time.
+Fragment tags register the compiled fatbins so they can be loaded at runtime. They are used to help the linker find and include the relevant fatbins at build time. When calling `generate_jit_lto_kernels()`, we pass a `FRAGMENT_TAG_FORMAT` argument, which constructs the tag type from the given placeholders, and a `FRAGMENT_TAG_HEADER_FILES` argument, which specifies one or more header files that the fragment tags come from. The JIT+LTO system will then automatically generate and compile a `.cpp` file that registers the fragment using the provided tag.
 
-**Important**: In the `.cpp.in` files (which become `.cpp` files), we use **tags** (like `tag_f`, `tag_h`) instead of real types (like `float`, `__half`) in the `StaticFatbinFragmentEntry` template parameters. This avoids including heavy headers that define the actual types, significantly improving compilation times. The tags are lightweight empty structs that serve only as compile-time identifiers.
+**Important**: When requesting fragments from the `AlgorithmPlanner`, we use **tags** (like `tag_f`, `tag_h`) instead of real types (like `float`, `__half`) in the `add_static_fragment` template parameters. This avoids including heavy headers that define the actual types, significantly improving compilation times. The tags are lightweight empty structs that serve only as compile-time identifiers.
 
-#### `compute_distance_embedded.cpp.in`
+**`registration_tags.hpp`**
 
-```text
-/*
- * SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
- * SPDX-License-Identifier: Apache-2.0
- */
-
-#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
-#include <cuvs/detail/jit_lto/registration_tags.hpp>
-#include "@embedded_header_file@"
-
-namespace example::detail {
-
-using _FragmentEntry = StaticFatbinFragmentEntry<fragment_tag_compute_distance<tag_@distance_name@, tag_@type_abbrev@>>;
-
-template <>
-const uint8_t* const _FragmentEntry::data = embedded_fatbin;
-
-template <>
-const size_t _FragmentEntry::length = embedded_fatbin;
-
-}
-```
-
-#### `filter_embedded.cpp.in`
-
-```text
-/*
- * SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
- * SPDX-License-Identifier: Apache-2.0
- */
-
-#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
-#include <cuvs/detail/jit_lto/registration_tags.hpp>
-#include "@embedded_header_file@"
-
-namespace example::detail {
-
-using _FragmentEntry = StaticFatbinFragmentEntry<fragment_tag_filter<tag_@filter_name@, tag_@idx_abbrev@>>;
-
-template <>
-const uint8_t* const _FragmentEntry::data = embedded_fatbin;
-
-template <>
-const size_t _FragmentEntry::length = embedded_fatbin;
-
-}
-```
-
-#### `search_kernel_embedded.cpp.in`
-
-```text
-/*
- * SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
- * SPDX-License-Identifier: Apache-2.0
- */
+```cpp
+#pragma once
 
-#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
-#include <cuvs/detail/jit_lto/registration_tags.hpp>
-#include "@embedded_header_file@"
+struct tag_h{};
+struct tag_f{};
+struct tag_d{};
+struct tag_ui{};
+struct tag_l{};
 
-namespace example::detail {
+struct tag_metric_euclidean {};
+struct tag_metric_inner_product {};
 
-using _FragmentEntry = StaticFatbinFragmentEntry<fragment_tag_search<tag_@type_abbrev@, tag_@out_abbrev@, tag_@idx_abbrev@, @optimized@, @veclen@>>;
+struct tag_filter_none {};
+struct tag_filter_bitset {};
 
-template <>
-const uint8_t* const _FragmentEntry::data = embedded_fatbin;
+template <typename DataTag, typename OutTag, typename IdxTag, bool Optimized, int Veclen>
+struct fragment_tag_search {};
 
-template <>
-const size_t _FragmentEntry::length = embedded_fatbin;
+template <typename DistanceTag, typename DataTag>
+struct fragment_tag_compute_distance {};
 
-}
+template <typename FilterTag, typename IndexTag>
+struct fragment_tag_filter {};
 ```
 
 ### Step 6: Create the Planner
````
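
The new Step 5 says the generated `.cpp` file registers each fatbin under its fragment tag, and that the tags are empty structs used only as compile-time identifiers. The self-contained sketch below illustrates that idea in plain C++. `FragmentRegistry`, `add()`, `find()`, and the placeholder bytes are hypothetical stand-ins invented for illustration, not cuVS's `StaticFatbinFragmentEntry` or `AlgorithmPlanner`; only the tag names are taken from `registration_tags.hpp`.

```cpp
// Hypothetical sketch, not the cuVS API: fatbin bytes registered and
// looked up by fragment tag *type* rather than by a string key.
#include <cstddef>
#include <cstdint>
#include <map>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Empty tag structs, named as in registration_tags.hpp.
struct tag_f {};
struct tag_h {};
struct tag_metric_euclidean {};

template <typename DistanceTag, typename DataTag>
struct fragment_tag_compute_distance {};

// Hypothetical registry mapping a fragment tag type to (data, length).
class FragmentRegistry {
 public:
  static FragmentRegistry& instance()
  {
    static FragmentRegistry registry;  // constructed on first use
    return registry;
  }

  template <typename FragmentTag>
  void add(const std::uint8_t* data, std::size_t length)
  {
    entries_[std::type_index(typeid(FragmentTag))] = {data, length};
  }

  // Returns {nullptr, 0} when nothing was registered under this tag.
  template <typename FragmentTag>
  std::pair<const std::uint8_t*, std::size_t> find() const
  {
    auto it = entries_.find(std::type_index(typeid(FragmentTag)));
    if (it == entries_.end()) { return {nullptr, 0}; }
    return it->second;
  }

 private:
  std::map<std::type_index, std::pair<const std::uint8_t*, std::size_t>> entries_;
};

// Placeholder bytes standing in for an embedded fatbin.
static const std::uint8_t embedded_fatbin[] = {0x7f, 0x45, 0x4c, 0x46};

// Roughly what a generated registration file could do during static
// initialization: register the bytes under one fully specified tag type.
static const bool compute_distance_registered = [] {
  FragmentRegistry::instance().add<fragment_tag_compute_distance<tag_metric_euclidean, tag_f>>(
    embedded_fatbin, sizeof(embedded_fatbin));
  return true;
}();
```

Looking up the same tag type returns the registered bytes; any other combination, such as `tag_h` in place of `tag_f`, finds nothing. Because the key is a type, a misspelled tag name fails to compile instead of silently missing at runtime.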
````diff
@@ -565,7 +516,7 @@ The planner is responsible for:
 3. Requesting the fragments from the fragment database
 4. Linking them together to create a launchable kernel
 
-**CRITICAL**: The fragment keys constructed in the planner methods must match **EXACTLY** with the keys used in the corresponding `.cpp.in` registration files. Any mismatch will result in runtime linking failures.
+**CRITICAL**: The fragment keys constructed in the planner methods must match **EXACTLY** with the keys used in the corresponding `FRAGMENT_TAG_FORMAT` argument. Any mismatch will result in runtime linking failures.
 
 **`search_planner.hpp`**:
 
````
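
The CRITICAL note ties every planner lookup to the tag spelled out by `FRAGMENT_TAG_FORMAT`. The diff does not show how the CMake module expands that format, so the following is only an illustration, in C++, of `@variable@` substitution (the same placeholder syntax CMake's `configure_file(... @ONLY)` uses). `expand_tag_format` and the sample values are hypothetical; the format string is modeled on the `compute_distance` tag from the removed `compute_distance_embedded.cpp.in` template.

```cpp
// Hypothetical sketch: expand @variable@ placeholders in a tag format string,
// showing how one matrix entry maps to exactly one fragment tag spelling.
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

std::string expand_tag_format(const std::string& format,
                              const std::map<std::string, std::string>& values)
{
  std::string out;
  std::size_t pos = 0;
  while (pos < format.size()) {
    const std::size_t open = format.find('@', pos);
    if (open == std::string::npos) {  // no more placeholders: copy the rest
      out += format.substr(pos);
      break;
    }
    const std::size_t close = format.find('@', open + 1);
    if (close == std::string::npos) {
      throw std::invalid_argument("unterminated @placeholder@ in: " + format);
    }
    out += format.substr(pos, open - pos);  // literal text before the placeholder
    const std::string name = format.substr(open + 1, close - open - 1);
    const auto it = values.find(name);
    if (it == values.end()) {
      throw std::invalid_argument("no value for placeholder: " + name);
    }
    out += it->second;
    pos = close + 1;
  }
  return out;
}
```

With `distance_name = metric_euclidean` and `type_abbrev = f`, the expansion is `fragment_tag_compute_distance<tag_metric_euclidean, tag_f>`. If the planner builds a type that differs from that expansion in any argument, or in argument order, the lookup misses at runtime, which is the failure the CRITICAL note warns about.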
````diff
@@ -642,11 +593,13 @@ constexpr auto get_out_type_tag() {
 template <DistanceType Metric>
 constexpr auto get_metric_tag() {
   if constexpr (Metric == DistanceType::Euclidean) return tag_metric_euclidean{};
+  if constexpr (Metric == DistanceType::InnerProduct) return tag_metric_inner_product{};
 }
 
 template <FilterType Filter>
 constexpr auto get_filter_tag() {
   if constexpr (Filter == FilterType::None) return tag_filter_none{};
+  if constexpr (Filter == FilterType::Bitset) return tag_filter_bitset{};
 }
 
 template <typename T, typename OutT, typename IdxT, DistanceType Metric, FilterType Filter, bool Optimized, int Veclen>
````
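
The two lines added to the planner helpers complete the enum-to-tag mapping for the inner-product metric and the bitset filter. A self-contained sketch of the pattern follows. `DistanceType` and `FilterType` here are reduced stand-ins containing only the enumerators visible in the diff, and `get_data_tag` and `compute_distance_fragment_t` are hypothetical helpers written in the same style.

```cpp
// Sketch of the planner's enum-to-tag helpers (reduced and self-contained).
#include <type_traits>

// Reduced stand-ins: only the enumerators that appear in the diff.
enum class DistanceType { Euclidean, InnerProduct };
enum class FilterType { None, Bitset };

// Empty tags, named as in registration_tags.hpp.
struct tag_f {};
struct tag_d {};
struct tag_metric_euclidean {};
struct tag_metric_inner_product {};
struct tag_filter_none {};
struct tag_filter_bitset {};

template <typename DistanceTag, typename DataTag>
struct fragment_tag_compute_distance {};

// Same shape as the guide's helpers: each discarded branch's return does
// not participate in return type deduction.
template <DistanceType Metric>
constexpr auto get_metric_tag()
{
  if constexpr (Metric == DistanceType::Euclidean) return tag_metric_euclidean{};
  if constexpr (Metric == DistanceType::InnerProduct) return tag_metric_inner_product{};
}

template <FilterType Filter>
constexpr auto get_filter_tag()
{
  if constexpr (Filter == FilterType::None) return tag_filter_none{};
  if constexpr (Filter == FilterType::Bitset) return tag_filter_bitset{};
}

// Hypothetical data-type helper in the same style.
template <typename T>
constexpr auto get_data_tag()
{
  if constexpr (std::is_same_v<T, float>) return tag_f{};
  if constexpr (std::is_same_v<T, double>) return tag_d{};
}

// Hypothetical alias: the one place the planner builds this lookup tag.
template <DistanceType Metric, typename T>
using compute_distance_fragment_t =
  fragment_tag_compute_distance<decltype(get_metric_tag<Metric>()), decltype(get_data_tag<T>())>;
```

One consequence of this style is worth noting: an enumerator with no matching branch leaves the helper with no `return`, so its deduced return type is `void`, and `fragment_tag_compute_distance<void, tag_f>` still compiles. The mistake then surfaces only as a runtime lookup miss; a `static_assert(!std::is_void_v<...>)` next to the alias would turn it into a compile error.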
````diff
@@ -755,8 +708,6 @@ struct tag_l {}; // int64_t
 
 These tags are used in `registerAlgorithm<>()` to create a hierarchical organization of fragments.
 
-**Why Tags Instead of Real Types?**: Using tags instead of real types (like `float`, `__half`) in the `.cpp.in` files avoids including heavy headers that define those types. This significantly improves compilation times since the generated `.cpp` files don't need to pull in CUDA headers, type definitions, or other dependencies. Tags are lightweight compile-time identifiers that don't require any runtime overhead or additional includes.
-
 ### AlgorithmLauncher
 
 The `AlgorithmLauncher` is the runtime handle for a linked kernel. It:
````
````diff
@@ -770,7 +721,7 @@ The `AlgorithmLauncher` is the runtime handle for a linked kernel. It:
 
 2. **Fragment Granularity**: Balance between too many small fragments (overhead) and too few large fragments (less reuse). Device functions that are reused across multiple kernels are good candidates for separate fragments.
 
-3. **Naming Consistency**: Ensure fragment keys match exactly between registration and lookup. Use helper functions to construct keys consistently.
+3. **Naming Consistency**: Ensure fragment tags match exactly between registration and lookup. Use helper functions to construct tags consistently.
 
 4. **Type Safety**: Use registration tags to provide compile-time type safety and avoid runtime string mismatches.
 
````
````diff
@@ -794,7 +745,8 @@ The `generate_jit_lto_kernels()` function (defined in `cmake/modules/generate_ji
 - `NAME_FORMAT`: Format string for generated kernel names (using `@variable@` syntax)
 - `MATRIX_JSON_FILE`: Path to the JSON matrix file
 - `KERNEL_INPUT_FILE`: Path to the `.cu.in` template
-- `EMBEDDED_INPUT_FILE`: Path to the `.cpp.in` template
+- `FRAGMENT_TAG_FORMAT`: Format string for fragment tag type (using `@variable@` syntax)
+- `FRAGMENT_TAG_HEADER_FILES`: List of header files that provide the fragment tag types (can be enclosed in `<`/`>` or `"`/`"`, automatically enclosed in quotes if quotes and brackets are not provided)
 - `OUTPUT_DIRECTORY`: Where generated files are placed
 - `KERNEL_LINK_LIBRARIES`: Interface library with compilation settings
 
````
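
The new `FRAGMENT_TAG_HEADER_FILES` bullet states a quoting rule: entries already enclosed in `<`/`>` or `"`/`"` are used as given, and bare paths are wrapped in quotes. That logic lives in the CMake module, which this diff does not show. The C++ below is a hypothetical restatement of the stated rule, useful for reasoning about which `#include` lines a generated registration file would start with; `to_include_line`, `generated_includes`, and `my_tags.hpp` are illustrative names, not part of cuVS.

```cpp
// Hypothetical restatement of the FRAGMENT_TAG_HEADER_FILES quoting rule.
#include <string>
#include <vector>

std::string to_include_line(const std::string& header)
{
  const bool angled = header.size() >= 2 && header.front() == '<' && header.back() == '>';
  const bool quoted = header.size() >= 2 && header.front() == '"' && header.back() == '"';
  // Already delimited: use as given. Otherwise wrap the bare path in quotes.
  const std::string spec = (angled || quoted) ? header : "\"" + header + "\"";
  return "#include " + spec;
}

// One include line per listed header, in the order given.
std::string generated_includes(const std::vector<std::string>& headers)
{
  std::string out;
  for (const auto& h : headers) {
    out += to_include_line(h) + "\n";
  }
  return out;
}
```

Quoting bare paths by default keeps project-local tag headers working without extra syntax, while library headers such as `<cuvs/detail/jit_lto/registration_tags.hpp>` can still be passed with angle brackets.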
````diff
@@ -814,7 +766,7 @@ The process involves:
 1. Separating device functions into fragment headers
 2. Creating JSON matrices defining parameter combinations
 3. Creating `.cu.in` templates for explicit instantiations
-4. Creating `.cpp.in` templates for fatbin registration
+4. Creating fragment tag types for fatbin registration
 5. Creating a planner to manage fragment dependencies
 6. Integrating the planner into the code path to launch kernels
 7. **Adding CMake integration** to generate and compile all fragment variants
````
