 **Note**: The kernel uses generic function templates (`compute_distance<T>` and `apply_filter<IdxT>`) that are resolved at link time. The specific implementations (euclidean vs inner_product, filter_none vs filter_bitset) are provided by the fragments that get linked together.

-### Step 5: Create `.cpp.in` Template Files for Embedding
+### Step 5: Create Fragment Tags for Embedding

-The `.cpp.in` files register the compiled fatbins so they can be loaded at runtime. Fragment tags are used to help the
-linker find and include the relevant fatbins at build time.
+Fragment tags register the compiled fatbins so they can be loaded at runtime. They are used to help the linker find and include the relevant fatbins at build time. When calling `generate_jit_lto_kernels()`, we pass a `FRAGMENT_TAG_FORMAT` argument, which constructs the tag type from the given placeholders, and a `FRAGMENT_TAG_HEADER_FILES` argument, which specifies one or more header files that the fragment tags come from. The JIT+LTO system will then automatically generate and compile a `.cpp` file that registers the fragment using the provided tag.

-**Important**: In the `.cpp.in` files (which become `.cpp` files), we use **tags** (like `tag_f`, `tag_h`) instead of real types (like `float`, `__half`) in the `StaticFatbinFragmentEntry` template parameters. This avoids including heavy headers that define the actual types, significantly improving compilation times. The tags are lightweight empty structs that serve only as compile-time identifiers.
+**Important**: When requesting fragments from the `AlgorithmPlanner`, we use **tags** (like `tag_f`, `tag_h`) instead of real types (like `float`, `__half`) in the `add_static_fragment` template parameters. This avoids including heavy headers that define the actual types, significantly improving compilation times. The tags are lightweight empty structs that serve only as compile-time identifiers.
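
As a rough illustration of why empty tag structs are enough to identify a fragment, and why registration and lookup must use exactly the same tag, the following is a minimal sketch. It is not the actual `AlgorithmPlanner` or `add_static_fragment` implementation: `fragment_tag` and `FragmentRegistry` are hypothetical stand-ins, and only `tag_f` and `tag_h` are names taken from the text above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Tag types named in the text: empty structs, so no CUDA or type headers are needed.
struct tag_f {};
struct tag_h {};

// Hypothetical fragment identifier composed purely of tags (an assumption for this sketch).
template <typename... Tags>
struct fragment_tag {};

// Hypothetical toy registry: maps a tag type to a fragment name.
class FragmentRegistry {
 public:
  // Register a fragment under the exact type `Tag`.
  template <typename Tag>
  void register_fragment(std::string name) {
    fragments_[std::type_index(typeid(Tag))] = std::move(name);
  }

  // Look up a fragment; returns nullptr when no fragment was registered under `Tag`.
  template <typename Tag>
  const std::string* find() const {
    auto it = fragments_.find(std::type_index(typeid(Tag)));
    return it == fragments_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::type_index, std::string> fragments_;
};

// Tags carry no data; they exist only as compile-time identifiers.
static_assert(std::is_empty_v<tag_f> && std::is_empty_v<tag_h>, "tags must be empty");
```

In this sketch, a lookup with `fragment_tag<tag_h>` after registering only `fragment_tag<tag_f>` returns `nullptr`, which mirrors the requirement that the tag used at lookup matches the registered tag exactly.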
@@ -565,7 +516,7 @@ The planner is responsible for:
 3. Requesting the fragments from the fragment database
 4. Linking them together to create a launchable kernel

-**CRITICAL**: The fragment keys constructed in the planner methods must match **EXACTLY** with the keys used in the corresponding `.cpp.in` registration files. Any mismatch will result in runtime linking failures.
+**CRITICAL**: The fragment keys constructed in the planner methods must match **EXACTLY** with the keys used in the corresponding `FRAGMENT_TAG_FORMAT` argument. Any mismatch will result in runtime linking failures.

 **`search_planner.hpp`**:
@@ -642,11 +593,13 @@ constexpr auto get_out_type_tag() {
 template <DistanceType Metric>
 constexpr auto get_metric_tag() {
   if constexpr (Metric == DistanceType::Euclidean) return tag_metric_euclidean{};
+  if constexpr (Metric == DistanceType::InnerProduct) return tag_metric_inner_product{};
 }

 template <FilterType Filter>
 constexpr auto get_filter_tag() {
   if constexpr (Filter == FilterType::None) return tag_filter_none{};
+  if constexpr (Filter == FilterType::Bitset) return tag_filter_bitset{};
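
The enum-to-tag mapping in the hunk above can be reproduced as a self-contained sketch. The enum and tag definitions below are assumptions made so the snippet compiles on its own (the real ones live in headers not shown in this diff), and the `always_false` helper with its `static_assert` is an addition for illustration, not part of the change: it turns an unhandled enumerator into a compile-time error instead of a silently deduced `void` return type.

```cpp
#include <cassert>
#include <type_traits>

// Assumed stand-ins for the real definitions, which are not shown in the diff.
enum class DistanceType { Euclidean, InnerProduct };
enum class FilterType { None, Bitset };
struct tag_metric_euclidean {};
struct tag_metric_inner_product {};
struct tag_filter_none {};
struct tag_filter_bitset {};

// Dependent false value, so the static_assert only fires when the branch is instantiated.
template <auto V>
inline constexpr bool always_false = false;

template <DistanceType Metric>
constexpr auto get_metric_tag() {
  if constexpr (Metric == DistanceType::Euclidean) {
    return tag_metric_euclidean{};
  } else if constexpr (Metric == DistanceType::InnerProduct) {
    return tag_metric_inner_product{};
  } else {
    static_assert(always_false<Metric>, "unsupported DistanceType");
  }
}

template <FilterType Filter>
constexpr auto get_filter_tag() {
  if constexpr (Filter == FilterType::None) {
    return tag_filter_none{};
  } else if constexpr (Filter == FilterType::Bitset) {
    return tag_filter_bitset{};
  } else {
    static_assert(always_false<Filter>, "unsupported FilterType");
  }
}
```

Because each `if constexpr` branch is discarded unless its condition holds, every instantiation deduces exactly one tag type as its return type, for example `decltype(get_metric_tag<DistanceType::InnerProduct>())` is `tag_metric_inner_product`.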
 These tags are used in `registerAlgorithm<>()` to create a hierarchical organization of fragments.

-**Why Tags Instead of Real Types?**: Using tags instead of real types (like `float`, `__half`) in the `.cpp.in` files avoids including heavy headers that define those types. This significantly improves compilation times since the generated `.cpp` files don't need to pull in CUDA headers, type definitions, or other dependencies. Tags are lightweight compile-time identifiers that don't require any runtime overhead or additional includes.
-
 ### AlgorithmLauncher

 The `AlgorithmLauncher` is the runtime handle for a linked kernel. It:
@@ -770,7 +721,7 @@ The `AlgorithmLauncher` is the runtime handle for a linked kernel. It:
 2. **Fragment Granularity**: Balance between too many small fragments (overhead) and too few large fragments (less reuse). Device functions that are reused across multiple kernels are good candidates for separate fragments.

-3. **Naming Consistency**: Ensure fragment keys match exactly between registration and lookup. Use helper functions to construct keys consistently.
+3. **Naming Consistency**: Ensure fragment tags match exactly between registration and lookup. Use helper functions to construct tags consistently.

 4. **Type Safety**: Use registration tags to provide compile-time type safety and avoid runtime string mismatches.
@@ -794,7 +745,8 @@ The `generate_jit_lto_kernels()` function (defined in `cmake/modules/generate_ji
 - `NAME_FORMAT`: Format string for generated kernel names (using `@variable@` syntax)
 - `MATRIX_JSON_FILE`: Path to the JSON matrix file
 - `KERNEL_INPUT_FILE`: Path to the `.cu.in` template
-- `EMBEDDED_INPUT_FILE`: Path to the `.cpp.in` template
+- `FRAGMENT_TAG_FORMAT`: Format string for fragment tag type (using `@variable@` syntax)
+- `FRAGMENT_TAG_HEADER_FILES`: List of header files that provide the fragment tag types (can be enclosed in `<`/`>` or `"`/`"`, automatically enclosed in quotes if quotes and brackets are not provided)
 - `OUTPUT_DIRECTORY`: Where generated files are placed
 - `KERNEL_LINK_LIBRARIES`: Interface library with compilation settings
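
To make the `@variable@` placeholder syntax of `NAME_FORMAT` and `FRAGMENT_TAG_FORMAT` concrete, here is a small C++ sketch of that style of substitution. It is only a model of the idea, not the CMake implementation in `generate_jit_lto_kernels()`: the function `expand_placeholders`, the placeholder names, and the example format string are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replaces each @name@ in `format` with values.at(name).
// Placeholders whose name is not in `values` are left untouched.
std::string expand_placeholders(const std::string& format,
                                const std::map<std::string, std::string>& values) {
  std::string out;
  std::size_t pos = 0;
  while (pos < format.size()) {
    const std::size_t open = format.find('@', pos);
    const std::size_t close =
        open == std::string::npos ? std::string::npos : format.find('@', open + 1);
    if (close == std::string::npos) {
      // No complete @...@ pair remains: copy the rest verbatim.
      out.append(format, pos, std::string::npos);
      break;
    }
    out.append(format, pos, open - pos);  // literal text before the placeholder
    const std::string name = format.substr(open + 1, close - open - 1);
    const auto it = values.find(name);
    if (it != values.end()) {
      out += it->second;
    } else {
      out.append(format, open, close - open + 1);  // keep unknown placeholder as-is
    }
    pos = close + 1;
  }
  return out;
}
```

With the hypothetical format `fragment_tag<tag_@data_type@, tag_metric_@metric@>` and the values `data_type = f`, `metric = euclidean`, the helper yields `fragment_tag<tag_f, tag_metric_euclidean>`, which is the kind of tag type the planner side would then have to name identically.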
@@ -814,7 +766,7 @@ The process involves:
 1. Separating device functions into fragment headers