Add GitHub Actions for C ABI checking #1

Merged: benfred merged 4 commits into benfred:c_abi_checker from msarahan:add-github-actions-abi-check on Feb 2, 2026

@msarahan

Summary

Adds GitHub Actions workflows to automate C ABI compatibility checking, building on the existing ci/check_c_abi.py script from PR NVIDIA#1749.

Changes

New Workflows

.github/workflows/check-c-abi.yaml

  • On PR: Downloads baseline ABI from main and checks for breaking changes
  • On merge to main: Extracts and stores ABI baseline as artifacts
  • Manual trigger: Allows bootstrapping the initial baseline

.github/workflows/store-c-abi-baseline.yaml

  • On release: Archives the main baseline with version-specific name
  • Stores it both as an artifact (short-term) and on the baselines branch (permanent)

Integration

.github/workflows/pr.yaml

  • Adds check-c-abi job to PR pipeline
  • Blocks merge if breaking ABI changes are detected

Key Features

Intelligent baseline management:

  1. Try merge-base commit baseline (most accurate)
  2. Fall back to latest main baseline
  3. Fall back to fresh extraction if needed
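
The cascade above amounts to "try each source in order and report which one succeeded". A minimal sketch of that control flow follows, written in C++ to match the project's main language. It is not the workflow's actual implementation: `BaselineSource`, `Resolved`, and `resolve_baseline` are hypothetical names, and each `fetch` callback stands in for whatever step downloads or extracts a baseline.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one place a baseline can come from.
// `fetch` writes the baseline location into its argument and returns
// true on success, false if this source has nothing to offer.
struct BaselineSource {
  std::string name;
  std::function<bool(std::string&)> fetch;
};

// Result of the cascade: which source won, and where the baseline is.
struct Resolved {
  bool found;
  std::string source;
  std::string path;
};

// Try the sources in priority order and stop at the first success.
// Keeping the source name in the result is what makes it possible to
// report which baseline was used.
Resolved resolve_baseline(const std::vector<BaselineSource>& sources)
{
  for (const auto& s : sources) {
    assert(static_cast<bool>(s.fetch));  // every source needs a callback
    std::string path;
    if (s.fetch(path)) { return Resolved{true, s.name, path}; }
  }
  return Resolved{false, "", ""};
}
```

Passing the three sources in the order merge-base, latest main, fresh extraction reproduces the priority described in the list. If the merge-base fetch fails and the latest-main fetch succeeds, the result names the latest-main source.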

Race condition prevention:

  • Commit-specific baselines (c-abi-baseline-{sha}, 90-day retention)
  • Latest main baseline (c-abi-baseline-main, never expires)
  • Concurrency control serializes baseline updates

Developer experience:

  • Automatic PR comments on ABI breaking changes
  • Clear error messages with actionable next steps
  • Transparent reporting of which baseline was used

Bootstrap Instructions

After merge, manually run the "C ABI Compatibility Check" workflow on the main branch to create the initial baseline.

msarahan and others added 4 commits January 30, 2026 20:36

- Add check-c-abi.yaml workflow for PRs
- Add store-c-abi-baseline.yaml workflow for releases
- Integrate ABI check into pr.yaml pipeline

Co-authored-by: Cursor <cursoragent@cursor.com>

- Extract and store main baseline on push to main (never expires)
- PRs download and compare against main baseline
- Release workflow archives the main baseline with version tag
- Eliminates duplicate ABI extraction work

Co-authored-by: Cursor <cursoragent@cursor.com>

Enables bootstrapping the initial c-abi-baseline-main artifact

Co-authored-by: Cursor <cursoragent@cursor.com>

- Store baselines with commit SHA for precise comparisons
- Cascade: try merge-base → latest main → extract fresh
- Add concurrency control to prevent race conditions
- Report which baseline source was used for transparency

Co-authored-by: Cursor <cursoragent@cursor.com>

@benfred left a comment

this looks amazing - thanks for this @msarahan !

@benfred merged commit d6d52ba into benfred:c_abi_checker on Feb 2, 2026
1 check passed
cjnolet pushed a commit that referenced this pull request Mar 25, 2026
…AGRA search switches kernel variants (NVIDIA#1851)

Fix a bug in `safely_launch_kernel_with_smem_size` where `cudaFuncSetAttribute` was skipped for kernels that needed it. The function tracked the max shared memory in a single static variable per KernelT type, but `cudaFuncSetAttribute` applies per function pointer value — and the single-CTA CAGRA [search](https://github.com/rapidsai/cuvs/blob/d7a28aa1cb7648fa61037ed0459df0ec0e9db841/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L1373C4-L1375C78) dispatches multiple kernel instantiations that share the same pointer type. When one kernel bumped the tracked max, a different kernel whose smem fell between its own previous max and the global max would skip `cudaFuncSetAttribute`, causing `cudaErrorInvalidValue`. The fix tracks the kernel pointer identity alongside a monotonically growing smem high-water mark: when the pointer changes, the new kernel is brought up to the high-water mark; when smem exceeds it, the mark is grown.
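
The difference between the old and new bookkeeping can be sketched on the host without a GPU. The block below is an illustration under stated assumptions, not the code from the fix: `set_max_dynamic_smem` is a hypothetical stand-in for the `cudaFuncSetAttribute` call and only records its arguments, `prepare_buggy` and `prepare_fixed` are hypothetical names, the kernel launch itself is omitted, and the sketch is single-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cudaFuncSetAttribute call that raises a
// kernel's dynamic shared memory limit. It only records (kernel, smem).
template <typename KernelT>
std::vector<std::pair<KernelT, std::size_t>>& attr_log()
{
  static std::vector<std::pair<KernelT, std::size_t>> log;
  return log;
}

template <typename KernelT>
void set_max_dynamic_smem(KernelT kernel, std::size_t smem)
{
  attr_log<KernelT>().push_back(std::make_pair(kernel, smem));
}

// Old bookkeeping: a single tracked maximum per pointer *type*.
// Returns true if the attribute was set for `kernel`.
template <typename KernelT>
bool prepare_buggy(KernelT kernel, std::size_t smem)
{
  static std::size_t tracked_max = 0;
  // Skips the call even when this particular kernel was never configured.
  if (smem <= tracked_max) { return false; }
  tracked_max = smem;
  set_max_dynamic_smem(kernel, smem);
  return true;
}

// New bookkeeping: remember the last kernel pointer next to a
// monotonically growing high-water mark.
template <typename KernelT>
bool prepare_fixed(KernelT kernel, std::size_t smem)
{
  static KernelT last_kernel = nullptr;
  static std::size_t high_water = 0;
  const bool grew = smem > high_water;
  if (grew) { high_water = smem; }
  assert(high_water >= smem);  // the mark never drops below a request
  if (!grew && kernel == last_kernel) { return false; }
  // Either the mark grew or a different kernel is in use:
  // bring this kernel up to the high-water mark.
  set_max_dynamic_smem(kernel, high_water);
  last_kernel = kernel;
  return true;
}

// Two host functions standing in for two kernel instantiations that
// share the same pointer type.
int kernel_variant_a() { return 1; }
int kernel_variant_b() { return 2; }
```

With variant A prepared at 96 KiB and variant B then prepared at 64 KiB, `prepare_buggy` returns false for B, which corresponds to the skipped attribute call described above. `prepare_fixed` sees the pointer change and configures B at the 96 KiB high-water mark.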

## Error in question
```text
$ CUVS_CAGRA_ANN_BENCH --search --data_prefix='<DATA_DIR>/' --benchmark_out_format=csv --benchmark_out=res_search_iter_cagra.csv --benchmark_counters_tabular=true --override_kv=dataset_memory_type:\"device\" <CONFIG_DIR>/laion_1M_cagra_iterative.json
[I] [12:28:52.095261] Using the query file '<DATA_DIR>/laion_1M/queries.fbin'
[I] [12:28:52.096141] Using the ground truth file '<DATA_DIR>/laion_1M/groundtruth.1M.neighbors.ibin'
2026-02-25T12:28:52+00:00
Running CUVS_CAGRA_ANN_BENCH
Run on (224 X 800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x112)
  L1 Instruction 32 KiB (x112)
  L2 Unified 2048 KiB (x112)
  L3 Unified 307200 KiB (x2)
Load Average: 0.70, 0.44, 0.28
dataset: laion_1M
dim: 768
distance: euclidean
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/0/process_time/real_time        5.70 ms         5.70 ms          121   5.68808m   5.69994m    0.96424   0.689692       1.75441M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/1/process_time/real_time        5.70 ms         5.70 ms          121    5.6863m   5.69879m    0.96424   0.689553       1.75477M/s         64         10              8        10k            1            2         1.21M dataset_memory_type="device"
cuvs_cagra_iterative/2/process_time/real_time        4.92 ms         4.92 ms          140   4.90351m   4.91567m    0.96046   0.688193       2.03432M/s        128         10             12        10k            1            1          1.4M dataset_memory_type="device"
cuvs_cagra_iterative/3/process_time/real_time        5.99 ms         5.99 ms          115   5.97476m   5.98617m    0.97519   0.688409       1.67052M/s        128         10             16        10k            1            1         1.15M dataset_memory_type="device"
cuvs_cagra_iterative/4/process_time/real_time        6.97 ms         6.97 ms           99   6.95873m    6.9703m    0.98129   0.690059       1.43466M/s        256         10             16        10k            1            1          990k dataset_memory_type="device"
cuvs_cagra_iterative/5/process_time/real_time        10.5 ms         10.5 ms           66   0.010479  0.0104908    0.98548   0.692391       953.222k/s        512         10             10        10k            1            2          660k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/6/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
Obtained 19 stack frames
#1 in CUVS_CAGRA_ANN_BENCH: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#2 in libcuvs.so: void cuvs::neighbors::cagra::detail::single_cta_search::select_and_run<float, unsigned int, float, unsigned int, cuvs::neighbors::filtering::none_sample_filter>(...)
#3 in libcuvs.so: cuvs::neighbors::cagra::detail::single_cta_search::search<float, unsigned int, float, cuvs::neighbors::filtering::none_sample_filter, unsigned int, long>::operator()(...)
#4 in libcuvs.so(+0x18fd0f1)
#5 in libcuvs.so: void cuvs::neighbors::cagra::search<float, unsigned int, long>(...)
#6-#19 in CUVS_CAGRA_ANN_BENCH / libc.so.6
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/7/process_time/real_time        10.5 ms         10.5 ms           66  0.0105088  0.0105202    0.98663   0.694332       950.555k/s         32         10             32        10k            1            1          660k dataset_memory_type="device"
cuvs_cagra_iterative/8/process_time/real_time        12.8 ms         12.8 ms           54   0.012796  0.0128079    0.98807   0.691628       780.768k/s         32         10             64        10k            1            1          540k dataset_memory_type="device"
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
cuvs_cagra_iterative/9/process_time/real_time  ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
cuvs_cagra_iterative/10/process_time/real_time ERROR OCCURRED: 'Benchmark loop: CUDA error encountered at: file=cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh line=2348: call='cudaPeekAtLastError()', Reason=cudaErrorInvalidValue:invalid argument
[same stack trace as above]
'
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k max_iterations  n_queries refine_ratio search_width total_queries
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cuvs_cagra_iterative/11/process_time/real_time       46.1 ms         46.2 ms           15  0.0461323  0.0461439    0.99131   0.692158       216.714k/s        256         10             10        10k            1           16          150k dataset_memory_type="device"
cuvs_cagra_iterative/12/process_time/real_time        142 ms          142 ms            5   0.141713   0.141725    0.99198   0.708627       70.5591k/s        512         10             32        10k            1           16           50k dataset_memory_type="device"
``` 

## Config
```json
{
  "dataset": {
    "name": "laion_1M",
    "base_file": "laion_1M/base.1M.fbin",
    "subset_size": 1000000,
    "query_file": "laion_1M/queries.fbin",
    "groundtruth_neighbors_file": "laion_1M/groundtruth.1M.neighbors.ibin",
    "distance": "euclidean"
  },
  "search_basic_param": {
    "batch_size": 10000,
    "k": 10
  },
  "index": [
  
    {
      "name": "cuvs_cagra_iterative",
      "algo": "cuvs_cagra",
      "build_param": { 
        "graph_degree": 64,
        "intermediate_graph_degree": 128,
        "search_width": 1
      },
      "file": "laion_1M/cagra/q_coarse_iterative.ibin",
      "search_params": [
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 64, "search_width": 2, "max_iterations": 8, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 128, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 256, "search_width": 1, "max_iterations": 16, "refine_ratio": 1},
        {"itopk": 512, "search_width": 2, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 256, "search_width": 2, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 32, "refine_ratio": 1},
        {"itopk": 32, "search_width": 1, "max_iterations": 64, "refine_ratio": 1},
        {"itopk": 192, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 4, "max_iterations": 12, "refine_ratio": 1},
        {"itopk": 256, "search_width": 16, "max_iterations": 10, "refine_ratio": 1},
        {"itopk": 512, "search_width": 16, "max_iterations": 32, "refine_ratio": 1}
      ]
    }
  ]
}

```

Authors:
  - https://github.com/irina-resh-nvda

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)

URL: NVIDIA#1851