Skip to content

feat(kernels): add TVM FFI Triton CUBIN wrapper#315

Open
Ma1oneZhang wants to merge 1 commit into
xiaguan:mainfrom
Ma1oneZhang:feat/tvm-ffi-triton-cubin
Open

feat(kernels): add TVM FFI Triton CUBIN wrapper#315
Ma1oneZhang wants to merge 1 commit into
xiaguan:mainfrom
Ma1oneZhang:feat/tvm-ffi-triton-cubin

Conversation

@Ma1oneZhang

Copy link
Copy Markdown
Contributor

Summary

Adds the MVP production-facing TVM FFI path for Triton AOT CUBIN launchers in pegainfer-kernels.

This makes tvm-ffi a required pegainfer-kernels dependency, exposes a packed TVM FFI global wrapper for the generated Qwen3.5 gated-delta-rule Triton launcher, and adds a small example showing registration/discovery of the wrapper ABI.

Changes

  • Add required tvm-ffi dependency to pegainfer-kernels.
  • Add pegainfer_kernels::triton_cubin with a TVM FFI packed wrapper for gated_delta_rule_prefill_chunk_solve_cuda.
  • Add the triton_cubin_tvm_ffi example for registering and inspecting the MVP wrapper.
  • Document the Triton CUBIN TVM FFI boundary and record the project-doc execution notes.

Validation

  • cargo fmt --all --check
  • cargo metadata --no-deps --format-version 1
  • cargo check --release -p pegainfer-kernels was run after adding /home/ziyang/gpu_memory_profiling/.venv/bin to PATH so tvm-ffi-config could be found. tvm-ffi built successfully, then the existing CUDA build failed in pegainfer-kernels/csrc/shared/flashinfer_top1.cu because the local dirty FlashInfer submodule exposes a different TopKDispatch API. The submodule change is not part of this PR.

Related

Related to #191.

This is intentionally narrower than #202: #202 validates optional bidirectional interop, while this PR makes tvm-ffi required for pegainfer-kernels and wires the first Triton CUBIN wrapper surface.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces tvm-ffi as a required dependency for pegainfer-kernels and implements a TVM FFI wrapper layer for Triton AOT CUBIN launchers, specifically exposing the Qwen3.5 gated-delta-rule triangular solve launcher. It also adds corresponding documentation, tests, and an example. Feedback on the implementation highlights a critical runtime issue in triton_cubin.rs where integer arguments are extracted directly as u64 or i32 using AnyView::try_as. Because TVM FFI packed integer arguments are always passed as i64, these strict type-matching checks will fail at runtime. The reviewer recommends extracting them as i64 first before casting.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +106 to +117
fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> {
let value = args
.get(idx)
.ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?;
if let Some(raw) = value.try_as::<u64>() {
return Ok(raw as usize);
}
if let Some(raw) = value.try_as::<*mut c_void>() {
return Ok(raw as usize);
}
Err(type_error(spec, idx, "an integer or opaque pointer"))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In TVM FFI, packed function arguments of integer type are always passed as signed 64-bit integers (i64). Since AnyView::try_as performs strict type-code matching, attempting to extract them using try_as::<u64>() will fail at runtime with a type error. To ensure robustness, we should attempt to extract the value as i64 first before trying u64 or *mut c_void.

fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> {
    let value = args
        .get(idx)
        .ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?;
    if let Some(raw) = value.try_as::<i64>() {
        return Ok(raw as usize);
    }
    if let Some(raw) = value.try_as::<u64>() {
        return Ok(raw as usize);
    }
    if let Some(raw) = value.try_as::<*mut c_void>() {
        return Ok(raw as usize);
    }
    Err(type_error(spec, idx, "an integer or opaque pointer"))
}

Comment on lines +119 to +123
fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
args.get(idx)
.and_then(AnyView::try_as::<i32>)
.ok_or_else(|| type_error(spec, idx, "an integer"))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to arg_handle, scalar integer arguments passed from Python/TVM are represented as i64 in the TVM FFI packed call. Calling try_as::<i32>() directly will fail because of strict type-code matching. We should extract the value as i64 first and then cast it to i32.

fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
    let value = args
        .get(idx)
        .ok_or_else(|| type_error(spec, idx, "an integer"))?;
    if let Some(raw) = value.try_as::<i64>() {
        return Ok(raw as i32);
    }
    if let Some(raw) = value.try_as::<i32>() {
        return Ok(raw);
    }
    Err(type_error(spec, idx, "an integer"))
}

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eed6eb7505

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +119 to +122
fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
args.get(idx)
.and_then(AnyView::try_as::<i32>)
.ok_or_else(|| type_error(spec, idx, "an integer"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept TVM i64 integer arguments

This path rejects the normal TVM integer representation for seq_len and num_value_heads: AnyView::try_as::<i32>() is an exact type check rather than a narrowing conversion, while TVM FFI integer values commonly arrive as i64 (including from Python/TVM typed calls). In that environment the new global function returns a type error before launching the CUBIN even for valid dimensions; parse i64/u64 and range-check before casting to i32.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant