feat(kernels): add TVM FFI Triton CUBIN wrapper#315
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces tvm-ffi as a required dependency for pegainfer-kernels and implements a TVM FFI wrapper layer for Triton AOT CUBIN launchers, specifically exposing the Qwen3.5 gated-delta-rule triangular solve launcher. It also adds corresponding documentation, tests, and an example. Feedback on the implementation highlights a critical runtime issue in triton_cubin.rs where integer arguments are extracted directly as u64 or i32 using AnyView::try_as. Because TVM FFI packed integer arguments are always passed as i64, these strict type-matching checks will fail at runtime. The reviewer recommends extracting them as i64 first before casting.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> { | ||
| let value = args | ||
| .get(idx) | ||
| .ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?; | ||
| if let Some(raw) = value.try_as::<u64>() { | ||
| return Ok(raw as usize); | ||
| } | ||
| if let Some(raw) = value.try_as::<*mut c_void>() { | ||
| return Ok(raw as usize); | ||
| } | ||
| Err(type_error(spec, idx, "an integer or opaque pointer")) | ||
| } |
There was a problem hiding this comment.
In TVM FFI, packed function arguments of integer type are always passed as signed 64-bit integers (i64). Since AnyView::try_as performs strict type-code matching, attempting to extract them using try_as::<u64>() will fail at runtime with a type error. To ensure robustness, we should attempt to extract the value as i64 first before trying u64 or *mut c_void.
fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> {
let value = args
.get(idx)
.ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?;
if let Some(raw) = value.try_as::<i64>() {
return Ok(raw as usize);
}
if let Some(raw) = value.try_as::<u64>() {
return Ok(raw as usize);
}
if let Some(raw) = value.try_as::<*mut c_void>() {
return Ok(raw as usize);
}
Err(type_error(spec, idx, "an integer or opaque pointer"))
}| fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> { | ||
| args.get(idx) | ||
| .and_then(AnyView::try_as::<i32>) | ||
| .ok_or_else(|| type_error(spec, idx, "an integer")) | ||
| } |
There was a problem hiding this comment.
Similar to arg_handle, scalar integer arguments passed from Python/TVM are represented as i64 in the TVM FFI packed call. Calling try_as::<i32>() directly will fail because of strict type-code matching. We should extract the value as i64 first and then cast it to i32.
fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
let value = args
.get(idx)
.ok_or_else(|| type_error(spec, idx, "an integer"))?;
if let Some(raw) = value.try_as::<i64>() {
return Ok(raw as i32);
}
if let Some(raw) = value.try_as::<i32>() {
return Ok(raw);
}
Err(type_error(spec, idx, "an integer"))
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eed6eb7505
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> { | ||
| args.get(idx) | ||
| .and_then(AnyView::try_as::<i32>) | ||
| .ok_or_else(|| type_error(spec, idx, "an integer")) |
There was a problem hiding this comment.
Accept TVM i64 integer arguments
This path rejects the normal TVM integer representation for seq_len and num_value_heads: AnyView::try_as::<i32>() is an exact type check rather than a narrowing conversion, while TVM FFI integer values commonly arrive as i64 (including from Python/TVM typed calls). In that environment the new global function returns a type error before launching the CUBIN even for valid dimensions; parse i64/u64 and range-check before casting to i32.
Useful? React with 👍 / 👎.
|
Added test coverage in
Validation:
|
xiaguan
left a comment
There was a problem hiding this comment.
Thanks for the patch. I like the direction of exposing these CUBIN launchers through TVM FFI, but I do not think tvm-ffi should become a required dependency of pegainfer-kernels yet.
Please put this behind a feature gate, or move it into a small optional bridge crate like pegainfer-kernels-tvm, so normal kernel builds do not require tvm-ffi-config / libtvm_ffi.
Also, the added tests currently do not compile locally:
PATH=/data/code/workspace-rustllm/pegainfer/.venv/bin:$PATH PEGAINFER_CUDA_SM=120 PEGAINFER_TRITON_PYTHON=/data/code/workspace-rustllm/pegainfer/.venv/bin/python cargo test --release -p pegainfer-kernels triton_cubin --lib -- --nocapture
fails with:
error[E0277]: tvm_ffi::Any doesn't implement Debug
This comes from expect_err(...) on call_packed(...); expect_err requires the success type to implement Debug. Please fix the tests, e.g. by matching on the Result.
Requesting changes until these are addressed.
|
Addressed xiaguan's requested changes in
Validation:
|
|
@codex review |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
rebase or merge master? |
Summary
Adds the MVP production-facing TVM FFI path for Triton AOT CUBIN launchers in pegainfer-kernels.
This makes tvm-ffi a required pegainfer-kernels dependency, exposes a packed TVM FFI global wrapper for the generated Qwen3.5 gated-delta-rule Triton launcher, and adds a small example showing registration/discovery of the wrapper ABI.
Changes
Validation
Related
Related to #191.
This is intentionally narrower than #202: #202 validates optional bidirectional interop, while this PR makes tvm-ffi required for pegainfer-kernels and wires the first Triton CUBIN wrapper surface.