fix: detect GPU arch automatically for kernel building by Dogacel · Pull Request #169 · lightseekorg/tokenspeed

Dogacel · 2026-05-17T05:42:58Z

Summary

Detect GPU architecture automatically while building tokenspeed-kernel.

Test Plan

(Old flow) Built the kernel on a GH200. However, sm100a is not compatible by default, so running gpt-oss-20b fails with [ts] RuntimeError: Check failed: (status == cudaSuccess) is false: BatchQKApplyRotaryPosIdsCosSinCacheEnhanced failed with error code no kernel image is available for execution on the device.
(New flow) Cleared packages and cache, then re-ran pip install -e tokenspeed-kernel/python/ --no-build-isolation. The GPU architecture is detected automatically without setting FLASHINFER_CUDA_ARCH_LIST or TOKENSPEED_CUDA_ARCH (setting them manually also fixes the issue).
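
For readers unfamiliar with this kind of detection, here is a minimal sketch of one way a build script could discover the compute capability of the local GPUs. It is not the code from this PR: the helper names `parse_compute_caps` and `detect_compute_caps` are hypothetical, and it assumes a driver recent enough that `nvidia-smi` supports the `compute_cap` query field. The PR itself may use a different mechanism.

```python
import re
import subprocess


def parse_compute_caps(output: str) -> list[str]:
    """Extract unique compute capabilities (e.g. '9.0') from nvidia-smi output.

    Lines that are blank or not of the form '<major>.<minor>' are skipped.
    """
    caps = set()
    for line in output.splitlines():
        cap = line.strip()
        if re.fullmatch(r"\d+\.\d+", cap):
            caps.add(cap)
    # Sort numerically so that 9.0 comes before 10.0.
    return sorted(caps, key=lambda c: tuple(int(p) for p in c.split(".")))


def detect_compute_caps() -> list[str]:
    """Query nvidia-smi for each visible GPU; return [] if the query fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            check=True,
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi missing, non-zero exit, or timeout: let the caller fall back.
        return []
    return parse_compute_caps(result.stdout)
```

Returning an empty list on failure lets the caller fall back to an explicit setting such as TOKENSPEED_CUDA_ARCH instead of aborting the build.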

Signed-off-by: Doğaç Eldenk <dogacel@gmail.com>

borontion · 2026-05-17T18:27:48Z

+                for cap in caps:
+                    cap = cap.strip()
+                    if cap:
+                        archs.add(self._normalize_cuda_arch(cap + "a"))


Why is there a suffix "a" here (cap + "a")? Shouldn't it be handled by _normalize_cuda_arch?

I am a bit confused; it seems like we can remove it.

The goal was to get sm100a, sm90a, sm120a, etc. The normalize function does:

suffix = "a" if has_suffix or major >= 9 else ""
return f"{major}{minor}{suffix}"

So it will always have "a" when major >= 9. However, I never managed to compile tokenspeed on sm80, so I couldn't test that case.

So just removing it should result in identical code for supported architectures. I'll do that.
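
To make the reasoning above concrete, here is a small sketch showing why appending "a" before normalizing is redundant for major >= 9. Only the two lines computing `suffix` and the return value come from this thread; the function name `normalize_cuda_arch` and the regex-based parsing are assumptions made for illustration, and the real `_normalize_cuda_arch` may accept different input forms.

```python
import re


def normalize_cuda_arch(arch: str) -> str:
    """Hypothetical stand-in for _normalize_cuda_arch.

    Accepts forms such as '9.0', '90', '90a', 'sm_90a' and '10.0a'.
    """
    match = re.fullmatch(r"(?:sm_?)?(\d+)\.?(\d)(a?)", arch.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized CUDA arch: {arch!r}")
    major = int(match.group(1))
    minor = int(match.group(2))
    has_suffix = match.group(3) == "a"
    # These two lines mirror the logic quoted in the review thread.
    suffix = "a" if has_suffix or major >= 9 else ""
    return f"{major}{minor}{suffix}"


# For major >= 9 the explicit "a" makes no difference.
for cap in ("9.0", "10.0", "12.0"):
    assert normalize_cuda_arch(cap + "a") == normalize_cuda_arch(cap)

# For sm80 it does, which is the one case the thread left untested.
print(normalize_cuda_arch("8.0"), normalize_cuda_arch("8.0" + "a"))
```

Under these assumptions the only behavioral change from dropping `+ "a"` is on architectures below sm90, which the maintainers say they do not plan to support.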

Hi, we don't plan to support sm80.


Dogacel requested a review from a team as a code owner May 17, 2026 05:42

fix: detect GPU arch automatically for kernel building

b738115

Signed-off-by: Doğaç Eldenk <dogacel@gmail.com>

Dogacel force-pushed the default-kernel-arch branch from cc66eee to b738115 Compare May 17, 2026 06:15

borontion reviewed May 17, 2026


Dogacel requested a review from borontion May 18, 2026 02:00

remove "a" from sm

9044eb1

Signed-off-by: Doğaç Eldenk <dogacel@gmail.com>

Dogacel force-pushed the default-kernel-arch branch from aa04d70 to 9044eb1 Compare May 18, 2026 14:33

Dogacel wants to merge 2 commits into lightseekorg:main from Dogacel:default-kernel-arch
