24 changes: 24 additions & 0 deletions tokenspeed-kernel/python/setup.py
@@ -453,6 +453,30 @@ def _detect_cuda_archs(self):
archs.add(self._normalize_cuda_arch(direct))
return archs

        # Try detecting from the NVIDIA driver.
        if not archs:
            try:
                caps = (
                    subprocess.check_output(
                        [
                            "nvidia-smi",
                            "--query-gpu=compute_cap",
                            "--format=csv,noheader",
                        ],
                        text=True,
                        stderr=subprocess.DEVNULL,
                    )
                    .strip()
                    .splitlines()
                )
                for cap in caps:
                    cap = cap.strip()
                    if cap:
                        archs.add(self._normalize_cuda_arch(cap + "a"))
Contributor

@borontion, May 17, 2026

Why is there an "a" suffix here (cap + "a")? Shouldn't that be handled by _normalize_cuda_arch?

Author

@Dogacel, May 18, 2026

I was a bit confused here; it looks like we can remove it.

The goal was to get sm100a, sm90a, sm120a, etc. _normalize_cuda_arch already does this:

        suffix = "a" if has_suffix or major >= 9 else ""
        return f"{major}{minor}{suffix}"

So for major >= 9 it will always have the "a" suffix. However, when I tried to compile tokenspeed on sm80 I never succeeded, so I couldn't test that case.

Removing the explicit suffix should therefore produce identical results for the supported architectures. I'll do that.
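
To illustrate the point above, here is a minimal sketch. Only the last two lines of the function mirror the snippet quoted from the real method; the name `normalize_cuda_arch`, the regex-based parsing, and the accepted input forms (`"9.0"`, `"90a"`, `"sm_90"`) are assumptions made for illustration, not the actual implementation in setup.py.

```python
import re


def normalize_cuda_arch(arch):
    """Hypothetical stand-in for _normalize_cuda_arch (parsing is assumed)."""
    # Assumed input forms: "9.0", "9.0a", "90a", "sm_90", "sm100a".
    match = re.fullmatch(r"(?:sm_?)?(\d+)\.?(\d)(a?)", arch.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized CUDA arch: {arch!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    has_suffix = bool(match.group(3))
    # These two lines follow the snippet quoted in the comment above.
    suffix = "a" if has_suffix or major >= 9 else ""
    return f"{major}{minor}{suffix}"


# With or without the explicit "a", major >= 9 gives the same result.
for cap in ("9.0", "10.0", "12.0"):
    assert normalize_cuda_arch(cap) == normalize_cuda_arch(cap + "a")

# Only pre-sm90 inputs differ: "8.0" -> "80", but "8.0a" -> "80a".
print(normalize_cuda_arch("8.0"), normalize_cuda_arch("8.0a"))
```

Under these assumptions, dropping `+ "a"` changes the output only for architectures below sm90.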

Contributor

Hi, we don't plan to support sm80.

            except (OSError, subprocess.CalledProcessError):
                pass

        # Fallback: Blackwell
        if not archs:
            archs.add("100a")
        return archs
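
For readers who want to try the driver query outside of setup.py, the detection step in this hunk can be sketched as a standalone script. The function names `parse_compute_caps` and `detect_compute_caps` are hypothetical and not part of the PR. The sketch assumes `nvidia-smi` prints one compute capability per GPU (for example `9.0`), one per line. Separating the parsing from the subprocess call means the parsing can be tested on a machine without a GPU.

```python
import subprocess


def parse_compute_caps(output):
    """Collect the unique, non-empty capability strings from nvidia-smi output."""
    caps = set()
    for line in output.splitlines():
        cap = line.strip()
        if cap:
            caps.add(cap)
    return caps


def detect_compute_caps():
    """Query the driver; return an empty set if nvidia-smi is missing or fails."""
    try:
        output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            text=True,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        return set()
    return parse_compute_caps(output)


# Two identical GPUs plus a blank line collapse to a single entry.
assert parse_compute_caps("9.0\n9.0\n\n") == {"9.0"}
```

An empty result from `detect_compute_caps()` corresponds to the case where the hunk falls back to `"100a"`.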