
Multi‑GPU Triton Kernel Device Binding Issue in ptm‑mamba #6

@Devin-jun

Description


Hi @pengzhangzhi,

Thank you for the excellent work on PTM‑Mamba! I’ve been reproducing and using the model in a multi‑GPU setting and ran into an issue that may affect any multi‑GPU workflow.
When running inference on a non‑default GPU (e.g. cuda:1, cuda:2), I encounter the following Triton error:

  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py", line 81, in kernel_call
    self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages,
  File "<string>", line 78, in _layer_norm_fwd_1pass_kernel
ValueError: Pointer argument (at 7) cannot be accessed from Triton (cpu tensor?)

Triton JIT‑compiles its kernels for the current default CUDA device (i.e. torch.cuda.current_device()) at the time of first import or first call. If you later switch to another GPU (e.g. via torch.cuda.set_device(1) or model.to('cuda:1')), Triton still tries to run the previously compiled kernel bound to the original device, so the pointer arguments now refer to memory on a different GPU, which triggers the access error above.
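
As a purely user-side illustration (not part of ptm-mamba itself), the sketch below assumes the Triton kernels are compiled lazily on the first forward pass and simply binds the target GPU as the default CUDA device before anything is loaded, so compilation and launch happen on the same device; the tensor and layer are placeholders for the actual model call.

    import torch

    # Minimal user-side sketch of the workaround (assumes the Triton kernels
    # are compiled lazily on the first forward pass).
    device = torch.device("cuda:1")

    # Make cuda:1 the default CUDA device *before* the model is created, so
    # Triton compiles and launches its kernels on the GPU that actually holds
    # the pointer arguments.
    torch.cuda.set_device(device)

    with torch.cuda.device(device):
        # Placeholder for loading PTM-Mamba and running inference; the key
        # point is that every tensor and module lives on `device`.
        x = torch.randn(2, 8, device=device)
        y = torch.nn.Linear(8, 8).to(device)(x)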

Proposed fix

A short-term workaround is to delay, or rebind, the Triton compilation to the active device by adding a reference_compile flag to the from_pretrained() method. For example, in ptm-mamba/protein_lm/modeling/models/mamba/lm.py around line 418, change the signature to:

    def from_pretrained(cls, pretrained_model_name, device=None, dtype=None,
                        reference_compile=False,  # new flag
                        **kwargs):

Then, when calling from_pretrained(), pass reference_compile=False to force Triton to compile for the currently set device.
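
For concreteness, here is a minimal sketch of how such a flag could be wired up. The class name, constructor, state-dict handling, and the use of torch.compile for the reference path are all assumptions for illustration, not the actual ptm-mamba implementation:

    import torch
    import torch.nn as nn

    class MambaLMHeadModel(nn.Module):  # name assumed; stands in for the real model class
        def __init__(self, d_model: int = 16):
            super().__init__()
            self.backbone = nn.Linear(d_model, d_model)  # placeholder for the real layers

        @classmethod
        def from_pretrained(cls, pretrained_model_name, device=None, dtype=None,
                            reference_compile=False,  # proposed new flag
                            **kwargs):
            # Bind the default CUDA device *before* any Triton kernel is compiled,
            # so that later kernel launches see pointer arguments on the same GPU.
            if device is not None and torch.device(device).type == "cuda":
                torch.cuda.set_device(device)

            model = cls(**kwargs)
            # (loading of the pretrained state_dict from `pretrained_model_name`
            #  is omitted here for brevity)
            model = model.to(device=device, dtype=dtype)

            if reference_compile:
                # Assumed semantics: optionally compile a reference PyTorch path.
                # With the flag left at False, the Triton kernels are compiled
                # lazily on the first forward pass, i.e. for the device bound above.
                model = torch.compile(model)
            return model

A caller would then load directly onto the target GPU, e.g. model = MambaLMHeadModel.from_pretrained("ptm-mamba", device="cuda:1", reference_compile=False).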
Thanks again for this great library—hope this helps improve the multi‑GPU experience!
