Issue: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib
Environment
- OS: Ubuntu 22.04.5 LTS (glibc 2.35)
- CUDA: 12.8 (Driver 595.45.04)
- GPUs: 8x NVIDIA B300 SXM6 AC (275GB each)
- Python: 3.11.15
- Model: DeepSeek-V3.2 (671B parameters)
Problem Description
When I run the generate script in a Docker container, loading the libtilert.so library fails with an undefined-symbol error for a function in PyTorch's CUDA error-checking code (c10::cuda::c10_cuda_check_implementation, the mangled name in the title).
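The mangled name in the title can be decoded to see exactly which PyTorch function libtilert.so expects. Below is a minimal sketch, assuming Linux with GNU libstdc++, which exports the Itanium C++ ABI demangler `__cxa_demangle`. `demangle` and `_load_libstdcxx` are hypothetical helper names. Running `c++filt` from binutils on the same string is an equivalent shell route.

```python
import ctypes
import ctypes.util
from typing import Optional


def _load_libstdcxx() -> Optional[ctypes.CDLL]:
    """Hypothetical helper: load GNU libstdc++, which exports __cxa_demangle."""
    for name in (ctypes.util.find_library("stdc++"), "libstdc++.so.6"):
        if not name:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None  # assumption: non-GNU toolchains are simply not supported here


def demangle(mangled: str) -> Optional[str]:
    """Hypothetical helper: demangle an Itanium-ABI C++ symbol, or return None."""
    lib = _load_libstdcxx()
    if lib is None:
        return None
    cxa_demangle = lib["__cxa_demangle"]
    cxa_demangle.argtypes = [
        ctypes.c_char_p,                   # mangled name
        ctypes.c_char_p,                   # output buffer (None: let it malloc)
        ctypes.POINTER(ctypes.c_size_t),   # buffer length (unused here)
        ctypes.POINTER(ctypes.c_int),      # status, 0 on success
    ]
    # Keep the raw pointer so the malloc'd result can be freed afterwards.
    cxa_demangle.restype = ctypes.c_void_p
    status = ctypes.c_int(0)
    ptr = cxa_demangle(mangled.encode(), None, None, ctypes.byref(status))
    if status.value != 0 or not ptr:
        return None
    try:
        return ctypes.string_at(ptr).decode()
    finally:
        libc = ctypes.CDLL(None)  # the process's global symbols include free()
        libc.free.argtypes = [ctypes.c_void_p]
        libc.free(ptr)


print(demangle("_ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib"))
```

The printed signature is the exact declaration libtilert.so was linked against. If the installed PyTorch declares the function with different parameter types, the mangled name changes and the loader cannot resolve it.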
Steps to Reproduce
pip install tilert==0.1.3
python -c "import tilert" # Fails with symbol errors
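When repeating this for each PyTorch build, it helps to capture the installed versions and the exact import error in one step. Below is a minimal sketch using only the standard library. `import_report` and `_dist_version` are hypothetical helper names, and it assumes the pip distribution names are `torch` and `tilert`.

```python
import importlib
import importlib.metadata
import platform
from typing import Dict, Optional


def _dist_version(name: str) -> Optional[str]:
    """Installed version of a pip distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def import_report() -> Dict[str, Optional[str]]:
    """Hypothetical helper: versions relevant to the tilert import plus its error."""
    report: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": _dist_version("torch"),  # wheels from the PyTorch index carry a +cuXXX suffix
        "tilert": _dist_version("tilert"),
        "error": None,
    }
    try:
        importlib.import_module("tilert")
    except (ImportError, OSError) as exc:
        # Undefined-symbol failures surface as ImportError or OSError;
        # ModuleNotFoundError (package missing) is a subclass of ImportError.
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


for key, value in import_report().items():
    print(f"{key}: {value}")
```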
Tested PyTorch Versions
| PyTorch Version | CUDA | Error |
|---|---|---|
| 2.9.1 | cu121 | undefined symbol: ncclCommWindowDeregister |
| 2.10.0 | cu128 | undefined symbol: c10_cuda_check_implementation |
| 2.11.0 | cu130 | undefined symbol: ncclCommWindowDeregister |
| 2.6.0 | cu124 | undefined symbol: _ZN3c108ListType3get... |
| 2.5.1 | cu121 | undefined symbol: _ZN3c108ListType3get... |
| 2.4.1 | cu121 | undefined symbol: _ZN3c108ListType3get... |
| 2.3.0 | cu121 | undefined symbol: _ZN3c105Error4whatEv |
Complete Error Logs
PyTorch 2.6.0 + tilert 0.1.3
OSError: /usr/local/lib/python3.11/dist-packages/tilert/libtilert.so:
undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE
PyTorch 2.10.0 + tilert 0.1.3
OSError: /usr/local/lib/python3.11/dist-packages/tilert/libtilert.so:
undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib
PyTorch 2.9.1 + tilert 0.1.3
ImportError: /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_cuda.so:
undefined symbol: ncclCommWindowDeregister
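An "undefined symbol" error means that a library being loaded needs a symbol that none of the already-loaded libraries export. Whether a given PyTorch build provides the symbol libtilert.so needs can be checked directly. Below is a minimal sketch using `ctypes`, which resolves names through `dlsym`. `exports_symbol` is a hypothetical helper name.

```python
import ctypes
import ctypes.util
from typing import Optional


def exports_symbol(lib_path: Optional[str], symbol: str) -> bool:
    """Hypothetical helper: True if dlsym on the library at lib_path (searching
    it and the dependencies it pulled in) resolves `symbol`."""
    lib = ctypes.CDLL(lib_path)  # raises OSError if the library cannot be loaded
    try:
        lib[symbol]  # C++ symbols are looked up by their mangled names
    except AttributeError:
        return False
    return True
```

To apply it here, you could point `lib_path` at `libc10_cuda.so` in the installed torch's `lib` directory (under `/usr/local/lib/python3.11/dist-packages/torch/lib/` in the logs above) and pass the mangled name from the error. This assumes `c10_cuda_check_implementation` lives in that library. A `False` result suggests the installed PyTorch build does not match the one tilert 0.1.3 was compiled against.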
Attempted Solutions
1. Manual Installation in Ubuntu 22.04
- Created custom container with Ubuntu 22.04 (glibc 2.35)
- Tested 7+ PyTorch versions, all failed with different symbol errors
- tilert 0.1.3 imports successfully only with specific PyTorch versions, but model conversion fails