Building on CUDA 12.6 likely has issues on driver versions older than 565 #417

@ehfd

Description

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

W external/local_xla/xla/service/gpu/nvptx_compiler.cc:930] The NVIDIA driver's CUDA version is 12.4 which is older than the PTX compiler version 12.5.82. Because the driver is older than the PTX compiler version, XLA is disabling parallel compilation, which may slow down compilation. You should update your NVIDIA driver or use the NVIDIA-provided CUDA forward compatibility packages.
I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:557] Omitted potentially buggy algorithm eng14{} for conv (f32[32,128,64,3]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,5,64,3]{3,2,1,0}, f32[128,5,3,3]{3,2,1,0}, f32[128]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"cudnn_conv_backend_config":{"activation_mode":"kNone","conv_result_scale":1,"leakyrelu_alpha":0,"side_input_scale":0},"force_earliest_schedule":false,"operation_queue_id":"0","wait_on_operation_queues":[]}

I observe the above warning with the NVIDIA 550.144 driver and pip tensorflow[and-cuda]==2.18.0, which was built with and depends on CUDA 12.5. I do not observe it with JAX.
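For anyone triaging their own logs, here is a minimal sketch that pulls the two versions out of the XLA warning above and compares them. The regex is written against the wording of the warning as it appears in this report and may not match other XLA releases. The names WARNING_RE, parse_version, find_mismatch and SAMPLE are hypothetical helpers, not part of TensorFlow or XLA.

```python
import re
from typing import Optional, Tuple

# Pattern based on the warning text quoted in this issue (an assumption:
# other XLA versions may word the message differently).
WARNING_RE = re.compile(
    r"driver's CUDA version is (\d+(?:\.\d+)*) which is older than "
    r"the PTX compiler version (\d+(?:\.\d+)*)"
)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '12.5.82' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def find_mismatch(log_line: str) -> Optional[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return (driver_cuda, ptx_compiler) if the line carries the warning, else None."""
    match = WARNING_RE.search(log_line)
    if match is None:
        return None
    return parse_version(match.group(1)), parse_version(match.group(2))


# Abbreviated copy of the warning from this report.
SAMPLE = (
    "W external/local_xla/xla/service/gpu/nvptx_compiler.cc:930] "
    "The NVIDIA driver's CUDA version is 12.4 which is older than "
    "the PTX compiler version 12.5.82. Because the driver is older than "
    "the PTX compiler version, XLA is disabling parallel compilation"
)

result = find_mismatch(SAMPLE)
if result is not None:
    driver_cuda, ptx_compiler = result
    # Tuple comparison orders versions component by component.
    print("driver older than PTX compiler:", driver_cuda < ptx_compiler)
```

Running this over a full training log line by line should show whether the same condition is hit on other driver and build combinations.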

This is an instance of the unsatisfied condition described in the PTX section of https://pypackaging-native.github.io/key-issues/gpus/#additional-notes-on-cuda-compatibility.

(Can't test TensorFlow 2.17.0 on conda-forge because it is built with CUDA 12.0 right now.)

Therefore, a build based on CUDA 12.6 may hit the same problem on older drivers. Models still train, but XLA is limited: per the warning above, it disables parallel compilation, which may slow down compilation.

If this issue somehow does not occur in conda-forge, please demonstrate that on an older driver version.
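When reporting a result from an older driver, it helps to state the CUDA version the driver itself supports. The sketch below queries it through cuDriverGetVersion from the CUDA driver library, which reports the version as an integer encoded as 1000 * major + 10 * minor (for example 12040 for 12.4). Assumptions: a Linux system where the driver library is loadable as libcuda.so.1, and that the call works without prior initialization. The function names decode_cuda_version and driver_cuda_version are hypothetical.

```python
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """Decode the CUDA integer encoding 1000 * major + 10 * minor."""
    return raw // 1000, (raw % 1000) // 10


def driver_cuda_version(lib_name: str = "libcuda.so.1") -> Optional[Tuple[int, int]]:
    """Return the driver's CUDA version as (major, minor), or None if unavailable.

    lib_name is an assumption for Linux; other platforms name the library differently.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        # No NVIDIA driver library could be loaded.
        return None
    raw = ctypes.c_int(0)
    # cuDriverGetVersion returns 0 (CUDA_SUCCESS) on success.
    if lib.cuDriverGetVersion(ctypes.byref(raw)) != 0:
        return None
    return decode_cuda_version(raw.value)


if __name__ == "__main__":
    print("driver CUDA version:", driver_cuda_version())
```

On the setup described in this report, the warning implies the driver reports 12.4. A value below the CUDA version that the TensorFlow build was compiled with is the condition under discussion.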

JAX and PyTorch are unlikely to be affected, so this is a different issue from conda-forge/pytorch-cpu-feedstock#337 (although the extensive discussion there has helped clarify the CUDA dependency landscape).

Installed packages

TensorFlow 2.18.0 is unreleased.

Environment info

TensorFlow 2.18.0 is unreleased.
