CUDA: support F32 kernel type for CONV_TRANSPOSE_2D
#17094
base: master
Conversation
…types and improve parameter handling

- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.
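A minimal sketch of what that refactor could look like; the field names, layout, and kernel signature here are illustrative assumptions, not the PR's exact code:

```cpp
#include <cuda_fp16.h>

// Illustrative sketch only: bundle the many scalar launch arguments into one struct
// and template the kernel on the kernel tensor's element type.
struct conv2d_transpose_params {
    int input_w,  input_h;
    int output_w, output_h;
    int kernel_w, kernel_h;
    int stride;
    int channels_in, channels_out;
    int batches;
    int total;  // total number of output elements, used as the bound for the thread index
};

template <typename T>  // T is the kernel tensor's element type: half or float
static __global__ void conv2d_transpose_kernel(const float * input, const T * kernel,
                                               float * output, const conv2d_transpose_params p) {
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= p.total) {
        return;
    }
    // ...index math and f32 accumulation as before, reading the kernel value as T
    // and widening it to float before the multiply-add...
}
```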
…ernel types

- Updated the `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
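A hedged sketch of the loop described above; the shapes and the constructor's argument order are assumptions about the test harness, not the exact test-backend-ops code:

```cpp
// Illustrative only: replace hardcoded per-type test instances with a loop over kernel types.
for (ggml_type kernel_type : {GGML_TYPE_F16, GGML_TYPE_F32}) {
    test_cases.emplace_back(new test_conv_transpose_2d({3, 2, 3, 1}, {2, 2, 1, 3}, 1, kernel_type));
    test_cases.emplace_back(new test_conv_transpose_2d({10, 10, 9, 1}, {3, 3, 1, 9}, 2, kernel_type));
}
```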
Does this PR make a difference to something? From what I understand, the kernel value is upcast into float before doing any accumulation (and accumulation is in f32 anyway). So unless there are kernels around which don't fit into f16, I don't see a benefit to supporting this, especially when we don't support f16 inputs yet (which, incidentally, might be more relevant than kernels being f32, as we could potentially do half2 multiplications).
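For reference, the upcast-then-accumulate pattern being described could be sketched like this (hypothetical helper names, not the file's actual code):

```cpp
#include <cuda_fp16.h>

// Sketch of the pattern discussed above: the kernel value is widened to f32
// before the multiply-add, so the accumulator stays in f32 regardless of how
// the kernel tensor is stored.
static __device__ __forceinline__ float to_f32(float x) { return x; }
static __device__ __forceinline__ float to_f32(half  x) { return __half2float(x); }

template <typename kernel_t>
static __device__ __forceinline__ float accumulate_tap(float acc, float input_val, kernel_t kernel_val) {
    return fmaf(input_val, to_f32(kernel_val), acc);
}
```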
    input_data, kernel_data, output_data, input_w, input_h, output_w, output_h, kernel_w, kernel_h, stride,
    channels_in, channels_out, batches);
if (kernel->type == GGML_TYPE_F16) {
    conv2d_transpose_cuda_f16(input_data, (const half *) kernel_data, output_data, params, st);
You don't need separate cuda_f16 and cuda_f32 functions here; you can dispatch straight away to conv2d_transpose_cuda<type> and remove those two functions.
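A sketch of that suggestion; the argument list mirrors the snippet above and is otherwise an assumption:

```cpp
// Illustrative: dispatch on the kernel tensor's type directly to the templated launcher,
// removing the separate conv2d_transpose_cuda_f16 / conv2d_transpose_cuda_f32 wrappers.
if (kernel->type == GGML_TYPE_F16) {
    conv2d_transpose_cuda<half>(input_data, (const half *) kernel_data, output_data, params, st);
} else {
    conv2d_transpose_cuda<float>(input_data, (const float *) kernel_data, output_data, params, st);
}
```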
    const int total;
};

template <typename T>
Probably a better name for this would be kernel_t.
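That is, roughly (sketch of the rename, signature assumed as above):

```cpp
template <typename kernel_t>  // element type of the kernel tensor: half or float
static __global__ void conv2d_transpose_kernel(const float * input, const kernel_t * kernel,
                                               float * output, const conv2d_transpose_params p);
```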
Also updated the test case in test-backend-ops. But since the F32 kernel type is not supported on CPU, only GGML_TYPE_F16 is kept and GGML_TYPE_F32 can be uncommented back in the future.
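Under that constraint, the type loop sketched earlier would presumably reduce to something like this (hypothetical):

```cpp
// Only F16 for now; GGML_TYPE_F32 can be uncommented once the CPU backend
// provides an F32 reference path for CONV_TRANSPOSE_2D.
for (ggml_type kernel_type : {GGML_TYPE_F16 /*, GGML_TYPE_F32 */}) {
    test_cases.emplace_back(new test_conv_transpose_2d({3, 2, 3, 1}, {2, 2, 1, 3}, 1, kernel_type));
}
```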