CUDA: support F32 kernel type for CONV_TRANSPOSE_2D
#17094
base: master
Conversation
…types and improve parameter handling

- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.
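A minimal sketch of what that refactor could look like; the field names, layout, and kernel signature here are illustrative assumptions, not the PR's exact code:

```cpp
#include <cuda_fp16.h>

// Illustrative sketch only: bundle the many scalar launch arguments into one struct
// and template the kernel on the kernel tensor's element type.
struct conv2d_transpose_params {
    int input_w,  input_h;
    int output_w, output_h;
    int kernel_w, kernel_h;
    int stride;
    int channels_in, channels_out;
    int batches;
    int total;  // total number of output elements, used as the bound for the thread index
};

template <typename T>  // T is the kernel tensor's element type: half or float
static __global__ void conv2d_transpose_kernel(const float * input, const T * kernel,
                                               float * output, const conv2d_transpose_params p) {
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= p.total) {
        return;
    }
    // ...index math and f32 accumulation as before, reading the kernel value as T
    // and widening it to float before the multiply-add...
}
```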
…ernel types

- Updated the `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
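A hedged sketch of the loop described above; the shapes and the constructor's argument order are assumptions about the test harness, not the exact test-backend-ops code:

```cpp
// Illustrative only: replace hardcoded per-type test instances with a loop over kernel types.
for (ggml_type kernel_type : {GGML_TYPE_F16, GGML_TYPE_F32}) {
    test_cases.emplace_back(new test_conv_transpose_2d({3, 2, 3, 1}, {2, 2, 1, 3}, 1, kernel_type));
    test_cases.emplace_back(new test_conv_transpose_2d({10, 10, 9, 1}, {3, 3, 1, 9}, 2, kernel_type));
}
```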
Does this PR make a difference to something? From what I understand, the kernel value is upcast into float before doing any accumulation (and accumulation is in f32 anyway). So unless there are kernels around which don't fit into f16, I don't see a benefit to supporting this, especially when we don't support f16 inputs yet (which, incidentally, might be more relevant than kernels being f32, as we could potentially do half2 multiplications).
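For reference, the upcast-then-accumulate pattern being described could be sketched like this (hypothetical helper names, not the file's actual code):

```cpp
#include <cuda_fp16.h>

// Sketch of the pattern discussed above: the kernel value is widened to f32
// before the multiply-add, so the accumulator stays in f32 regardless of how
// the kernel tensor is stored.
static __device__ __forceinline__ float to_f32(float x) { return x; }
static __device__ __forceinline__ float to_f32(half  x) { return __half2float(x); }

template <typename kernel_t>
static __device__ __forceinline__ float accumulate_tap(float acc, float input_val, kernel_t kernel_val) {
    return fmaf(input_val, to_f32(kernel_val), acc);
}
```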
    input_data, kernel_data, output_data, input_w, input_h, output_w, output_h, kernel_w, kernel_h, stride,
    channels_in, channels_out, batches);
if (kernel->type == GGML_TYPE_F16) {
    conv2d_transpose_cuda_f16(input_data, (const half *) kernel_data, output_data, params, st);
You don't need separate cuda_f16 and cuda_f32 functions here; you can dispatch straight away to conv2d_transpose_cuda<type> and remove those two functions.
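A sketch of that suggestion; the argument list mirrors the snippet above and is otherwise an assumption:

```cpp
// Illustrative: dispatch on the kernel tensor's type directly to the templated launcher,
// removing the separate conv2d_transpose_cuda_f16 / conv2d_transpose_cuda_f32 wrappers.
if (kernel->type == GGML_TYPE_F16) {
    conv2d_transpose_cuda<half>(input_data, (const half *) kernel_data, output_data, params, st);
} else {
    conv2d_transpose_cuda<float>(input_data, (const float *) kernel_data, output_data, params, st);
}
```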
    const int total;
};

template <typename T>
Probably a better name for this would be kernel_t.
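That is, roughly (sketch of the rename, signature assumed as above):

```cpp
template <typename kernel_t>  // element type of the kernel tensor: half or float
static __global__ void conv2d_transpose_kernel(const float * input, const kernel_t * kernel,
                                               float * output, const conv2d_transpose_params p);
```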
Also updated the test case in test-backend-ops. But since the F32 kernel type is not supported on CPU, only GGML_TYPE_F16 is kept and GGML_TYPE_F32 can be uncommented back in the future.
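Under that constraint, the type loop sketched earlier would presumably reduce to something like this (hypothetical):

```cpp
// Only F16 for now; GGML_TYPE_F32 can be uncommented once the CPU backend
// provides an F32 reference path for CONV_TRANSPOSE_2D.
for (ggml_type kernel_type : {GGML_TYPE_F16 /*, GGML_TYPE_F32 */}) {
    test_cases.emplace_back(new test_conv_transpose_2d({3, 2, 3, 1}, {2, 2, 1, 3}, 1, kernel_type));
}
```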