Enable 16-bit activations and 8-bit weights in Cadence Quantizer for Conv #15928
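The diff shown on this page routes the Cadence HiFi `quantized_linear_out` kernel to a generic reference implementation when it sees W8A16 inputs, i.e. int16 activations (`Short`) with int8 weights (`Char`). For orientation, here is a minimal reference sketch of the arithmetic such a kernel computes, built from the zero-point and requantization parameters visible in the signature below. The function name, the single floating-point `out_scale` (standing in for the fixed-point `out_multiplier`/`out_shift` pair), and the rounding scheme are illustrative assumptions, not the Cadence implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical reference for a W8A16 quantized linear: int16 activations,
// int8 weights, int32 bias, int16 output. Not the Cadence kernel itself.
std::vector<int16_t> w8a16_linear_ref(
    const std::vector<int16_t>& in, // activations, shape [K]
    const std::vector<int8_t>& weight, // weights, shape [N, K], row-major
    const std::vector<int32_t>& bias, // shape [N]
    int32_t in_zero_point,
    int32_t weight_zero_point,
    double out_scale, // stand-in for the out_multiplier/out_shift pair
    int32_t out_zero_point) {
  const size_t K = in.size();
  const size_t N = bias.size();
  std::vector<int16_t> out(N);
  for (size_t n = 0; n < N; ++n) {
    // Accumulate in int64: int16 * int8 products summed over a long K
    // can overflow an int32 accumulator.
    int64_t acc = bias[n];
    for (size_t k = 0; k < K; ++k) {
      acc += static_cast<int64_t>(in[k] - in_zero_point) *
          (static_cast<int32_t>(weight[n * K + k]) - weight_zero_point);
    }
    // Requantize the accumulator into the int16 output domain.
    const int64_t q = std::llround(acc * out_scale) + out_zero_point;
    out[n] = static_cast<int16_t>(std::clamp<int64_t>(
        q,
        std::numeric_limits<int16_t>::min(),
        std::numeric_limits<int16_t>::max()));
  }
  return out;
}
```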
```diff
@@ -9,6 +9,7 @@
 #include <executorch/backends/cadence/hifi/kernels/kernels.h>
 #include <executorch/backends/cadence/hifi/operators/operators.h>
 #include <executorch/runtime/kernel/kernel_includes.h>
+#include <on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h>
 #include <xa_nnlib_kernels_api.h>
 #include <xtensa/tie/xt_datacache.h>
 #include <algorithm>
```
```diff
@@ -207,7 +208,7 @@ void inline _quantized_linear_per_tensor_asym8s(
 }

 void quantized_linear_out(
-    __ET_UNUSED KernelRuntimeContext& ctx,
+    KernelRuntimeContext& ctx,
     const Tensor& in,
     const Tensor& weight,
     const Tensor& bias,
```
```diff
@@ -216,9 +217,26 @@ void quantized_linear_out(
     const Tensor& out_multiplier,
     const Tensor& out_shift,
     int64_t out_zero_point,
-    __ET_UNUSED const optional<Tensor>& offset,
+    const optional<Tensor>& offset,
     Tensor& out) {
-  if (out.scalar_type() == executorch::aten::ScalarType::Byte) {
+  if (out.scalar_type() == ::executorch::aten::ScalarType::Short &&
+      in.scalar_type() == ::executorch::aten::ScalarType::Short &&
+      weight.scalar_type() == ::executorch::aten::ScalarType::Char) {
+    ::impl::generic::native::quantized_linear_out(
+        ctx,
+        in,
+        weight,
+        bias,
+        in_zero_point,
+        weight_zero_point,
+        out_multiplier,
+        out_shift,
+        out_zero_point,
+        offset,
+        out);
+  }
+
+  if (out.scalar_type() == ::executorch::aten::ScalarType::Byte) {
```
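Taken together, the three review comments that follow imply this shape for the new branch. This is a sketch of the suggested end state, not the merged code: the leading comment, the live `ctx`/`offset` parameters without `__ET_UNUSED`, and the trailing `return;` all come from the suggestions below.

```cpp
// Handle W8A16 heterogeneous type (int16_t activations, int8_t weights).
if (out.scalar_type() == ::executorch::aten::ScalarType::Short &&
    in.scalar_type() == ::executorch::aten::ScalarType::Short &&
    weight.scalar_type() == ::executorch::aten::ScalarType::Char) {
  ::impl::generic::native::quantized_linear_out(
      ctx,
      in,
      weight,
      bias,
      in_zero_point,
      weight_zero_point,
      out_multiplier,
      out_shift,
      out_zero_point,
      offset,
      out);
  return; // prevents falling through to the int8/uint8 dispatch below
}
```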
**Copilot (AI), Nov 20, 2025:**

Add a comment before this if-block to explain the purpose, similar to the conv2d implementations. For example: `// Handle W8A16 heterogeneous type (int16_t activations, int8_t weights)`. This improves code consistency and readability.

Suggested change:

```diff
   Tensor& out) {
+  // Handle W8A16 heterogeneous type (int16_t activations, int8_t weights)
```
**Copilot (AI), Nov 20, 2025:**

The `__ET_UNUSED` annotations on the `ctx` parameter (line 270) and the `offset` parameter (line 279) are now incorrect, since both are used in the int16 case (lines 285 and 294). Remove the `__ET_UNUSED` annotations from these parameters.
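For context, `__ET_UNUSED` is ExecuTorch's marker for deliberately unused parameters; it is assumed here to expand to a compiler attribute along the lines of `__attribute__((unused))`, whose only job is to silence unused-parameter warnings. Keeping it on a parameter the function now reads is therefore misleading. A minimal, self-contained illustration under that assumption:

```cpp
// Stand-in definition for illustration only; the real macro lives in
// ExecuTorch's platform/compiler headers.
#ifndef __ET_UNUSED
#define __ET_UNUSED __attribute__((unused))
#endif

// Accurate use: 'ctx' really is ignored, so the annotation correctly
// suppresses -Wunused-parameter.
int stub_kernel(__ET_UNUSED int ctx, int x) {
  return 2 * x;
}

// After this PR, ctx and offset are forwarded to the generic kernel;
// an __ET_UNUSED annotation here would mislabel live parameters.
int live_kernel(int ctx, int x) {
  return ctx + x;
}
```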
**Copilot (AI), Nov 20, 2025:**

Missing `return` statement after handling the int16 case. Without a `return`, execution will fall through to the subsequent `else if` checks, potentially executing the wrong code path or triggering an incorrect error message. Add `return;` after line 295.

Suggested change:

```diff
   }
+  return;
```
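The fall-through hazard is easy to reproduce in isolation. The sketch below uses hypothetical names but the same control-flow shape as the kernel's dtype dispatch: delete the `return;` and an already-handled `Short` input falls through the `Byte`/`Char` checks into the error branch.

```cpp
#include <cstdio>

enum class ScalarType { Short, Byte, Char };

void dispatch(ScalarType t) {
  if (t == ScalarType::Short) {
    std::puts("int16 path");
    return; // without this, execution continues into the checks below
  }
  if (t == ScalarType::Byte) {
    std::puts("uint8 path");
  } else if (t == ScalarType::Char) {
    std::puts("int8 path");
  } else {
    // With the return removed, a Short input lands here and reports a
    // misleading "unhandled dtype" error.
    std::puts("error: unhandled dtype");
  }
}

int main() {
  dispatch(ScalarType::Short); // prints "int16 path" exactly once
  return 0;
}
```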