Commit 8cc5dcf
Cortex-M backend: address review feedback on quantized_activation
Addresses Adrian's three review comments on #19792 and adds SIMD
acceleration of the LUT lookup (his comment asked for vector intrinsics
and loop unrolling):
* Drop the target -> string indirection in the activation lowering.
`passes_utils._ACTIVATION_FNS` now keys directly on the edge op target
(`exir_ops.edge.aten.{sigmoid,tanh,silu}.default`), and
`ConvertToCortexMPass._get_activation_replacement` passes `node.target`
straight into `build_activation_lut` -- no `_ACTIVATION_KINDS` dict and no
string round-trip.
* Replace the scalar LUT-lookup loop with three compile-gated paths:
  - M55/M85 (MVE): 16 lanes per iteration -- `vldrbq_u8` load, `vaddq_n_u8`
    to bias by 128, `vldrbq_gather_offset_s8` to gather the LUT result,
    `vstrbq_s8` to store.
  - M4/M7 (DSP, no MVE): 4 bytes per iteration -- fold four byte-loads into
    one word-load, batch the +128 bias with `__uadd8`, do four scalar LUT
    lookups (no M-class gather instruction exists without MVE), fold four
    byte-stores into one word-store. Uses `<arm_acle.h>` and local memcpy
    helpers rather than pulling in the heavyweight
    `arm_nnsupportfunctions.h`.
  - All other cores (M0+/M3): a 4x-unrolled scalar loop, which also handles
    the sub-vector remainder of the two SIMD paths.
* Switch the source header to Meta's standard copyright block to match
the other cortex_m op files.
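The three paths above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from this commit: the function name `lut_lookup_s8`, the 256-entry int8 LUT layout, and the exact feature-test macros used for gating (`__ARM_FEATURE_MVE`, `__ARM_FEATURE_SIMD32`) are assumptions. On a host build neither macro is defined, so only the scalar path compiles.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
#include <arm_mve.h>   // integer MVE (M55/M85)
#elif defined(__ARM_FEATURE_SIMD32)
#include <arm_acle.h>  // packed-byte DSP intrinsics such as __uadd8 (M4/M7)
#endif

// Hypothetical kernel: out[i] = lut[in[i] + 128], with a 256-entry LUT.
void lut_lookup_s8(const int8_t* in, const int8_t* lut, int8_t* out, size_t n) {
  size_t i = 0;
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE & 1)
  // MVE: 16 lanes per iteration.
  for (; i + 16 <= n; i += 16) {
    uint8x16_t idx = vldrbq_u8(reinterpret_cast<const uint8_t*>(in + i));
    idx = vaddq_n_u8(idx, 128);                       // bias int8 -> [0, 255]
    const int8x16_t r = vldrbq_gather_offset_s8(lut, idx);
    vstrbq_s8(out + i, r);
  }
#elif defined(__ARM_FEATURE_SIMD32)
  // DSP: one word-load, one packed add, four scalar lookups, one word-store.
  for (; i + 4 <= n; i += 4) {
    uint32_t w;
    std::memcpy(&w, in + i, sizeof(w));
    w = __uadd8(w, 0x80808080u);                      // +128 on each byte lane
    const uint32_t r =
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[w & 0xFFu]))) |
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[(w >> 8) & 0xFFu])) << 8) |
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[(w >> 16) & 0xFFu])) << 16) |
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[(w >> 24) & 0xFFu])) << 24);
    std::memcpy(out + i, &r, sizeof(r));
  }
#endif
  // Scalar path, unrolled 4x; also picks up whatever the SIMD loops left over.
  for (; i + 4 <= n; i += 4) {
    out[i + 0] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i + 0]) + 128u)];
    out[i + 1] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i + 1]) + 128u)];
    out[i + 2] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i + 2]) + 128u)];
    out[i + 3] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i + 3]) + 128u)];
  }
  for (; i < n; ++i) {
    out[i] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i]) + 128u)];
  }
}
```

The byte-wise add wraps modulo 256, so an int8 input of -128 maps to LUT index 0 and 127 maps to index 255 on every path.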
The three paths were cross-compiled for cortex-m0plus / m4 / m7 / m55;
the M4 build emits `uadd8` and the M55 build emits the MVE gather. Runtime
correctness on M4/M7 hardware/FVP is not yet exercised by CI -- the host
unit tests cover the scalar path only.
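One possible way to exercise the word-packed logic on a host, sketched under assumptions: `uadd8_portable` is a hypothetical bit-trick stand-in for `__uadd8` (it does not model the GE flags), and `lut_lookup_packed4` is an illustrative name, not a function from this commit. This checks only the load/bias/lookup/store arithmetic, not the real intrinsic or the generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Per-byte modular add of two packed 4x8-bit words, with no carry between
// lanes. Hypothetical host substitute for the ACLE __uadd8 intrinsic.
inline uint32_t uadd8_portable(uint32_t a, uint32_t b) {
  const uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);  // add low 7 bits
  return low7 ^ ((a ^ b) & 0x80808080u);                        // fix up bit 7
}

// Same shape as the DSP path: word-load, packed +128, four lookups, word-store.
void lut_lookup_packed4(const int8_t* in, const int8_t* lut, int8_t* out, size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    uint32_t w;
    std::memcpy(&w, in + i, sizeof(w));
    w = uadd8_portable(w, 0x80808080u);
    const uint32_t r =
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[w & 0xFFu]))) |
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[(w >> 8) & 0xFFu])) << 8) |
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[(w >> 16) & 0xFFu])) << 16) |
        (static_cast<uint32_t>(static_cast<uint8_t>(lut[(w >> 24) & 0xFFu])) << 24);
    std::memcpy(out + i, &r, sizeof(r));
  }
  for (; i < n; ++i) {  // remainder of fewer than four bytes
    out[i] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i]) + 128u)];
  }
}

// Plain byte-at-a-time reference to compare against.
void lut_lookup_ref(const int8_t* in, const int8_t* lut, int8_t* out, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = lut[static_cast<uint8_t>(static_cast<uint8_t>(in[i]) + 128u)];
  }
}
```

Because each lane is extracted and re-inserted at the same shift, the packed routine gives the same result as the reference regardless of host endianness.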
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 82d9c15 commit 8cc5dcf
3 files changed
Lines changed: 95 additions & 21 deletions
File tree
- backends/cortex_m
  - ops
  - passes