Cortex-M: Thread target CPU/ISA through the AOT pass manager#19470

Open
rascani wants to merge 5 commits into pytorch:main from rascani:cortex-m-compile-config

Conversation

@rascani
Contributor

rascani commented May 11, 2026

Summary

Introduce a CortexMCompileConfig dataclass (cpu + isa) that carries Cortex-M target information from the --target=cortex-m+int8 CLI string into CortexMPassManager. The full standard Cortex-M lineup is registered (M0, M0+, M3, M4, M7, M23, M33, M35P, M52, M55, M85), each with a sensible default ISA; the optional-DSP M33/M35P and optional-MVE M52/M55/M85 cases can be expressed via the isa= kwarg.
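For concreteness, the dataclass described above might look roughly like this. This is a minimal sketch: the `cpu`/`isa` field names and the `cortex-m55` default come from the PR, but the ISA class names and the entries of the default-ISA table are illustrative assumptions, not the PR's exact code.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Assumed spellings; the PR later switches these Literals to Enums.
Cpu = Literal[
    "cortex-m0", "cortex-m0plus", "cortex-m3", "cortex-m4", "cortex-m7",
    "cortex-m23", "cortex-m33", "cortex-m35p", "cortex-m52", "cortex-m55",
    "cortex-m85",
]
Isa = Literal["scalar", "dsp", "mve"]  # assumed ISA class names

# Assumed default-ISA table: the extension each core most commonly ships with.
_CPU_DEFAULT_ISA: dict[str, str] = {
    "cortex-m0": "scalar", "cortex-m0plus": "scalar", "cortex-m3": "scalar",
    "cortex-m4": "dsp", "cortex-m7": "dsp", "cortex-m23": "scalar",
    "cortex-m33": "dsp", "cortex-m35p": "dsp",
    "cortex-m52": "mve", "cortex-m55": "mve", "cortex-m85": "mve",
}

@dataclass
class CortexMCompileConfig:
    cpu: Cpu = "cortex-m55"
    isa: Optional[Isa] = None  # None resolves to the per-CPU default

    def __post_init__(self) -> None:
        if self.isa is None:
            self.isa = _CPU_DEFAULT_ISA[self.cpu]
```

Callers targeting an optional-extension core would pass `isa=` explicitly, e.g. `CortexMCompileConfig(cpu="cortex-m33", isa="scalar")` for a DSP-less M33.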

No pass reads the config yet, so this change is purely plumbing, but it positions both the upcoming AOT scratch-buffer sizing work (#16580) and the CMSIS-NN scalar (#17646) / DSP (#17644) backend support to plug in without re-plumbing the call site. Actually building for the new variants still requires Phase 2's MPS2 platform glue.

CortexMTester gains an optional config kwarg, and the Pico 2 MLP example now constructs CortexMPassManager with cpu='cortex-m33' to match the RP2350 hardware it targets.

Authored with Claude.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

@rascani rascani requested a review from digantdesai as a code owner May 11, 2026 22:12
@pytorch-bot
pytorch-bot Bot commented May 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19470

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 11, 2026
@github-actions github-actions Bot added ciflow/trunk module: arm Issues related to arm backend labels May 11, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

"TOSA-1.0+INT",
"TOSA-1.0+FP",
"TOSA-1.0+INT+int16",
"cortex-m0+int8",
Contributor

sweet

@zingo
Collaborator

zingo commented May 12, 2026

Really nice!

Comment thread backends/cortex_m/compile_config.py Outdated
# Default ISA per CPU follows the most common configuration each core is
# shipped with. M33/M35P optionally lack DSP, and M52/M55/M85 optionally
# lack MVE; callers can pass `isa=` explicitly to override.
_CPU_DEFAULT_ISA: dict[str, str] = {
Collaborator

Use the Cpu, Isa literals you defined for typing?

Contributor Author

Switched to Enum

Comment thread backends/cortex_m/compile_config.py Outdated
"""

cpu: Cpu = "cortex-m55"
isa: Isa | None = None
Collaborator

Use a dataclass default / `default_factory` instead of `| None` and `__post_init__`. Avoids annoying typing issues.
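The suggestion above can be sketched as follows. Field names are hypothetical; note this only covers the static-default case, since a default that genuinely depends on another field still needs `__post_init__` or similar.

```python
from dataclasses import dataclass, field

DEFAULT_ISA = "mve"  # hypothetical stand-in for the PR's Isa type

@dataclass
class TargetConfig:
    cpu: str = "cortex-m55"
    # A plain default keeps the field non-optional, avoiding the
    # `Isa | None` annotation that __post_init__ resolution forces.
    isa: str = DEFAULT_ISA
    # For mutable defaults (lists, dicts), use default_factory so each
    # instance gets its own object rather than a shared one.
    extra_flags: list[str] = field(default_factory=list)
```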

Contributor Author

Done.

"TOSA-1.0+INT+int16",
"cortex-m0+int8",
"cortex-m0plus+int8",
"cortex-m3+int8",
Collaborator

I don't think +int8 is a good model for a particular target. int8 is a result of quantization and cmsis_nn support, not of the target. If we suddenly start supporting, for example, int16 for some Cortex-M core, then all such cortex-m targets will support int16. I know that we do something like this for TOSA with +int16, but I at least don't think int8 should make it into the CortexMCompileConfig. It can be used in aot_arm_compiler for deciding quantization.

Contributor Author

Agreed. I've dropped the +int8 on all cortex-m. It'll really just explode this list.

"""Cortex-M/CMSIS-NN compilation path with no delegation."""
logging.info("Using Cortex-M/CMSIS-NN compilation path (no delegation)")
logging.info(
"Using Cortex-M/CMSIS-NN compilation path for cpu=%s isa=%s",
Collaborator

Nit: why not use f-strings

Contributor Author

Done.

self.passes: list[PassClass] = (  # type: ignore[assignment]
    passes if passes is not None else self.pass_list  # type: ignore[assignment]
)
self.config: CortexMCompileConfig = config or CortexMCompileConfig()
Collaborator

Add docstr to init about the default value of config

Contributor Author

Done.


from executorch.backends.cortex_m.compile_config import CortexMCompileConfig

_HAS_CMSIS_NN = find_spec("cmsis_nn") is not None
Collaborator

pytest has a function called `importorskip` for exactly this.
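A minimal sketch of `pytest.importorskip`, which imports a module or skips the calling test if it is absent. Demonstrated here on the stdlib `json` module, since `cmsis_nn` (the real target in this PR) may not be installed:

```python
import pytest

# Returns the imported module when present; raises a Skipped exception
# inside a test run when the module is missing.
json_mod = pytest.importorskip("json")

def test_uses_optional_dep():
    # Reaching this line means the dependency imported successfully.
    assert json_mod.loads("[1, 2]") == [1, 2]
```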

Collaborator

Also, I think there is a danger in skipping tests when a dependency is missing. It could easily lead to something not being tested when we think it is.

Contributor Author

Done.

Comment thread backends/cortex_m/compile_config.py Outdated
from dataclasses import dataclass
from typing import Literal

Cpu = Literal[
Collaborator

Literals are fine I guess, but why not use Enums for minimal risk of typos? It is easy to go from string to enum in cases where you might need it (aot_arm_compiler)
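A sketch of the Enum approach (member names and the subset of cores shown are illustrative):

```python
from enum import Enum

class Cpu(str, Enum):
    CORTEX_M0 = "cortex-m0"
    CORTEX_M33 = "cortex-m33"
    CORTEX_M55 = "cortex-m55"

# Going from a CLI string (e.g. aot_arm_compiler's --target parsing) to the
# enum is a constructor call; an unknown spelling raises ValueError instead
# of letting a typo flow through as a plain string.
cpu = Cpu("cortex-m33")
assert cpu is Cpu.CORTEX_M33
```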

Contributor Author

Done.

Comment thread backends/cortex_m/compile_config.py Outdated
# Default ISA per CPU follows the most common configuration each core is
# shipped with. M33/M35P optionally lack DSP, and M52/M55/M85 optionally
# lack MVE; callers can pass `isa=` explicitly to override.
_CPU_DEFAULT_ISA: dict[str, str] = {
Collaborator

cmsis_nn has a function 'resolve_backend' https://github.com/ARM-software/CMSIS-NN/blob/main/Source/Bindings/arm_py_backend.cpp that should be used for default Isa.

Contributor Author

Done.

Collaborator

@Erik-Lundell left a comment

Hi! I was actually just working on something like this; there are patches on the way for AOT memory planning: #19505

The main differences that I would like to see are:

  • Enums instead of literals
  • Involve cmsis_nn, for example have a .backend function that can be passed to cmsis_nn's buffer size calculation functions. See my PR.
  • Possibly: An "any" option that could be overly pessimistic but always works.

Aligns CortexMTargetConfig with the design Erik proposes in pytorch#19505
while keeping the wider plumbing in place. The earlier
CortexMCompileConfig is renamed to CortexMTargetConfig (and its file
moved to target_config.py) to disambiguate from EdgeCompileConfig —
this dataclass models a compilation *target*, not a step in the
compile pipeline.

Adopted from Erik's feedback:

* CortexM enum replaces the Cpu/Isa Literals — typo-safe and
  IDE-friendly.
* `.backend` property returns `cmsis_nn.Backend` directly, resolved
  via `cmsis_nn.resolve_backend(cmsis_nn.CortexM.<X>)`. The
  hand-rolled `_CPU_DEFAULT_ISA` dict is gone — cmsis_nn is the
  single source of truth for the CPU → backend mapping.
* CortexMPass abstract base class added; CortexMPassManager.transform()
  uses signature inspection to inject both `exported_program` and
  `target_config` into passes that declare them (mirroring Erik's
  proposal). The pass manager also gains stricter validation — the
  exported_program must be a real ExportedProgram and the pass list
  must contain classes, not instances — failing fast instead of
  producing opaque errors deep in _transform.
* cmsis_nn is now a hard dependency for the cortex_m tests: the
  top-level `import cmsis_nn` in test_target_config.py replaces the
  previous skipif-on-find_spec dance, addressing Erik's concern that
  skipping tests on missing deps can mask regressions.
* `+int8` dropped from cortex-m target strings — quantization is a
  result of the export flow, not a CPU attribute. TARGETS, help text,
  from_target_string, CI script and README aligned.
* Logging in `_to_edge_cortex_m` and the --delegate-ignored warning
  switched to f-strings.
* `__init__` docstring on CortexMPassManager documents the
  exported_program / passes / target_config defaults (including the
  M55+MVE fallback that matches pre-config behaviour).
* `import-not-found` removed from the cmsis_nn type-ignore — only
  `import-untyped` actually fires, and if cmsis_nn ever ships stubs
  the unused ignore will become a tripwire.

Kept the optional `isa` override field for the optional-extension
cases (M55 without MVE, M33 without DSP, etc.) — different from
Erik's enum-only design, but the override remains useful for cores
where ISA extensions are optional. A `_SUPPORTED_BACKENDS` table
encodes the per-CPU architectural capability set so overrides
validate at construction; forcing MVE on an M0 raises ValueError
with the actual supported list. The SCALAR ⊂ DSP ⊂ MVE supersession
reflects that an MVE-capable core also runs DSP and scalar code.
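The validation described above might be sketched like this. All names here are stand-ins, including the `Backend` enum, which in the real code comes from cmsis_nn:

```python
from enum import IntEnum

class Backend(IntEnum):
    # Ordered so that SCALAR < DSP < MVE models the supersession: a core's
    # best backend implies it also runs every lesser one.
    SCALAR = 0
    DSP = 1
    MVE = 2

# Hypothetical per-CPU capability table (best backend each core can run).
_BEST_BACKEND = {
    "cortex-m0": Backend.SCALAR,
    "cortex-m4": Backend.DSP,
    "cortex-m55": Backend.MVE,
}

def validate_override(cpu: str, requested: Backend) -> Backend:
    """Fail at construction time if the override exceeds the core's capability."""
    best = _BEST_BACKEND[cpu]
    if requested > best:
        supported = [b.name for b in Backend if b <= best]
        raise ValueError(f"{cpu} supports {supported}, not {requested.name}")
    return requested
```

So forcing MVE on an M0 raises immediately with the supported list, while requesting DSP on an MVE-capable M55 is accepted.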

Defers Erik's `ANY` proposal. In pytorch#19505 ANY falls back to MVE, but an
honest "any cortex-m" choice would have to do worst-case scratch
buffer planning across the ISA classes (which may not be MVE).
Deferring until the scratch-buffer side lands and we can implement
the worst-case analysis properly.

Authored with Claude.
@rascani
Contributor Author

rascani commented May 13, 2026

Hi! I was actually just working on something like this, there are patches on the way for aot memory planning: #19505

The main differences that I would like to see are:

  • Enums instead of literals
  • Involve cmsis_nn, for example have a .backend function that can be passed to cmsis_nn's buffer size calculation functions. See my PR.
  • Possibly: An "any" option that could be overly pessimistic but always works.

Thanks for the feedback, and so sorry we wound up duplicating effort. I've incorporated most of the feedback, but one of the things I punted on was an "Any" option. I think that's a valid use case where I want to generate a PTE that is compatible with any cortex-m target, but I want to think through how we'll do the scratch buffer allocation for it. One approach would be to continue to use the temp allocator for those. Another would be to memory-plan for the worst-case scratch buffer allocation. We'd probably need to query all of the ISA backends and pick the worst one.
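The worst-case idea reduces to a max over per-backend sizes. Everything in this sketch is hypothetical: `scratch_bytes` and its made-up numbers stand in for cmsis_nn's real per-backend buffer-size calculation functions.

```python
BACKENDS = ("scalar", "dsp", "mve")

def scratch_bytes(backend: str, channels: int) -> int:
    # Illustrative sizes only; real values would come from cmsis_nn.
    per_channel = {"scalar": 0, "dsp": 4, "mve": 16}[backend]
    return per_channel * channels

def worst_case_scratch(channels: int) -> int:
    # Plan for the largest requirement across all ISA backends, so the
    # resulting PTE works on any Cortex-M variant.
    return max(scratch_bytes(b, channels) for b in BACKENDS)
```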

CI's mypy env doesn't pip-install cmsis_nn (it's a native pybind11
module — type-checking-only envs typically skip such deps), so
`import-not-found` fires there even though `import-untyped` is the
one that fires locally. The combo `[import-not-found, import-untyped]`
is honest about both states; dropping either half breaks one or the
other environment.

Authored with Claude.
Comment thread backends/cortex_m/test/tester.py
@Erik-Lundell
Collaborator

Thanks! Looks good from my side now. I agree on the "any" option, we can save that for another time.

The Zephyr `hello-executorch` README has `<!-- RUN -->` directives that
the test-arm-backend-zephyr CI extracts and executes verbatim. The
prior wording used `--target=cortex-m55+int8`, which the AOT compiler
no longer accepts after the +int8 drop. Update both code blocks (and
the surrounding prose) to use the new bare `--target=cortex-m55`
spelling.

Authored with Claude.