
Add CPU simulation support #156

Open
vloncar wants to merge 5 commits into huawei-csl:main from vloncar:cpu_mode

vloncar (Collaborator) commented May 8, 2026

As the title says, this adds support for compiling the torch kernels in CPU simulation mode. It does essentially what the convoluted CMake build would do, but through a direct call to torch's extension-compilation utilities. This is much faster to compile and supports incremental compilation. The loaded module can be used in place of the original pto_kernels even if the NPU wheel is installed.
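For illustration, here is a minimal sketch of that flow. It assumes the direct call is `torch.utils.cpp_extension.load` (the PR text doesn't name the exact function). The extension name `pto_kernels_cpu`, the `CPU_SIM_DEFINE` flag, and the helpers `cpu_sim_build_kwargs`, `build_cpu_kernels` and `install_as` are hypothetical. Incremental builds come from `load` reusing its build directory with the ninja backend, which should recompile only the sources that changed. Registering the result in `sys.modules` under `pto_kernels` is one way to make later imports pick up the CPU build instead of the installed NPU wheel.

```python
import sys
from pathlib import Path

# Hypothetical define used here to select PTO-ISA's CPU backend; replace it
# with whatever macro the real build uses.
CPU_SIM_DEFINE = "-DCPU_SIM"


def cpu_sim_build_kwargs(sources, include_dirs, build_dir=None):
    """Collect keyword arguments for torch.utils.cpp_extension.load()."""
    kwargs = {
        "name": "pto_kernels_cpu",  # hypothetical extension name
        "sources": sorted({str(s) for s in sources}),  # dedupe, stable order
        "extra_include_paths": [str(d) for d in include_dirs],
        "extra_cflags": ["-O2", CPU_SIM_DEFINE],
        "verbose": False,
    }
    if build_dir is not None:
        kwargs["build_directory"] = str(build_dir)
    return kwargs


def build_cpu_kernels(sources, include_dirs, build_dir=None):
    """JIT-compile the kernels for the host CPU.

    With the ninja backend, load() keeps object files in the build directory,
    so repeated calls should only recompile sources that changed.
    """
    # Imported lazily so the helpers above work without torch installed.
    from torch.utils.cpp_extension import load

    if build_dir is not None:
        Path(build_dir).mkdir(parents=True, exist_ok=True)
    return load(**cpu_sim_build_kwargs(sources, include_dirs, build_dir))


def install_as(module, alias="pto_kernels"):
    """Make `import <alias>` return `module`, shadowing any installed NPU wheel.

    Only imports executed after this call are affected.
    """
    sys.modules[alias] = module
    return module
```

A typical call would be `install_as(build_cpu_kernels(sorted(Path("csrc").glob("*.cpp")), ["third_party/pto-isa/include"]))` before the first `import pto_kernels`; both paths are placeholders.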

Most of the kernels were ported, with a few caveats:

  • cpu and a2a3 (and a5) are different implementations of the PTO spec, so they behave differently: some constraints are enforced in one but relaxed in the other. For example, TSTORE with src::dtype != dst::dtype works on a2a3 but not on cpu. The docs imply this constraint should be enforced, yet the a2a3 implementation allows it, and tri_inv_rec_unroll and scan_ul1 rely on that. The TGATHER implementation differs too.
  • Not all CCE calls are covered by the cpu_stub.hpp that PTO-ISA provides, so kernels should avoid them, as well as macros that bisheng defines during compilation (e.g., checks for whether the arch is 220).
  • Cross-core sync probably doesn't work, though it could be implemented; doing so would likely require some upstream changes as well.
  • Mixed kernels are not supported. These could be supported too, but with more code gymnastics upstream.
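Because the second caveat is easy to trip over when porting more kernels, a rough pre-flight scan can flag sources that will likely fail under the CPU stub before the compiler is even invoked. This is a sketch, not part of the PR: `find_non_portable`, `Finding` and `DEFAULT_PATTERNS` are hypothetical, and the patterns are illustrative guesses rather than the actual contents of cpu_stub.hpp or bisheng's predefined macros. The scan is line-based, so it ignores `//` comments but not `/* ... */` blocks.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    line: int     # 1-based line number in the scanned source
    pattern: str  # the pattern that matched
    text: str     # the offending line, stripped


# Illustrative placeholders, NOT a vetted list: fill in the CCE intrinsics that
# cpu_stub.hpp actually lacks and the bisheng-defined macros the kernels use.
DEFAULT_PATTERNS = (
    r"\b__CCE_\w+",          # assumed: compiler macros such as an arch check
    r"\b__DAV_\w+",          # assumed: per-arch macros (e.g. "is this 220?")
    r"\bpipe_barrier\s*\(",  # assumed: a CCE intrinsic with no CPU stub
)


def find_non_portable(source, patterns=DEFAULT_PATTERNS):
    """Return one Finding per (line, pattern) match in a kernel source string."""
    compiled = [(p, re.compile(p)) for p in patterns]
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        # Skip whole-line // comments so commented-out code is not flagged.
        if text.lstrip().startswith("//"):
            continue
        for raw, rx in compiled:
            if rx.search(text):
                findings.append(Finding(lineno, raw, text.strip()))
    return findings
```

Keeping the patterns as a parameter lets the list be grown as new incompatibilities are found, without touching the scanning logic.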

zouzias mentioned this pull request May 12, 2026
zouzias (Collaborator) commented May 12, 2026

@vloncar

Just FYI, I integrated your TGATHER fix into #64

vloncar mentioned this pull request May 12, 2026

Labels: None yet
Projects: None yet
2 participants