Add Python frontend by itemkelvin · Pull Request #327 · coredac/neura

itemkelvin · 2026-06-14T07:06:30Z

Adds a Python frontend pipeline under tools/neura-py-frontend/ that lowers PyTorch-exported ML models into Neura dataflow IR. The pipeline traverses the Torch → Linalg → Affine → Neura dialect chain and emits IR suitable for CGRA acceleration via the Neura interpreter.

…astructure

- Add StripTaskflowTaskPass for taskflow cleanup
- Add python2neura conversion pipeline (--python-to-neura)
- Add dataflow mode support to neura-interpreter (--dataflow flag)
- Add llvm-lit tests for generated model IR (DATAFLOW_IR + INTERPRETER_OUTPUT checks)
- Add test_models.py with CF/DF mode support for end-to-end verification
- Add verify_models.py for interpreting generated dataflow IR
- Add neura-py-frontend tools

Note: DF mode in interpreter has a scheduling bug (extra iterations overwrite correct results with zeros). CF mode works correctly.

…iteration limit

Core fixes to DF interpreter for numerical correctness:

- Add memory dependency edges (RAW/WAW/WAR) in DependencyGraph between load_indexed/store_indexed ops on the same memref
- Fix resolveKernelBlockArg() to traverse neura.data_mov and neura.constant wrappers for correct memref ID aliasing across kernel boundaries
- Increase MAX_DFG_ITERATIONS from 200 to 100000 to handle deep nested loops

Add compare_df_numerics.py for automated DF vs PyTorch comparison:

- 5/8 models pass with 2e-2 float32 threshold
- 3 models (including two_layer_mlp and gelu_layernorm) need further fixes
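The memory dependency edges mentioned above can be illustrated with a small stand-alone sketch. This is not the interpreter's DependencyGraph code (which is C++ and not shown in this thread); `MemOp` and `memory_dependency_edges` are hypothetical names, and the sketch assumes ops are ordered by a simple program-order index.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemOp:
    """Hypothetical stand-in for a load_indexed/store_indexed op."""
    index: int   # position in program order
    kind: str    # "load" or "store"
    memref: str  # identifier of the memref being accessed


def memory_dependency_edges(ops):
    """Return (earlier, later, hazard) edges for ops on the same memref."""
    edges = []
    ordered = sorted(ops, key=lambda op: op.index)
    for first, second in combinations(ordered, 2):
        if first.memref != second.memref:
            continue  # different memrefs never conflict in this sketch
        if first.kind == "store" and second.kind == "load":
            edges.append((first.index, second.index, "RAW"))
        elif first.kind == "store" and second.kind == "store":
            edges.append((first.index, second.index, "WAW"))
        elif first.kind == "load" and second.kind == "store":
            edges.append((first.index, second.index, "WAR"))
        # load followed by load needs no ordering edge
    return edges


ops = [
    MemOp(0, "store", "A"),
    MemOp(1, "load", "A"),
    MemOp(2, "store", "A"),
    MemOp(3, "load", "B"),
]
edges = memory_dependency_edges(ops)
for edge in edges:
    print(edge)
```

A scheduler that respects these edges will not fire the later op of a pair until the earlier one has completed, which is the property the fix relies on.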

…ata races

Root cause: in the original flat DFG, all kernels' counters advanced simultaneously, causing interleaved execution across kernels. A later kernel (e.g. matmul layer 2) could read partially-computed values from an earlier kernel (e.g. matmul layer 1) before it finished all iterations.

Fix: execute kernels sequentially in IR order. Each kernel runs its own DF loop to exhaustion before the next kernel starts. This guarantees kernel N+1 sees the fully-computed output of kernel N.

All 8 models pass with max_abs_err < 5e-8 (float32 precision):

- simple_matmul: max_abs_err=4.66e-09
- residual_block: max_abs_err=3.73e-09
- residual_block_norelu: max_abs_err=4.47e-08
- two_layer_mlp: max_abs_err=6.52e-09 (was ALL ZERO)
- two_layer_mlp_norelu: max_abs_err=3.73e-09 (was 3.07e-02)
- conv2d_relu_pool: max_abs_err=3.73e-09 (was 1.49e-02)
- transformer_attention: max_abs_err=6.82e-13
- gelu_layernorm: max_abs_err=4.55e-13 (was 3.64e-02)
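The race described in this commit message can be reproduced in miniature. The sketch below is an illustration only, not the interpreter's scheduler: kernels are modelled as Python generators that perform one iteration per step, and `producer`, `consumer`, `run_interleaved`, `run_sequential` and `simulate` are hypothetical names.

```python
def producer(buf):
    """Kernel 1: writes one element of buf per iteration."""
    for i in range(len(buf)):
        buf[i] = float(i + 1)
        yield  # one iteration finished


def consumer(buf, out):
    """Kernel 2: every iteration reads all of buf, like a matmul row."""
    for j in range(len(out)):
        out[j] = sum(buf)
        yield


def run_interleaved(kernels):
    """Flat schedule: advance every live kernel by one iteration per round."""
    live = list(kernels)
    while live:
        for kernel in list(live):
            try:
                next(kernel)
            except StopIteration:
                live.remove(kernel)


def run_sequential(kernels):
    """Sequential schedule: run each kernel to exhaustion, in order."""
    for kernel in kernels:
        for _ in kernel:
            pass


def simulate(schedule, n=3):
    """Run producer then consumer under the given schedule; return out."""
    buf, out = [0.0] * n, [0.0] * n
    schedule([producer(buf), consumer(buf, out)])
    return out


interleaved_out = simulate(run_interleaved)
sequential_out = simulate(run_sequential)
print("interleaved:", interleaved_out)
print("sequential: ", sequential_out)
```

Under the interleaved schedule the consumer sums a buffer that is still being filled, so its early outputs are too small. Under the sequential schedule every consumer iteration sees the complete buffer.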

- Add neura.exp op (NeuraOps.td) with lowering (math.exp → neura.exp) and interpreter handler (handleExpOp)
- ReLU models: verified residual_block/two_layer_mlp pass --neura-conversion (ArithCmpFToNeuraFCmp + ArithSelectToNeuraSel patterns already handle this)
- transformer_attention: use real torch.softmax instead of ReLU approximation
- gelu_layernorm: add proper GELU activation (tanh approximation via math ops)
- Dynamic shapes: implement export path using torch.export.export() with graceful fallback to static shapes
- Clean up fcmp/icmp TODO (dead code for non-existent predicate operand)
- test_models.py: restore original ReLU models now that they pass

All 8 models pass DF numerical comparison (max_abs_err < 1e-5)
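For reference, the tanh approximation of GELU mentioned above is commonly written as 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). The sketch below assumes the gelu_layernorm test model uses this standard form, which the thread does not confirm. It compares the approximation against the erf-based definition with a max-abs-error helper. Note that this measures the approximation error of the formula itself, not the Neura vs. PyTorch error quoted in the commit message.

```python
import math


def gelu_tanh(x):
    """GELU using the widely used tanh approximation."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x):
    """GELU defined through the Gaussian CDF (erf form)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_abs_err(actual, expected):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(a - e) for a, e in zip(actual, expected))


xs = [i / 10.0 for i in range(-60, 61)]  # sample points in [-6, 6]
err = max_abs_err([gelu_tanh(x) for x in xs], [gelu_exact(x) for x in xs])
print(f"max_abs_err(tanh form vs erf form) = {err:.3e}")
```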

- Adapt neura_pipeline.py to torch-mlir 20260531 API
- Add ExpandMathToArith pass (fpowi/tanh expansion)
- Update interpreter taskflow op handling
- Update CMakeLists and pass registrations
- Add README and environment.yml for py-frontend
- Remove deprecated test files (verify_models.py, compare_df_numerics.py, old environment.yml)

…README with kernel scheduling & dependency docs

tancheng · 2026-06-14T07:21:18Z

Hi @itemkelvin, we are not providing new primitives in Python/PyTorch, right? We are just lowering PyTorch to Neura in this PR?

itemkelvin · 2026-06-14T09:08:34Z

We are not introducing any new primitives in Python/PyTorch. This PR is purely a lowering pipeline that takes standard torch.nn models and converts them all the way down to Neura IR. Users write plain PyTorch code; the pipeline handles the rest.

Pipeline: PyTorch → torch-mlir → Linalg → Affine → Neura IR + CGRA JSON. The neura_pipeline.py frontend exposes both a CLI and a Python API, supporting stage-by-stage or end-to-end conversion. It shells out to mlir-neura-opt for the MLIR-to-MLIR lowering passes and neura-interpreter for execution.

2 new MLIR passes:

- expand-math-to-arith: expands math.tanh and math.fpowi into arith/math.exp chains before constants are hoisted into kernel block arguments, so exponents don't get lost during lowering.
- strip-taskflow-task: strips the taskflow.task wrapper and hoists inner neura.kernel ops to the function level, which is the first step of the final Neura conversion stage.

3 new Neura ops: neura.fcmp, neura.exp, neura.rsqrt. These were needed because the original Neura dialect only had basic arithmetic (add, mul, icmp) and couldn't express the operations required by GELU (needs exp + fcmp) or LayerNorm (needs rsqrt). Interpreter support for all three is added in neura-interpreter.cpp via handleFCmpOp, handleExpOp, and handleRsqrtOp.

Tests: 8 PyTorch model tests covering matmul, MLP, ResBlock (with and without ReLU), Conv2D+ReLU+MaxPool, Multi-head Attention, and GELU+LayerNorm. All pass with numerical error ~1e-8 (PyTorch golden output vs. Neura interpreter output). lit/FileCheck-based IR tests are also included.

Note: the environment.yml currently pins a different torch-mlir version than what .github/workflows/main.yml uses at runtime. We should align them so that local setup and CI use the same version.
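To make the expand-math-to-arith idea concrete, here is a minimal numeric sketch of how tanh and an integer power can be rewritten using only exp and basic arithmetic. The identities are standard, but whether the pass emits exactly these forms is an assumption; `tanh_via_exp` and `fpowi` are hypothetical Python helpers, not code from the PR.

```python
import math


def tanh_via_exp(x):
    """tanh(x) = 1 - 2 / (exp(2x) + 1), using only exp, add, sub, div.

    math.exp raises OverflowError for very large x (about x > 354);
    this sketch does not guard against that.
    """
    return 1.0 - 2.0 / (math.exp(2.0 * x) + 1.0)


def fpowi(x, n):
    """x raised to the integer power n via repeated squaring (mul/div only).

    Zero raised to a negative power is not handled here.
    """
    if n < 0:
        return 1.0 / fpowi(x, -n)
    result, base = 1.0, x
    while n > 0:
        if n & 1:          # current low bit set: fold base into the result
            result *= base
        base *= base       # square for the next bit
        n >>= 1
    return result


xs = [i / 4.0 for i in range(-20, 21)]  # sample points in [-5, 5]
tanh_err = max(abs(tanh_via_exp(x) - math.tanh(x)) for x in xs)
print(f"max |tanh_via_exp - math.tanh| = {tanh_err:.3e}")
print("fpowi(1.5, 3) =", fpowi(1.5, 3))
```

With both helpers in place, a GELU tanh approximation needs nothing beyond exp, multiply, add, subtract, and divide, which matches the motivation given for adding neura.exp.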


…wMode, add Frame ctor

tancheng · 2026-06-15T04:59:11Z

Hi @itemkelvin, I guess this PR was written with help from an LLM? If so, would it be convenient for you to split this huge PR into a few smaller PRs? It is currently hard for us to review. Could you ask the LLM to give you a plan for how many small PRs you need to complete this?

Developer added 9 commits June 13, 2026 03:50

chore: clean generated artifacts (Output dirs, .dot, tmp-*, __pycache__)

ac31267

chore: remove __pycache__, build/, lit.cfg

7f2d392

docs: doxygen-style comments for verify_kernel_iterations.py; update …

7b4d17a

…README with kernel scheduling & dependency docs

Python frontend implementation

217bd26

tancheng requested review from BenkangPeng, HobbitQia, ShangkunLi, TimJZ, YanzhouTang and guosran June 14, 2026 07:21

python-frontend implement

75567c7

itemkelvin force-pushed the python-frontend branch from 16835c4 to 75567c7 June 14, 2026 15:27

itemkelvin changed the title ~~Python frontend~~ Add Python frontend Jun 14, 2026

refactor(interpreter): split run() into runDataflowMode/runControlFlo…

aa6eb2a

…wMode, add Frame ctor

tancheng mentioned this pull request Jun 15, 2026

Feature/frontend and passes #328

Open

itemkelvin wants to merge 11 commits into main from python-frontend
