Add Python frontend #327
Open
itemkelvin wants to merge 11 commits into
added 9 commits
June 13, 2026 03:50
…astructure
- Add StripTaskflowTaskPass for taskflow cleanup
- Add python2neura conversion pipeline (--python-to-neura)
- Add dataflow mode support to neura-interpreter (--dataflow flag)
- Add llvm-lit tests for generated model IR (DATAFLOW_IR + INTERPRETER_OUTPUT checks)
- Add test_models.py with CF/DF mode support for end-to-end verification
- Add verify_models.py for interpreting generated dataflow IR
- Add neura-py-frontend tools
Note: DF mode in interpreter has a scheduling bug (extra iterations overwrite correct results with zeros). CF mode works correctly.
…iteration limit
Core fixes to DF interpreter for numerical correctness:
- Add memory dependency edges (RAW/WAW/WAR) in DependencyGraph between load_indexed/store_indexed ops on same memref
- Fix resolveKernelBlockArg() to traverse neura.data_mov and neura.constant wrappers for correct memref ID aliasing across kernel boundaries
- Increase MAX_DFG_ITERATIONS from 200 to 100000 to handle deep nested loops
Add compare_df_numerics.py for automated DF vs PyTorch comparison:
- 5/8 models pass with 2e-2 float32 threshold
- 3 models (two_layer_mlp, gelu_layernorm) need further fixes
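As a rough illustration of the memory dependency edges this commit describes, the sketch below classifies RAW/WAW/WAR hazards between accesses to the same memref. It is not the DependencyGraph code from the PR: `MemOp`, `classify`, and `memory_edges` are hypothetical names, and the real implementation operates on load_indexed/store_indexed ops in C++.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemOp:
    """Hypothetical stand-in for a load_indexed/store_indexed op."""
    index: int   # position in program order
    kind: str    # "load" or "store"
    memref: str  # identifier of the accessed memref


def classify(first, second):
    """Return the hazard between two ops in program order, or None."""
    if first.memref != second.memref:
        return None  # different memrefs never conflict in this sketch
    if first.kind == "store" and second.kind == "load":
        return "RAW"  # read after write
    if first.kind == "store" and second.kind == "store":
        return "WAW"  # write after write
    if first.kind == "load" and second.kind == "store":
        return "WAR"  # write after read
    return None  # load after load needs no ordering


def memory_edges(ops):
    """Collect (earlier, later, hazard) edges for all conflicting pairs."""
    ordered = sorted(ops, key=lambda op: op.index)
    edges = []
    for a, b in combinations(ordered, 2):
        hazard = classify(a, b)
        if hazard is not None:
            edges.append((a.index, b.index, hazard))
    return edges


ops = [
    MemOp(0, "store", "A"),
    MemOp(1, "load", "A"),
    MemOp(2, "load", "B"),
    MemOp(3, "store", "A"),
]
edges = memory_edges(ops)
print(edges)
```

The pairwise scan is quadratic in the number of memory ops, which is acceptable for a sketch; a real dependency graph would likely track only the most recent conflicting access per memref.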
…ata races
Root cause: in the original flat DFG, all kernels' counters advanced simultaneously, causing interleaved execution across kernels. A later kernel (e.g. matmul layer 2) could read partially-computed values from an earlier kernel (e.g. matmul layer 1) before it finished all iterations.
Fix: execute kernels sequentially in IR order. Each kernel runs its own DF loop to exhaustion before the next kernel starts. This guarantees kernel N+1 sees the fully-computed output of kernel N.
All 8 models pass with max_abs_err < 5e-8 (float32 precision):
- simple_matmul: max_abs_err=4.66e-09
- residual_block: max_abs_err=3.73e-09
- residual_block_norelu: max_abs_err=4.47e-08
- two_layer_mlp: max_abs_err=6.52e-09 (was ALL ZERO)
- two_layer_mlp_norelu: max_abs_err=3.73e-09 (was 3.07e-02)
- conv2d_relu_pool: max_abs_err=3.73e-09 (was 1.49e-02)
- transformer_attention: max_abs_err=6.82e-13
- gelu_layernorm: max_abs_err=4.55e-13 (was 3.64e-02)
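The race described in this commit can be reproduced in miniature. The toy model below is only an assumption-laden sketch, not the interpreter's scheduler: two hypothetical kernels share a `hidden` buffer, and the second kernel needs the whole buffer on every iteration. Advancing both iteration counters in lockstep lets the second kernel read partial results, while running each kernel to exhaustion in order does not.

```python
def make_kernels(x):
    """Build two toy kernels that communicate through a shared buffer."""
    n = len(x)
    hidden = [0.0] * n
    out = [0.0] * n

    def kernel1(i):
        hidden[i] = 2.0 * x[i]  # producer: fills one element per iteration

    def kernel2(i):
        out[i] = sum(hidden)    # consumer: needs the complete buffer

    return [(kernel1, n), (kernel2, n)], out


def run_interleaved(kernels):
    """All kernels advance their iteration counters together (the buggy mode)."""
    max_iters = max(n for _, n in kernels)
    for i in range(max_iters):
        for body, n in kernels:
            if i < n:
                body(i)


def run_sequential(kernels):
    """Each kernel runs to exhaustion before the next one starts (the fix)."""
    for body, n in kernels:
        for i in range(n):
            body(i)


kernels, interleaved_out = make_kernels([1.0, 2.0, 3.0])
run_interleaved(kernels)

kernels, sequential_out = make_kernels([1.0, 2.0, 3.0])
run_sequential(kernels)

print(interleaved_out)
print(sequential_out)
```

Sequential execution gives up any overlap between kernels in exchange for correctness; whether the PR later reintroduces overlap through finer-grained dependencies is not stated in the commit message.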
- Add neura.exp op (NeuraOps.td) with lowering (math.exp → neura.exp) and interpreter handler (handleExpOp)
- ReLU models: verified residual_block/two_layer_mlp pass --neura-conversion (ArithCmpFToNeuraFCmp + ArithSelectToNeuraSel patterns already handle this)
- transformer_attention: use real torch.softmax instead of ReLU approximation
- gelu_layernorm: add proper GELU activation (tanh approximation via math ops)
- Dynamic shapes: implement export path using torch.export.export() with graceful fallback to static shapes
- Clean up fcmp/icmp TODO (dead code for non-existent predicate operand)
- test_models.py: restore original ReLU models now that they pass
All 8 models pass DF numerical comparison (max_abs_err < 1e-5)
- Adapt neura_pipeline.py to torch-mlir 20260531 API
- Add ExpandMathToArith pass (fpowi/tanh expansion)
- Update interpreter taskflow op handling
- Update CMakeLists and pass registrations
- Add README and environment.yml for py-frontend
- Remove deprecated test files (verify_models.py, compare_df_numerics.py, old environment.yml)
…README with kernel scheduling & dependency docs
Contributor
Hi @itemkelvin, we are not providing new primitives in Python/PyTorch, right? We are just lowering PyTorch to Neura in this PR?
Collaborator
Author
We are not introducing any new primitives in Python/PyTorch. This PR is purely a lowering pipeline that takes standard torch.nn models and converts them all the way down to Neura IR. Users write plain PyTorch code — the pipeline handles the rest.
Pipeline: PyTorch → torch-mlir → Linalg → Affine → Neura IR + CGRA JSON. The neura_pipeline.py frontend exposes both a CLI and a Python API, supporting stage-by-stage or end-to-end conversion. It shells out to mlir-neura-opt for the MLIR-to-MLIR lowering passes and neura-interpreter for execution.
2 new MLIR passes:
- expand-math-to-arith: expands math.tanh and math.fpowi into arith/math.exp chains before constants are hoisted into kernel block arguments, so exponents don't get lost during lowering.
- strip-taskflow-task: strips the taskflow.task wrapper and hoists inner neura.kernel ops to the function level, which is the first step of the final Neura conversion stage.
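To make the expand-math-to-arith idea concrete, here is a small numerical sketch of how tanh and an integer power can be rewritten using only exp and basic arithmetic. The identities are standard, but the choice of identity is an assumption: the actual pass may expand these ops differently, and `tanh_via_exp` and `fpowi` are illustrative names rather than functions from the PR.

```python
import math


def tanh_via_exp(x):
    """tanh(x) = (exp(2x) - 1) / (exp(2x) + 1); overflows for very large x."""
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)


def fpowi(base, exponent):
    """Float base raised to an integer exponent using only multiplies."""
    result = 1.0
    factor = base
    n = abs(exponent)
    while n:
        if n & 1:            # this bit of the exponent contributes a factor
            result *= factor
        factor *= factor     # square for the next bit
        n >>= 1
    # A negative exponent becomes a single reciprocal at the end.
    return 1.0 / result if exponent < 0 else result


print(tanh_via_exp(0.5), math.tanh(0.5))
print(fpowi(2.0, 3), fpowi(2.0, -2))
```

Because the exponent of fpowi is an integer known at expansion time, the loop above would unroll into a fixed chain of multiplies, which is one way to read the comment's point about exponents not getting lost when constants are hoisted.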
3 new Neura ops: neura.fcmp, neura.exp, neura.rsqrt. These were needed because the original Neura dialect only had basic arithmetic (add, mul, icmp) and couldn't express the operations required by GELU (needs exp + fcmp) or LayerNorm (needs rsqrt). Interpreter support for all three is added in neura-interpreter.cpp via handleFCmpOp, handleExpOp, and handleRsqrtOp.
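As a sketch of why these ops matter, the reference code below writes LayerNorm in terms of a reciprocal square root and ReLU as a floating-point compare followed by a select, mirroring the roles of rsqrt and fcmp. It is plain Python written for illustration only: the function names are hypothetical, the learnable scale and bias of LayerNorm are omitted, and `eps=1e-5` is an assumed default.

```python
import math


def rsqrt(v):
    """Reciprocal square root, the scalar operation LayerNorm relies on."""
    return 1.0 / math.sqrt(v)


def layer_norm(xs, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    scale = rsqrt(var + eps)
    return [(x - mean) * scale for x in xs]


def relu_via_cmp_select(x):
    """ReLU expressed as a float comparison plus a select."""
    is_positive = x > 0.0              # plays the role of a float compare
    return x if is_positive else 0.0   # plays the role of a select


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
print([relu_via_cmp_select(v) for v in normalized])
```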
Tests: 8 PyTorch model tests covering matmul, MLP, ResBlock (with and without ReLU), Conv2D+ReLU+MaxPool, Multi-head Attention, and GELU+LayerNorm. All pass with numerical error ~1e-8 (PyTorch golden output vs. Neura interpreter output). lit/FileCheck-based IR tests are also included.
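The golden-output comparison can be pictured as a maximum-absolute-error check. The helper below is a minimal sketch under assumptions: it works on flat Python lists rather than tensors, `max_abs_err` and `passes` are illustrative names, and the default threshold of 1e-5 is borrowed from the commit message above rather than from the test scripts themselves.

```python
def max_abs_err(golden, actual):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(golden) != len(actual):
        raise ValueError("golden and actual outputs differ in length")
    return max((abs(g - a) for g, a in zip(golden, actual)), default=0.0)


def passes(golden, actual, threshold=1e-5):
    """True when the interpreter output is within threshold of the golden output."""
    return max_abs_err(golden, actual) < threshold


golden = [0.25, -1.5, 3.0]      # stand-in for the PyTorch output
actual = [0.25, -1.5, 3.0]      # stand-in for the interpreter output
print(max_abs_err(golden, actual), passes(golden, actual))
```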
Note: the environment.yml currently pins a different torch-mlir version than what .github/workflows/main.yml uses at runtime. We should align them so that local setup and CI use the same version.
…------------------ Original Message ------------------
From: "coredac/neura" ***@***.***>;
Sent: Sunday, June 14, 2026, 3:21 PM
***@***.***>;
***@***.******@***.***>;
Subject: Re: [coredac/neura] Python frontend (PR #327)
tancheng left a comment (coredac/neura#327)
Hi @itemkelvin, we are not providing new primitives in Python/PyTorch, right? We are just lowering PyTorch to Neura in this PR?
16835c4 to 75567c7
…wMode, add Frame ctor
Contributor
Hi @itemkelvin, I guess this PR was written with help from an LLM? If so, would it be possible for you to split this huge PR into a few smaller PRs? It is currently hard for us to review. Can you ask the LLM to give you a plan for how many small PRs you would need to complete this?
Adds a Python frontend pipeline under tools/neura-py-frontend/ that lowers PyTorch-exported ML models into Neura dataflow IR. The pipeline traverses the Torch → Linalg → Affine → Neura dialect chain and emits IR suitable for CGRA acceleration via the Neura interpreter.