Add Python frontend #327

Open
itemkelvin wants to merge 11 commits into main from python-frontend

@itemkelvin

Collaborator

Adds a Python frontend pipeline under tools/neura-py-frontend/ that lowers PyTorch-exported ML models into Neura dataflow IR. The pipeline traverses the Torch → Linalg → Affine → Neura dialect chain and emits IR suitable for CGRA acceleration via the Neura interpreter.

Developer added 9 commits June 13, 2026 03:50
…astructure

- Add StripTaskflowTaskPass for taskflow cleanup
- Add python2neura conversion pipeline (--python-to-neura)
- Add dataflow mode support to neura-interpreter (--dataflow flag)
- Add llvm-lit tests for generated model IR (DATAFLOW_IR + INTERPRETER_OUTPUT checks)
- Add test_models.py with CF/DF mode support for end-to-end verification
- Add verify_models.py for interpreting generated dataflow IR
- Add neura-py-frontend tools

Note: DF mode in interpreter has a scheduling bug (extra iterations
overwrite correct results with zeros). CF mode works correctly.
…iteration limit

Core fixes to DF interpreter for numerical correctness:
- Add memory dependency edges (RAW/WAW/WAR) in DependencyGraph
  between load_indexed/store_indexed ops on same memref
- Fix resolveKernelBlockArg() to traverse neura.data_mov and
  neura.constant wrappers for correct memref ID aliasing
  across kernel boundaries
- Increase MAX_DFG_ITERATIONS from 200 to 100000 to handle
  deep nested loops
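
The memory dependency edges described above can be illustrated with a minimal Python sketch. This is not the interpreter's DependencyGraph code (which is not shown in this PR description); `MemOp`, `classify`, and `memory_edges` are hypothetical names, and the sketch assumes ops are listed in program order and that any two accesses to the same memref may alias.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemOp:
    """Hypothetical stand-in for a load_indexed/store_indexed op."""
    name: str
    kind: str    # "load" or "store"
    memref: str  # identifier of the memref being accessed


def classify(earlier, later):
    """Return the hazard between two ops in program order, or None."""
    if earlier.memref != later.memref:
        return None  # different memrefs never conflict in this sketch
    if earlier.kind == "store" and later.kind == "load":
        return "RAW"  # read after write
    if earlier.kind == "store" and later.kind == "store":
        return "WAW"  # write after write
    if earlier.kind == "load" and later.kind == "store":
        return "WAR"  # write after read
    return None  # load after load needs no ordering edge


def memory_edges(ops):
    """Collect (earlier, later, hazard) edges for every conflicting pair."""
    edges = []
    for earlier, later in combinations(ops, 2):
        hazard = classify(earlier, later)
        if hazard is not None:
            edges.append((earlier.name, later.name, hazard))
    return edges


ops = [
    MemOp("st0", "store", "A"),
    MemOp("ld0", "load", "A"),
    MemOp("st1", "store", "A"),
    MemOp("ld1", "load", "B"),
]
edges = memory_edges(ops)
for edge in edges:
    print(edge)
```

The pairwise scan is quadratic in the number of memory ops, which is fine for illustration; a real implementation would likely track only the most recent store and outstanding loads per memref.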

Add compare_df_numerics.py for automated DF vs PyTorch comparison:
- 5/8 models pass with 2e-2 float32 threshold
- 3 models (including two_layer_mlp, gelu_layernorm) need further fixes
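
As a rough illustration of the kind of check a DF vs PyTorch comparison performs, the sketch below computes a maximum absolute error and applies a threshold. It is not the contents of compare_df_numerics.py; the function names, the flat-list inputs, and the choice of `<=` for the threshold test are assumptions.

```python
def max_abs_err(actual, expected):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def passes(actual, expected, threshold=2e-2):
    """True when the worst-case error is within the threshold (assumed inclusive)."""
    return max_abs_err(actual, expected) <= threshold


# Hypothetical reference values standing in for a PyTorch output.
reference = [0.5, 1.0, -2.0]
close_output = [0.5, 1.000001, -2.0]
zero_output = [0.0, 0.0, 0.0]  # mimics an all-zero failure

print(passes(close_output, reference))
print(passes(zero_output, reference))
```

An all-zero output fails this check whenever the reference contains any value larger than the threshold in magnitude.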
…ata races

Root cause: in the original flat DFG, all kernels' counters advanced
simultaneously, causing interleaved execution across kernels.  A later
kernel (e.g. matmul layer 2) could read partially-computed values from
an earlier kernel (e.g. matmul layer 1) before it finished all
iterations.

Fix: execute kernels sequentially in IR order.  Each kernel runs its
own DF loop to exhaustion before the next kernel starts.  This
guarantees kernel N+1 sees the fully-computed output of kernel N.
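
The difference between the two schedules can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the interpreter's code: `Kernel`, `run_interleaved`, and `run_sequential` are hypothetical, and each kernel step simply writes one output element.

```python
class Kernel:
    """Hypothetical kernel: each step writes one element of dst."""

    def __init__(self, src, dst, fn):
        self.src, self.dst, self.fn = src, dst, fn
        self.i = 0  # iteration counter

    def done(self):
        return self.i >= len(self.dst)

    def step(self):
        self.dst[self.i] = self.fn(self.src, self.i)
        self.i += 1


def run_interleaved(kernels):
    """All counters advance together, as in the flat DFG described above."""
    while not all(k.done() for k in kernels):
        for k in kernels:
            if not k.done():
                k.step()


def run_sequential(kernels):
    """Each kernel runs to exhaustion before the next one starts."""
    for k in kernels:
        while not k.done():
            k.step()


def build():
    a, b, c = [1, 2, 3], [0, 0, 0], [0, 0, 0]
    k1 = Kernel(a, b, lambda src, i: src[i] * 2)  # b[i] = 2 * a[i]
    k2 = Kernel(b, c, lambda src, i: sum(src))    # c[i] reads all of b
    return [k1, k2], c


kernels, c_seq = build()
run_sequential(kernels)
kernels, c_int = build()
run_interleaved(kernels)
print(c_seq, c_int)
```

In the interleaved run the second kernel reads `b` while it is still partially filled, so its early outputs are wrong; the sequential run sees the completed `b` on every step.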

All 8 models pass with max_abs_err < 5e-8 (float32 precision):
  simple_matmul          max_abs_err=4.66e-09
  residual_block         max_abs_err=3.73e-09
  residual_block_norelu  max_abs_err=4.47e-08
  two_layer_mlp          max_abs_err=6.52e-09 (was ALL ZERO)
  two_layer_mlp_norelu   max_abs_err=3.73e-09 (was 3.07e-02)
  conv2d_relu_pool       max_abs_err=3.73e-09 (was 1.49e-02)
  transformer_attention  max_abs_err=6.82e-13
  gelu_layernorm         max_abs_err=4.55e-13 (was 3.64e-02)
- Add neura.exp op (NeuraOps.td) with lowering (math.exp → neura.exp)
  and interpreter handler (handleExpOp)
- ReLU models: verified residual_block/two_layer_mlp pass --neura-conversion
  (ArithCmpFToNeuraFCmp + ArithSelectToNeuraSel patterns already handle this)
- transformer_attention: use real torch.softmax instead of ReLU approximation
- gelu_layernorm: add proper GELU activation (tanh approximation via math ops)
- Dynamic shapes: implement export path using torch.export.export() with
  graceful fallback to static shapes
- Clean up fcmp/icmp TODO (dead code for non-existent predicate operand)
- test_models.py: restore original ReLU models now that they pass

All 8 models pass DF numerical comparison (max_abs_err < 1e-5)
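
For reference, the commonly used tanh approximation of GELU can be written with basic math operations as below. This is a sketch of the standard formula, not the model code from this PR; the function names and the sample grid are assumptions, and the exact form is computed with `math.erf` for comparison.

```python
import math


def gelu_tanh(x):
    """Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x):
    """Exact GELU via the Gaussian CDF: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Compare the two forms on a small grid of sample points in [-4, 4].
samples = [i / 10.0 for i in range(-40, 41)]
worst = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in samples)
print(f"max abs difference on grid: {worst:.2e}")
```

The cubic term is a small integer power and the remaining nonlinearity is a single tanh, which are the two operations the ExpandMathToArith item in the next commit lists as expanded (fpowi/tanh).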
- Adapt neura_pipeline.py to torch-mlir 20260531 API
- Add ExpandMathToArith pass (fpowi/tanh expansion)
- Update interpreter taskflow op handling
- Update CMakeLists and pass registrations
- Add README and environment.yml for py-frontend
- Remove deprecated test files (verify_models.py, compare_df_numerics.py, old environment.yml)
…README with kernel scheduling & dependency docs
@tancheng

Contributor

Hi @itemkelvin, we are not providing new primitives in Python/PyTorch, right? We are just lowering PyTorch to Neura in this PR?

@itemkelvin

itemkelvin commented Jun 14, 2026 via email

Collaborator Author

@itemkelvin itemkelvin changed the title from "Python frontend" to "Add Python frontend" on Jun 14, 2026
@tancheng

Contributor

Hi @itemkelvin, I guess this PR was written with help from an LLM? If so, would it be convenient for you to split this huge PR into a few smaller PRs? It is currently hard for us to review. Can you have the LLM give you a plan for how many small PRs you would need to complete this?
