
feat(backend): Add per-backend dtype allowlist for gather op#1317

Draft
Little-oil wants to merge 2 commits into
hw-native-sys:mainfrom
Little-oil:supplement_gather

Conversation

@Little-oil
Contributor

@Little-oil Little-oil commented May 8, 2026

Summary

This PR completes the gather op's index form and introduces a per-backend dtype allowlist mechanism so the same op can accept different dtypes on different platforms without if (BackendType == ...) branches in passes.

1. Per-backend dtype allowlist (new mechanism)

BackendHandler::IsDtypeSupported(op_name, arg_role, dtype) — new virtual method on the backend handler interface (include/pypto/backend/common/backend_handler.h). Defaults to false; each backend opts in to whatever it actually accepts via a per-(op, arg_role) allowlist table.

The pattern (using a hypothetical tile.foo(src, idx) / tensor.foo(input, index)):

  1. Op type-deduction widens the IR-level allowlist to the union across all backends, then calls CheckBackendDtype(op, arg_role, dtype) to narrow to the active backend when one is configured:
    CHECK(src_type->dtype_ == DataType::FP16 || src_type->dtype_ == DataType::FP32 ||
          src_type->dtype_ == DataType::INT8)
        << "...";
    CheckBackendDtype("tile.foo", "src", src_type->dtype_);
  2. Per-backend tables in src/backend/910B/backend_910b_handler.cpp and src/backend/950/backend_950_handler.cpp register the real subset for each backend.

No changes are needed to existing public headers beyond declaring the new virtual method and its per-backend overrides.
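The two-step pattern above can be modelled in a few lines of Python. This is only a sketch of the control flow, not the C++ implementation: `TableBackedHandler`, `deduce_foo_src`, and the snake_case method names are hypothetical stand-ins, and `tile.foo` is the same hypothetical op used in the description.

```python
from enum import Enum
from typing import Optional


class DataType(Enum):
    FP16 = "FP16"
    FP32 = "FP32"
    INT8 = "INT8"


class BackendHandler:
    """Base handler: rejects every dtype until a backend opts in."""

    def is_dtype_supported(self, op_name: str, arg_role: str, dtype: DataType) -> bool:
        return False


class TableBackedHandler(BackendHandler):
    """Hypothetical backend whose allowlist is a per-(op, arg_role) table."""

    def __init__(self, table):
        self._table = table

    def is_dtype_supported(self, op_name, arg_role, dtype):
        return dtype in self._table.get((op_name, arg_role), frozenset())


# IR-level union across all backends for the hypothetical tile.foo "src" argument.
FOO_SRC_UNION = frozenset({DataType.FP16, DataType.FP32, DataType.INT8})


def check_backend_dtype(backend: Optional[BackendHandler], op_name, arg_role, dtype):
    """Narrow to the active backend; does nothing when no backend is configured."""
    if backend is not None and not backend.is_dtype_supported(op_name, arg_role, dtype):
        raise TypeError(f"{op_name}: {arg_role} dtype {dtype.name} is not supported by the active backend")


def deduce_foo_src(dtype: DataType, backend: Optional[BackendHandler] = None) -> DataType:
    # Step 1: the union check shared by every backend.
    if dtype not in FOO_SRC_UNION:
        raise TypeError(f"tile.foo: src dtype {dtype.name} is outside the IR-level union")
    # Step 2: narrow to whatever the active backend registered.
    check_backend_dtype(backend, "tile.foo", "src", dtype)
    return dtype
```

With no backend configured, `deduce_foo_src(DataType.INT8)` passes on the union check alone; with a handler that registers only FP16 and FP32, the same call raises.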

2. Gather op — applied to the new mechanism

Op type-deduction (src/ir/op/{tile,tensor}_ops/gather.cpp):

  • Universal union: src ∈ {FP16, FP32, INT8, INT16, INT32}, indices ∈ {INT16, INT32}.
  • tile.gather tmp workspace constraint relaxed from hardcoded INT32 to "must match indices dtype".

910B (a2a3) allowlist (src/backend/910B/backend_910b_handler.cpp):

  • tile.gather / tensor.gather src ∈ {FP16, FP32, INT16, INT32}
  • tile.gather / tensor.gather indices ∈ {INT32}

950 (a5) allowlist (src/backend/950/backend_950_handler.cpp):

  • tile.gather / tensor.gather src ∈ {INT8, FP16, FP32, INT16, INT32} (a2a3 ∪ INT8)
  • tile.gather / tensor.gather indices ∈ {INT16, INT32} (a2a3 ∪ INT16)
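As a quick consistency check, the allowlists above can be written out as plain data. The sketch below uses strings in place of the real `DataType` values, and the helper names (`union_of`, `accepts`, `tmp_dtype_ok`) are hypothetical; it only confirms that the two backend tables together give the universal union, and that the relaxed tmp rule compares against the indices dtype.

```python
# Dtypes are plain strings here; the real tables hold DataType values.
UNION = {
    "src": {"FP16", "FP32", "INT8", "INT16", "INT32"},
    "indices": {"INT16", "INT32"},
}

ALLOWLIST_910B = {  # a2a3
    "src": {"FP16", "FP32", "INT16", "INT32"},
    "indices": {"INT32"},
}

ALLOWLIST_950 = {  # a5
    "src": {"INT8", "FP16", "FP32", "INT16", "INT32"},
    "indices": {"INT16", "INT32"},
}


def union_of(*tables):
    """Per-role union of the dtypes accepted by any of the given backend tables."""
    roles = set().union(*(t.keys() for t in tables))
    return {role: set().union(*(t.get(role, set()) for t in tables)) for role in roles}


def accepts(table, src_dtype, indices_dtype):
    """True when the backend table allows this (src, indices) dtype pair."""
    return src_dtype in table["src"] and indices_dtype in table["indices"]


def tmp_dtype_ok(tmp_dtype, indices_dtype):
    """Relaxed tile.gather rule: tmp must match the indices dtype, not a fixed INT32."""
    return tmp_dtype == indices_dtype
```

For example, `accepts(ALLOWLIST_950, "INT8", "INT32")` holds while the same pair is rejected by `ALLOWLIST_910B`, which is the combination the a5-only ST case relies on.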

3. Generalized tensor.gather lowering

src/ir/transforms/op_conversion_registry.cpp:

  • Replaces the case-3 (rank=3 dim=0) and case-4 (rank=3 dim=1) special cases with a single emit_flat_index_gather(gather_dim) helper that uses mixed-radix decomposition of the loop variable.
  • Adds case 5 (rank=2 dim=0) and case 6 (rank≥4 any dim) — handled by the same helper.
  • Internal tmp/range tiles now use idx_dtype instead of hardcoded INT32, so INT16-indices paths share the same lowering once the codegen-side INT16 work lands.
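The idea behind a flat-index gather can be illustrated with a small pure-Python reference. This is a sketch of the general technique (mixed-radix decomposition of a flat loop counter over row-major data), not a transcription of emit_flat_index_gather; the function names and the element-by-element loop are assumptions made for illustration.

```python
def unflatten(flat, shape):
    """Mixed-radix decomposition: digits of `flat` with radices `shape`, last axis fastest."""
    coords = [0] * len(shape)
    for axis in range(len(shape) - 1, -1, -1):
        flat, coords[axis] = divmod(flat, shape[axis])
    return coords


def row_major_strides(shape):
    """Element strides of a contiguous row-major array with the given shape."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def gather_flat(src, src_shape, index, index_shape, dim):
    """Reference gather over flat row-major lists: out[p] = src[..., index[p], ...] along `dim`."""
    rank = len(src_shape)
    dim = dim + rank if dim < 0 else dim  # normalize negative dims
    strides = row_major_strides(src_shape)
    out = []
    for flat, idx in enumerate(index):
        coords = unflatten(flat, index_shape)  # position of this output element
        coords[dim] = idx                      # replace the gather axis with the index value
        out.append(src[sum(c * s for c, s in zip(coords, strides))])
    return out
```

Because the decomposition only depends on the shape, the same routine serves rank 2, rank 3, and rank ≥ 4 without a special case per (rank, dim) pair.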

4. Tests

Unit tests (tests/ut/ir/operators/test_tensor_ops.py):

  • Updated rejection messages to match the widened union (FP16, FP32, INT8, INT16, or INT32 / INT16 or INT32).

ST tests (tests/st/runtime/test_gather.py):

  • New cases for the generalized lowering: rank-2 dim=0, rank-4 dim=-1, rank-4 dim=2.
  • New a5-only case: rank-2 INT8 src + INT32 idx (validates the per-backend allowlist).
  • Platform markers (@pytest.mark.platforms("a2a3", "a2a3sim") / ("a5", "a5sim")) added per test.
  • Removed hardcoded BackendType.Ascend910B from the base class so each test case picks its own backend.

ST harness (tests/st/harness/core/harness.py): Added INT8 to the DataType enum.

Testing

  • Build passes
  • Gather UTs pass
  • Code review completed
  • Gather ST suite — 910B cases pass on a2a3
  • Gather ST suite — 950 INT8 case passes on a5

Notes

  • INT16 indices are accepted at the op layer for a5, and the lowering's range_1d / tmp tiles now use the indices dtype consistently. An end-to-end INT16-index ST case is intentionally deferred to a follow-up PR, pending verification of INT16 codegen on the PTOAS side.

@coderabbitai

coderabbitai Bot commented May 8, 2026

Walkthrough

Adds a BackendHandler::IsDtypeSupported hook, provides Ascend910B/950 allowlists, updates tensor.gather and tile.gather type checks to consult backends (wider index/input dtypes), refactors tensor.gather lowering to a generalized emit_flat_index_gather, and extends tests for extra ranks/dimensions.

Changes

Gather Backend Dtype Support

| Layer / File(s) | Summary |
|---|---|
| Backend Handler Interface: include/pypto/backend/common/backend_handler.h | Adds DataType include and virtual IsDtypeSupported(op_name, arg_role, dtype) with default returning false. |
| Backend Handler Declarations: include/pypto/backend/910B/backend_910b_handler.h, include/pypto/backend/950/backend_950_handler.h | Declare IsDtypeSupported overrides and add pypto/core/dtype.h includes. |
| Backend Dtype Allowlists: src/backend/910B/backend_910b_handler.cpp, src/backend/950/backend_950_handler.cpp | Add anonymous-namespace allowlist maps and implement IsDtypeSupported lookups for gather op roles. |
| IR Type Inference: src/ir/op/tensor_ops/gather.cpp, src/ir/op/tile_ops/gather.cpp | Add CheckBackendDtype helpers; expand allowed src/indices dtypes and validate them against backend handlers; update REGISTER_OP argument docs. |
| Gather Lowering Generalization: src/ir/transforms/op_conversion_registry.cpp | Introduce emit_flat_index_gather(gather_dim), propagate idx_dtype for temporaries, and dispatch for rank-2, rank-3, and rank≥4 cases. |
| Tests / Docs: tests/st/runtime/test_gather.py, tests/ut/ir/operators/test_tensor_ops.py | Add runtime programs/tests for rank-2 dim=0 and rank-4 dims; update unit test error assertions; remove BackendType import and get_backend_type override. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • lyfne123

Poem

🐰 I hopped through gather's branching tree,
checked dtypes by backend, one-two-three.
Ranks stretched wide, indices danced light,
temp dtypes matched, and tests took flight.
A carrot of correctness — tucked in tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: adding a per-backend dtype allowlist mechanism for the gather operation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the changes: introducing a per-backend dtype allowlist mechanism for the gather op, with specific allowed dtypes for different backends (910B and 950), generalized tensor.gather lowering, and updated tests. |


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a mechanism for backend-specific data type validation for operators, specifically applied to the gather operation. It generalizes the gather lowering logic to support arbitrary ranks and dimensions using a flat-index approach and expands the supported data types for the Ascend950 backend to include INT8 for sources and INT16 for indices. The review feedback correctly identifies that the lowering logic uses hardcoded INT32 constants for index arithmetic, which will cause type mismatches when INT16 indices are used; it suggests using the actual index tensor data type for these constants to maintain IR consistency.
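The mismatch described in this review can be illustrated with a toy model. Everything below is hypothetical: `Tile`, `Scalar`, `muls`, and `scale_offsets` are stand-ins that merely mimic a dtype-checked tile-by-scalar multiply, and the assumption that the real IR rejects mixed dtypes is taken from the review summary rather than from the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    value: int
    dtype: str


@dataclass(frozen=True)
class Tile:
    values: tuple
    dtype: str


def muls(tile: Tile, scalar: Scalar) -> Tile:
    """Hypothetical tile-by-scalar multiply that insists on matching dtypes."""
    if tile.dtype != scalar.dtype:
        raise TypeError(f"dtype mismatch: tile is {tile.dtype}, scalar is {scalar.dtype}")
    return Tile(tuple(v * scalar.value for v in tile.values), tile.dtype)


def scale_offsets(range_tile: Tile, stride: int, idx_dtype: str) -> Tile:
    # Build the multiplier with the indices dtype instead of a fixed "INT32",
    # so an INT16 range tile and its constant stay consistent.
    return muls(range_tile, Scalar(stride, idx_dtype))
```

In this model, `scale_offsets(Tile((0, 1, 2), "INT16"), 4, "INT16")` succeeds, whereas pairing the same tile with an INT32 constant raises, which is the inconsistency the suggested fix avoids.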

Comment thread src/ir/transforms/op_conversion_registry.cpp
Comment thread src/ir/transforms/op_conversion_registry.cpp

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/ut/ir/operators/test_tensor_ops.py (1)

2397-2401: ⚡ Quick win

Add positive tests for newly accepted gather dtypes.

This update strengthens rejection paths, but it still doesn’t assert success for the newly allowed tensor-level cases (index=INT16, src=INT8). Adding those keeps the widened contract from regressing silently.

Suggested test additions
+def test_tensor_gather_accepts_int16_index():
+    inp, idx = _make_gather_inputs(idx_dtype=DataType.INT16)
+    call = ir.op.tensor.gather(inp, dim=-1, index=idx)
+    assert call.op.name == "tensor.gather"
+
+
+def test_tensor_gather_accepts_int8_input():
+    inp, idx = _make_gather_inputs(src_dtype=DataType.INT8, idx_dtype=DataType.INT32)
+    call = ir.op.tensor.gather(inp, dim=-1, index=idx)
+    rt = call.type
+    assert isinstance(rt, ir.TensorType)
+    assert rt.dtype == DataType.INT8

Also applies to: 2404-2407

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/operators/test_tensor_ops.py` around lines 2397 - 2401, Add
positive assertions that exercise the newly-accepted gather dtypes: use
_make_gather_inputs to create inputs with index dtype DataType.INT16 and source
dtype DataType.INT8 and call ir.op.tensor.gather(inp, dim=-1, index=idx)
expecting no exception (i.e., remove pytest.raises and let the call succeed),
and mirror the same positive test for the adjacent/relevant gather test to
ensure both newly-allowed cases are asserted as successful rather than only
asserting rejections.
src/ir/transforms/op_conversion_registry.cpp (1)

1148-1275: 💤 Low value

Top-of-section comment in this file now lists only 4 cases — consider refreshing for the new dispatch.

The block comment at lines 889–938 still enumerates exactly “Four cases (by rank and norm_dim)” (rank-2 dim=1, rank-3 dim=0/1/2). With this PR, RegisterGatherOps now also dispatches rank-2 dim=0 and rank≥4 any-dim through emit_flat_index_gather. The new inline comment at lines 1122–1147 describes the generalized helper well, but a reader scanning the file’s top-of-section overview will get a stale picture of supported cases. Worth a one-paragraph refresh to mention the rank-2 dim=0 and rank≥4 routes alongside the existing four.
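For readers who want the dispatch picture at a glance, the routes named in this thread can be summarised as a small classifier. It is a sketch based only on the PR description: the function names are hypothetical, and treating rank-2 dim=1 and rank-3 dim=2 as staying on a pre-existing path is an assumption, since the description only says which routes use the flat-index helper.

```python
def normalize_dim(dim: int, rank: int) -> int:
    """Map a possibly negative dim into the range [0, rank)."""
    if not -rank <= dim < rank:
        raise ValueError(f"dim {dim} is out of range for rank {rank}")
    return dim + rank if dim < 0 else dim


def gather_route(rank: int, dim: int) -> str:
    """Classify a (rank, dim) pair according to the routes listed in the PR description."""
    if rank < 2:
        raise ValueError("ranks below 2 are not covered by the description")
    norm_dim = normalize_dim(dim, rank)
    if rank >= 4:
        return "flat_index_helper"  # rank>=4, any dim
    if rank == 3 and norm_dim in (0, 1):
        return "flat_index_helper"  # former special cases for rank=3 dim=0 and dim=1
    if rank == 2 and norm_dim == 0:
        return "flat_index_helper"  # newly added rank=2 dim=0
    # Assumption: rank-2 dim=1 and rank-3 dim=2 keep their earlier lowering.
    return "pre_existing_path"
```

Under this reading, the new ST cases (rank-2 dim=0, rank-4 dim=-1, rank-4 dim=2) all classify as `flat_index_helper`.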

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/transforms/op_conversion_registry.cpp` around lines 1148 - 1275,
Update the top-of-section block comment that enumerates "Four cases (by rank and
norm_dim)" to reflect the new dispatch paths: mention that
emit_flat_index_gather now handles rank-2 dim=0 and any rank>=4 (in addition to
the previously-listed rank-2 dim=1 and rank-3 dim=0/1/2 cases); locate the
comment near RegisterGatherOps/emit_flat_index_gather and replace the outdated
four-case enumeration with a short paragraph that lists the full set of
dispatched routes (rank==2 dim==1, rank==2 dim==0, rank==3 dim==0/1/2, and
rank>=4 any-dim) so the overview matches the actual dispatch logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ir/op/tensor_ops/gather.cpp`:
- Around line 61-67: Update the tensor.gather op registration strings to reflect
the widened dtype contract: include INT8 as an allowed src dtype and allow index
to be INT16 or INT32. Locate the tensor.gather registration metadata (the
human-readable type description lines that currently list allowed src/index
dtypes) and modify them so they match the runtime checks (which use CHECK on
input_type->dtype_ and CheckBackendDtype for "src"); apply the same change to
the second registration occurrence referenced in the comment so both metadata
entries advertise the new INT8 for src and INT16|INT32 for index.

In `@src/ir/op/tile_ops/gather.cpp`:
- Around line 68-74: The operator argument documentation for tile.gather is out
of sync with the deduce logic: update the registered argument descriptions for
the gather op (the tile.gather registration/arg doc strings) to reflect that src
may be FP16|FP32|INT8|INT16|INT32 (per the CHECK on src_type and
CheckBackendDtype usage), indices may be INT16 or INT32, and tmp should be
documented as matching the indices dtype (tmp dtype == indices dtype); also
update the other duplicate doc block that mirrors these lines (the section
corresponding to the same registration around the other check). Ensure op_name,
src_type, CheckBackendDtype, and the indices/tmp arg descriptions are edited so
the docs match the new deduce rules.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between cba8594 and be31737.

📒 Files selected for processing (10)
  • include/pypto/backend/910B/backend_910b_handler.h
  • include/pypto/backend/950/backend_950_handler.h
  • include/pypto/backend/common/backend_handler.h
  • src/backend/910B/backend_910b_handler.cpp
  • src/backend/950/backend_950_handler.cpp
  • src/ir/op/tensor_ops/gather.cpp
  • src/ir/op/tile_ops/gather.cpp
  • src/ir/transforms/op_conversion_registry.cpp
  • tests/st/runtime/test_gather.py
  • tests/ut/ir/operators/test_tensor_ops.py

Comment thread src/ir/op/tensor_ops/gather.cpp Outdated
Comment thread src/ir/op/tile_ops/gather.cpp
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 9, 2026
- Use index_tensor_type->dtype_ for tile.ci range and tile.muls multiplier
  in tensor.gather lowering, so INT16 indices flow through without
  hardcoded INT32 mismatches (gemini review).
- Update tensor.gather and tile.gather argument descriptions to reflect
  the per-backend dtype contract: INT8 src and INT16 indices are valid on
  Ascend950 (coderabbit review).
- Add direct #include "pypto/core/dtype.h" to backend handler files so
  clang-tidy's misc-include-cleaner is satisfied.
Little-oil pushed a commit to Little-oil/pypto that referenced this pull request May 9, 2026
Introduces BackendHandler::IsDtypeSupported(op_name, arg_role, dtype)
so gather's accepted dtypes can vary per backend:
- a2a3 (910B): src {FP16, FP32, INT16, INT32}, indices {INT32}
- a5  (950) : src adds INT8; indices adds INT16

The op-level type-deduction enforces the universal union (a2a3 ∪ a5)
and then narrows to the active backend via CheckBackendDtype when a
backend is configured. Generalises tensor.gather lowering and updates
the gather ST/UT suite accordingly.

fix(pr): resolve issues for hw-native-sys#1317

- Use index_tensor_type->dtype_ for tile.ci range and tile.muls multiplier
  in tensor.gather lowering, so INT16 indices flow through without
  hardcoded INT32 mismatches (gemini review).
- Update tensor.gather and tile.gather argument descriptions to reflect
  the per-backend dtype contract: INT8 src and INT16 indices are valid on
  Ascend950 (coderabbit review).
- Add direct #include "pypto/core/dtype.h" to backend handler files so
  clang-tidy's misc-include-cleaner is satisfied.

test(gather): Add INT8 ST test case for Ascend950 backend

Extend gather ST coverage with an INT8 src + INT32 idx case targeting
Ascend950 (a5 dtype allowlist), and tag existing rank/dim coverage tests
with explicit a2a3 platform markers + Ascend910B backend type. Adds
INT8 to harness DataType enum.

Remove redundant comments

Extract the CheckOpDtype function

Revert the unnecessary dtype distinction; PTOAS will emit the error log
@Little-oil Little-oil force-pushed the supplement_gather branch from dd57a76 to 0f1b5de Compare May 9, 2026 06:27
@Little-oil Little-oil marked this pull request as draft May 11, 2026 11:17
Youhezhen added 2 commits May 12, 2026 14:44
Introduces BackendHandler::IsDtypeSupported(op_name, arg_role, dtype)
so gather's accepted dtypes can vary per backend:
- a2a3 (910B): src {FP16, FP32, INT16, INT32}, indices {INT32}
- a5  (950) : src adds INT8; indices adds INT16

The op-level type-deduction enforces the universal union (a2a3 ∪ a5)
and then narrows to the active backend via CheckBackendDtype when a
backend is configured. Generalises tensor.gather lowering and updates
the gather ST/UT suite accordingly.

fix(pr): resolve issues for hw-native-sys#1317

- Use index_tensor_type->dtype_ for tile.ci range and tile.muls multiplier
  in tensor.gather lowering, so INT16 indices flow through without
  hardcoded INT32 mismatches (gemini review).
- Update tensor.gather and tile.gather argument descriptions to reflect
  the per-backend dtype contract: INT8 src and INT16 indices are valid on
  Ascend950 (coderabbit review).
- Add direct #include "pypto/core/dtype.h" to backend handler files so
  clang-tidy's misc-include-cleaner is satisfied.

test(gather): Add INT8 ST test case for Ascend950 backend

Extend gather ST coverage with an INT8 src + INT32 idx case targeting
Ascend950 (a5 dtype allowlist), and tag existing rank/dim coverage tests
with explicit a2a3 platform markers + Ascend910B backend type. Adds
INT8 to harness DataType enum.

Remove redundant comments

Extract the CheckOpDtype function

Revert the unnecessary dtype distinction; PTOAS will emit the error log
- Refresh the gather-section block comment in op_conversion_registry.cpp
  to list all six dispatch routes (was stale "Four cases"): rank=2 dim=0
  and rank>=4 any-dim now go through emit_flat_index_gather alongside
  the existing rank=3 dim=0/1/2 cases.
- Add positive tensor.gather tests for the newly accepted dtypes
  (INT16 index, INT8 src), complementing the existing rejection tests.
@Little-oil Little-oil force-pushed the supplement_gather branch from 0f1b5de to c5a4654 Compare May 12, 2026 06:53
@Little-oil
Contributor Author

Addressed the two remaining CodeRabbit nitpicks from the latest review in c5a46541:

  • src/ir/transforms/op_conversion_registry.cpp (top-of-section comment, ~L912) — refreshed the stale "Four cases" enumeration to list all six dispatch routes (Cases 1–6), making rank=2 dim=0 and rank≥4 any-dim explicit.
  • tests/ut/ir/operators/test_tensor_ops.py — added test_tensor_gather_accepts_int16_index and test_tensor_gather_accepts_int8_input so the widened dtype contract is exercised positively, not only via rejection paths.

Branch was also rebased onto the latest origin/main.
