feat(backend): Add per-backend dtype allowlist for gather op by Little-oil · Pull Request #1317 · hw-native-sys/pypto

Little-oil · 2026-05-08T09:31:05Z

Summary

This PR completes the gather op's index form and introduces a per-backend dtype allowlist mechanism so the same op can accept different dtypes on different platforms without if (BackendType == ...) branches in passes.

1. Per-backend dtype allowlist (new mechanism)

BackendHandler::IsDtypeSupported(op_name, arg_role, dtype) — new virtual method on the backend handler interface (include/pypto/backend/common/backend_handler.h). Defaults to false; each backend opts in to whatever it actually accepts via a per-(op, arg_role) allowlist table.

The pattern (using a hypothetical tile.foo(src, idx) / tensor.foo(input, index)):

Op type-deduction widens the IR-level allowlist to the union across all backends, then calls CheckBackendDtype(op, arg_role, dtype) to narrow to the active backend when one is configured:

CHECK(src_type->dtype_ == DataType::FP16 || src_type->dtype_ == DataType::FP32 ||
      src_type->dtype_ == DataType::INT8)
    << "...";
CheckBackendDtype("tile.foo", "src", src_type->dtype_);

Per-backend tables in src/backend/910B/backend_910b_handler.cpp and src/backend/950/backend_950_handler.cpp register the real subset for each backend.

No changes needed to existing public headers beyond the new virtual method override.
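
The two-level check described above can be modelled in a few lines. This is a minimal Python sketch of the idea, not the C++ implementation from the PR: the class names `NarrowBackend` and `WideBackend`, the `ALLOWLIST` attribute, and the helper `is_accepted` are all hypothetical, and `tile.foo` is the same made-up op used in the description.

```python
from enum import Enum, auto


class DataType(Enum):
    FP16 = auto()
    FP32 = auto()
    INT8 = auto()
    INT16 = auto()
    INT32 = auto()


class BackendHandler:
    """Base handler: every dtype is rejected unless a backend opts in."""

    # Hypothetical table keyed by (op_name, arg_role).
    ALLOWLIST = {}

    def is_dtype_supported(self, op_name, arg_role, dtype):
        return dtype in self.ALLOWLIST.get((op_name, arg_role), frozenset())


class NarrowBackend(BackendHandler):
    ALLOWLIST = {("tile.foo", "src"): frozenset({DataType.FP16, DataType.FP32})}


class WideBackend(BackendHandler):
    ALLOWLIST = {
        ("tile.foo", "src"): frozenset({DataType.FP16, DataType.FP32, DataType.INT8})
    }


# IR-level allowlist: the union of what any backend accepts.
UNION_SRC = frozenset({DataType.FP16, DataType.FP32, DataType.INT8})


def check_backend_dtype(backend, op_name, arg_role, dtype):
    """Narrow to the active backend; does nothing when none is configured."""
    if backend is None:
        return
    if not backend.is_dtype_supported(op_name, arg_role, dtype):
        raise TypeError(f"{op_name}: {arg_role} dtype {dtype.name} not supported")


def deduce_foo_src(dtype, backend=None):
    """Type deduction for the hypothetical tile.foo: union check, then narrowing."""
    if dtype not in UNION_SRC:
        raise TypeError(f"tile.foo: src dtype {dtype.name} not in the universal union")
    check_backend_dtype(backend, "tile.foo", "src", dtype)
    return dtype


def is_accepted(dtype, backend=None):
    """Convenience wrapper that turns the TypeError into a boolean."""
    try:
        deduce_foo_src(dtype, backend)
        return True
    except TypeError:
        return False
```

With no backend configured only the union check applies, so `is_accepted(DataType.INT8)` holds, while `is_accepted(DataType.INT8, NarrowBackend())` does not. The pass code never compares backend types directly; the difference lives entirely in the tables.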

2. Gather op — applied to the new mechanism

Op type-deduction (src/ir/op/{tile,tensor}_ops/gather.cpp):

Universal union: src ∈ {FP16, FP32, INT8, INT16, INT32}, indices ∈ {INT16, INT32}.
tile.gather tmp workspace constraint relaxed from hardcoded INT32 to "must match indices dtype".

910B (a2a3) allowlist (src/backend/910B/backend_910b_handler.cpp):

tile.gather / tensor.gather src ∈ {FP16, FP32, INT16, INT32}
tile.gather / tensor.gather indices ∈ {INT32}

950 (a5) allowlist (src/backend/950/backend_950_handler.cpp):

tile.gather / tensor.gather src ∈ {INT8, FP16, FP32, INT16, INT32} (a2a3 ∪ INT8)
tile.gather / tensor.gather indices ∈ {INT16, INT32} (a2a3 ∪ INT16)
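
The relationships between these three sets can be checked mechanically. The sketch below transcribes the lists above into plain Python sets (dtype names as strings); the table names and the `accepts` helper are hypothetical, and the `tmp` rule applies to tile.gather only.

```python
# Tables transcribed from the allowlists above; names are illustrative only.
UNIVERSAL = {
    "src": {"FP16", "FP32", "INT8", "INT16", "INT32"},
    "indices": {"INT16", "INT32"},
}
A2A3 = {  # 910B
    "src": {"FP16", "FP32", "INT16", "INT32"},
    "indices": {"INT32"},
}
A5 = {  # 950
    "src": {"INT8", "FP16", "FP32", "INT16", "INT32"},
    "indices": {"INT16", "INT32"},
}


def accepts(table, src, indices, tmp=None):
    """Return True if the dtype combination passes the given allowlist table.

    When a tmp workspace dtype is supplied (tile.gather), it must equal the
    indices dtype rather than being fixed to INT32.
    """
    ok = src in table["src"] and indices in table["indices"]
    if tmp is not None:
        ok = ok and tmp == indices
    return ok
```

For example, `accepts(A5, "INT8", "INT32")` holds while `accepts(A2A3, "INT8", "INT32")` does not, which is the case the a5-only ST test is meant to exercise.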

3. Generalized `tensor.gather` lowering

src/ir/transforms/op_conversion_registry.cpp:

Replaces the case-3 (rank=3 dim=0) and case-4 (rank=3 dim=1) special cases with a single emit_flat_index_gather(gather_dim) helper that uses mixed-radix decomposition of the loop variable.
Adds case 5 (rank=2 dim=0) and case 6 (rank≥4 any dim) — handled by the same helper.
Internal tmp/range tiles now use idx_dtype instead of hardcoded INT32, so INT16-indices paths share the same lowering once the codegen-side INT16 work lands.
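
To illustrate what mixed-radix decomposition of a loop variable means here, the following pure-Python sketch gathers over row-major flat buffers. It assumes torch.gather-style semantics (the output has the shape of the index tensor, and only the coordinate along the gather dimension is replaced by the index value); the function names are hypothetical and this is not the code emitted by emit_flat_index_gather.

```python
def unravel(flat, shape):
    """Mixed-radix decomposition: row-major flat offset -> per-axis coordinates."""
    coords = []
    for extent in reversed(shape):
        flat, digit = divmod(flat, extent)
        coords.append(digit)
    return coords[::-1]


def ravel(coords, shape):
    """Inverse of unravel: per-axis coordinates -> row-major flat offset."""
    flat = 0
    for coord, extent in zip(coords, shape):
        flat = flat * extent + coord
    return flat


def flat_index_gather(src, src_shape, idx, idx_shape, dim):
    """Gather along `dim` using one loop over the flattened index tensor."""
    dim %= len(src_shape)  # normalise negative dims, e.g. -1 -> rank - 1
    out = []
    for p in range(len(idx)):
        coords = unravel(p, idx_shape)  # position in the index/output tensor
        coords[dim] = idx[p]            # swap in the gathered coordinate
        out.append(src[ravel(coords, src_shape)])
    return out


# Rank-2, dim=0: src is [[0, 1], [2, 3], [4, 5]], idx is [[2, 0], [1, 2]].
rank2 = flat_index_gather([0, 1, 2, 3, 4, 5], (3, 2), [2, 0, 1, 2], (2, 2), dim=0)
print(rank2)  # [4, 1, 2, 5]
```

Because the same loop works for any rank and any dimension, a single helper of this shape can cover the rank-2 dim=0, rank-3, and rank≥4 routes without per-case code.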

4. Tests

Unit tests (tests/ut/ir/operators/test_tensor_ops.py):

Updated rejection messages to match the widened union (FP16, FP32, INT8, INT16, or INT32 / INT16 or INT32).

ST tests (tests/st/runtime/test_gather.py):

New cases for the generalized lowering: rank-2 dim=0, rank-4 dim=-1, rank-4 dim=2.
New a5-only case: rank-2 INT8 src + INT32 idx (validates the per-backend allowlist).
Platform markers (@pytest.mark.platforms("a2a3", "a2a3sim") / ("a5", "a5sim")) added per test.
Removed hardcoded BackendType.Ascend910B from the base class so each test case picks its own backend.

ST harness (tests/st/harness/core/harness.py): Added INT8 to the DataType enum.

Testing

Notes

INT16 indices are accepted at the op layer for a5, and the lowering's range_1d / tmp tiles now use the indices dtype consistently. An end-to-end INT16-idx ST case is intentionally deferred to a follow-up PR pending PTOAS-side INT16 codegen verification.

coderabbitai · 2026-05-08T09:31:20Z

Walkthrough

Adds a BackendHandler::IsDtypeSupported hook, provides Ascend910B/950 allowlists, updates tensor.gather and tile.gather type checks to consult backends (wider index/input dtypes), refactors tensor.gather lowering to a generalized emit_flat_index_gather, and extends tests for extra ranks/dimensions.

Changes

Gather Backend Dtype Support

| Layer / File(s) | Summary |
| --- | --- |
| Backend Handler Interface: `include/pypto/backend/common/backend_handler.h` | Adds `DataType` include and virtual `IsDtypeSupported(op_name, arg_role, dtype)` with default returning `false`. |
| Backend Handler Declarations: `include/pypto/backend/910B/backend_910b_handler.h`, `include/pypto/backend/950/backend_950_handler.h` | Declare `IsDtypeSupported` overrides and add `pypto/core/dtype.h` includes. |
| Backend Dtype Allowlists: `src/backend/910B/backend_910b_handler.cpp`, `src/backend/950/backend_950_handler.cpp` | Add anonymous-namespace allowlist maps and implement `IsDtypeSupported` lookups for gather op roles. |
| IR Type Inference: `src/ir/op/tensor_ops/gather.cpp`, `src/ir/op/tile_ops/gather.cpp` | Add `CheckBackendDtype` helpers; expand allowed `src`/`indices` dtypes and validate them against backend handlers; update REGISTER_OP argument docs. |
| Gather Lowering Generalization: `src/ir/transforms/op_conversion_registry.cpp` | Introduce `emit_flat_index_gather(gather_dim)`, propagate `idx_dtype` for temporaries, and dispatch for rank-2, rank-3, and rank≥4 cases. |
| Tests / Docs: `tests/st/runtime/test_gather.py`, `tests/ut/ir/operators/test_tensor_ops.py` | Add runtime programs/tests for rank-2 dim=0 and rank-4 dims; update unit test error assertions; remove BackendType import and get_backend_type override. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

hw-native-sys/pypto#1183: Modifies tensor.gather type inference and lowering; touches the same gather lowering/type-inference areas.
hw-native-sys/pypto#1097: Related gather operator/type-check changes and lowering infrastructure.
hw-native-sys/pypto#387: Refactors op_conversion_registry.cpp and gather conversion logic; related to lowering changes.

Suggested reviewers

lyfne123

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: adding a per-backend dtype allowlist mechanism for the gather operation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the changes: introducing a per-backend dtype allowlist mechanism for the gather op, with specific allowed dtypes for different backends (910B and 950), generalized tensor.gather lowering, and updated tests. |

gemini-code-assist

Code Review

This pull request introduces a mechanism for backend-specific data type validation for operators, specifically applied to the gather operation. It generalizes the gather lowering logic to support arbitrary ranks and dimensions using a flat-index approach and expands the supported data types for the Ascend950 backend to include INT8 for sources and INT16 for indices. The review feedback correctly identifies that the lowering logic uses hardcoded INT32 constants for index arithmetic, which will cause type mismatches when INT16 indices are used; it suggests using the actual index tensor data type for these constants to maintain IR consistency.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

tests/ut/ir/operators/test_tensor_ops.py (1)
2397-2401: ⚡ Quick win

Add positive tests for newly accepted gather dtypes.

This update strengthens rejection paths, but it still doesn’t assert success for the newly allowed tensor-level cases (index=INT16, src=INT8). Adding those keeps the widened contract from regressing silently.
Suggested test additions
+def test_tensor_gather_accepts_int16_index():
+    inp, idx = _make_gather_inputs(idx_dtype=DataType.INT16)
+    call = ir.op.tensor.gather(inp, dim=-1, index=idx)
+    assert call.op.name == "tensor.gather"
+
+
+def test_tensor_gather_accepts_int8_input():
+    inp, idx = _make_gather_inputs(src_dtype=DataType.INT8, idx_dtype=DataType.INT32)
+    call = ir.op.tensor.gather(inp, dim=-1, index=idx)
+    rt = call.type
+    assert isinstance(rt, ir.TensorType)
+    assert rt.dtype == DataType.INT8
Also applies to: 2404-2407
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/ir/operators/test_tensor_ops.py` around lines 2397 - 2401, Add
positive assertions that exercise the newly-accepted gather dtypes: use
_make_gather_inputs to create inputs with index dtype DataType.INT16 and source
dtype DataType.INT8 and call ir.op.tensor.gather(inp, dim=-1, index=idx)
expecting no exception (i.e., remove pytest.raises and let the call succeed),
and mirror the same positive test for the adjacent/relevant gather test to
ensure both newly-allowed cases are asserted as successful rather than only
asserting rejections.
src/ir/transforms/op_conversion_registry.cpp (1)
1148-1275: 💤 Low value

Top-of-section comment in this file now lists only 4 cases — consider refreshing for the new dispatch.

The block comment at lines 889–938 still enumerates exactly “Four cases (by rank and norm_dim)” (rank-2 dim=1, rank-3 dim=0/1/2). With this PR, RegisterGatherOps now also dispatches rank-2 dim=0 and rank≥4 any-dim through emit_flat_index_gather. The new inline comment at lines 1122–1147 describes the generalized helper well, but a reader scanning the file’s top-of-section overview will get a stale picture of supported cases. Worth a one-paragraph refresh to mention the rank-2 dim=0 and rank≥4 routes alongside the existing four.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/transforms/op_conversion_registry.cpp` around lines 1148 - 1275,
Update the top-of-section block comment that enumerates "Four cases (by rank and
norm_dim)" to reflect the new dispatch paths: mention that
emit_flat_index_gather now handles rank-2 dim=0 and any rank>=4 (in addition to
the previously-listed rank-2 dim=1 and rank-3 dim=0/1/2 cases); locate the
comment near RegisterGatherOps/emit_flat_index_gather and replace the outdated
four-case enumeration with a short paragraph that lists the full set of
dispatched routes (rank==2 dim==1, rank==2 dim==0, rank==3 dim==0/1/2, and
rank>=4 any-dim) so the overview matches the actual dispatch logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/ir/op/tensor_ops/gather.cpp`:
- Around line 61-67: Update the tensor.gather op registration strings to reflect
the widened dtype contract: include INT8 as an allowed src dtype and allow index
to be INT16 or INT32. Locate the tensor.gather registration metadata (the
human-readable type description lines that currently list allowed src/index
dtypes) and modify them so they match the runtime checks (which use CHECK on
input_type->dtype_ and CheckBackendDtype for "src"); apply the same change to
the second registration occurrence referenced in the comment so both metadata
entries advertise the new INT8 for src and INT16|INT32 for index.

In `@src/ir/op/tile_ops/gather.cpp`:
- Around line 68-74: The operator argument documentation for tile.gather is out
of sync with the deduce logic: update the registered argument descriptions for
the gather op (the tile.gather registration/arg doc strings) to reflect that src
may be FP16|FP32|INT8|INT16|INT32 (per the CHECK on src_type and
CheckBackendDtype usage), indices may be INT16 or INT32, and tmp should be
documented as matching the indices dtype (tmp dtype == indices dtype); also
update the other duplicate doc block that mirrors these lines (the section
corresponding to the same registration around the other check). Ensure op_name,
src_type, CheckBackendDtype, and the indices/tmp arg descriptions are edited so
the docs match the new deduce rules.

📥 Commits

Reviewing files that changed from the base of the PR and between cba8594 and be31737.

📒 Files selected for processing (10)

include/pypto/backend/910B/backend_910b_handler.h
include/pypto/backend/950/backend_950_handler.h
include/pypto/backend/common/backend_handler.h
src/backend/910B/backend_910b_handler.cpp
src/backend/950/backend_950_handler.cpp
src/ir/op/tensor_ops/gather.cpp
src/ir/op/tile_ops/gather.cpp
src/ir/transforms/op_conversion_registry.cpp
tests/st/runtime/test_gather.py
tests/ut/ir/operators/test_tensor_ops.py

Introduces BackendHandler::IsDtypeSupported(op_name, arg_role, dtype) so gather's accepted dtypes can vary per backend:
- a2a3 (910B): src {FP16, FP32, INT16, INT32}, indices {INT32}
- a5 (950): src adds INT8; indices adds INT16

The op-level type-deduction enforces the universal union (a2a3 ∪ a5) and then narrows to the active backend via CheckBackendDtype when a backend is configured. Generalises tensor.gather lowering and updates the gather ST/UT suite accordingly.

fix(pr): resolve issues for hw-native-sys#1317
- Use index_tensor_type->dtype_ for tile.ci range and tile.muls multiplier in tensor.gather lowering, so INT16 indices flow through without hardcoded INT32 mismatches (gemini review).
- Update tensor.gather and tile.gather argument descriptions to reflect the per-backend dtype contract: INT8 src and INT16 indices are valid on Ascend950 (coderabbit review).
- Add direct #include "pypto/core/dtype.h" to backend handler files so clang-tidy's misc-include-cleaner is satisfied.

test(gather): Add INT8 ST test case for Ascend950 backend

Extend gather ST coverage with an INT8 src + INT32 idx case targeting Ascend950 (a5 dtype allowlist), and tag existing rank/dim coverage tests with explicit a2a3 platform markers + Ascend910B backend type. Adds INT8 to harness DataType enum.

Remove redundant comments.
Extract a CheckOpDtype function.
Revert an unnecessary dtype distinction; PTOAS will emit the error log.

- Refresh the gather-section block comment in op_conversion_registry.cpp to list all six dispatch routes (was stale "Four cases"): rank=2 dim=0 and rank>=4 any-dim now go through emit_flat_index_gather alongside the existing rank=3 dim=0/1/2 cases.
- Add positive tensor.gather tests for the newly accepted dtypes (INT16 index, INT8 src), complementing the existing rejection tests.

Little-oil · 2026-05-12T06:54:03Z

Addressed the two remaining CodeRabbit nitpicks from the latest review in c5a46541:

src/ir/transforms/op_conversion_registry.cpp (top-of-section comment, ~L912) — refreshed the stale "Four cases" enumeration to list all six dispatch routes (Cases 1–6), making rank=2 dim=0 and rank≥4 any-dim explicit.
tests/ut/ir/operators/test_tensor_ops.py — added test_tensor_gather_accepts_int16_index and test_tensor_gather_accepts_int8_input so the widened dtype contract is exercised positively, not only via rejection paths.

Branch was also rebased onto the latest origin/main.

github-project-automation Bot added this to pto project May 8, 2026

gemini-code-assist Bot reviewed May 8, 2026

coderabbitai Bot reviewed May 8, 2026

Little-oil force-pushed the supplement_gather branch from dd57a76 to 0f1b5de Compare May 9, 2026 06:27

Little-oil marked this pull request as draft May 11, 2026 11:17

Youhezhen added 2 commits May 12, 2026 14:44

Little-oil force-pushed the supplement_gather branch from 0f1b5de to c5a4654 Compare May 12, 2026 06:53

Little-oil wants to merge 2 commits into hw-native-sys:main from Little-oil:supplement_gather
