[AscendNPU-IR][A5] Support cmp and select ops. #908

Open
looo000ooong wants to merge 1 commit into tile-ai:npuir-dev from looo000ooong:npuir-dev

@looo000ooong

Support compare (cmp) and select ops.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces support for comparison and selection operations on the A5 architecture within the NPUIR codegen. Key changes include the implementation of VselectCodegenA5, helper functions for mapping string modes to MLIR predicates, and updates to CreateHIVMBinaryVectorOp to handle arith::CmpFOp and arith::CmpIOp with appropriate type casting. Feedback highlights a potential MLIR verification failure due to mismatched element types during condition broadcasting and a logic error in the template dispatch for comparison operations. Additionally, improvements were suggested regarding the removal of unused variables and the implementation of stricter error handling for unsupported comparison modes.

Comment thread src/target/codegen_npuir_dev.cc Outdated
Comment on lines +3165 to +3169
if constexpr (std::is_same_v<T, mlir::arith::CmpFOp>) {
  cmpOp = builder.create<mlir::arith::CmpFOp>(loc, GetFPredicate(mode), src0, src1);
} else {
  cmpOp = builder.create<mlir::arith::CmpIOp>(loc, GetIPredicate(mode), src0, src1);
}

critical

The current logic uses if constexpr on the template parameter T to decide whether to create a CmpFOp or CmpIOp. However, CreateHIVMBinaryVectorOp is called with both CmpFOp and CmpIOp as template arguments (line 3572). This means std::is_same_v<T, mlir::arith::CmpFOp> will always be true in that context, forcing a CmpFOp even if the input operands are integers. You must check the actual runtime type of the operands (srcType) to dispatch to the correct MLIR operation.

mlir::Value cmpOp;
if (srcType.isa<mlir::FloatType>()) {
  cmpOp = builder.create<mlir::arith::CmpFOp>(loc, GetFPredicate(mode), src0, src1);
} else {
  cmpOp = builder.create<mlir::arith::CmpIOp>(loc, GetIPredicate(mode), src0, src1);
}

@CeleNewYear added the AscendNPU-IR (Want to merge into the npuir branch) and A5 labels on Apr 25, 2026
@looo000ooong force-pushed the npuir-dev branch 6 times, most recently from 880852a to 9ccfc70 on April 28, 2026 03:27
@looo000ooong requested a review from CeleNewYear on May 6, 2026 01:25
mlir::Value dst_data_name = GetVarValue(npuirop.dst);

if (!dst_data_name.getType().isa<mlir::TensorType>()) {
  return;

@yyyy999 May 6, 2026

Check that all three operands (src0, src1, dst) are of TensorType, not only dst.

Comment thread src/target/codegen_npuir_dev.cc Outdated
if (mode == "lt") return mlir::arith::CmpIPredicate::slt;
if (mode == "le") return mlir::arith::CmpIPredicate::sle;
if (mode == "gt") return mlir::arith::CmpIPredicate::sgt;
if (mode == "ge") return mlir::arith::CmpIPredicate::sge;

For unsigned integer types (e.g., uint8), the unsigned predicates ult/ule/ugt/uge should be used. The current implementation always uses signed comparisons, which produce incorrect results for unsigned operands whose high bit is set.
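
To make the concern concrete, here is a minimal sketch in plain Python (not the MLIR API) of how the same 8-bit pattern orders differently under signed and unsigned interpretation. The helpers `as_signed`, `as_unsigned`, `slt` and `ult` are hypothetical names that only mirror the arith predicate names for illustration.

```python
def as_unsigned(bits: int, width: int = 8) -> int:
    """Interpret the low `width` bits as an unsigned integer."""
    return bits & ((1 << width) - 1)


def as_signed(bits: int, width: int = 8) -> int:
    """Interpret the low `width` bits as a two's-complement signed integer."""
    value = as_unsigned(bits, width)
    # Values with the top bit set wrap around to negative numbers.
    return value - (1 << width) if value >= (1 << (width - 1)) else value


def slt(a: int, b: int, width: int = 8) -> bool:
    """Signed less-than on raw bit patterns (hypothetical stand-in for `slt`)."""
    return as_signed(a, width) < as_signed(b, width)


def ult(a: int, b: int, width: int = 8) -> bool:
    """Unsigned less-than on raw bit patterns (hypothetical stand-in for `ult`)."""
    return as_unsigned(a, width) < as_unsigned(b, width)


a, b = 0xC8, 0x64  # 200 and 100 when read as uint8
print("ult:", ult(a, b))  # False: 200 < 100 does not hold
print("slt:", slt(a, b))  # True: 0xC8 reads as -56 when treated as int8
```

With a signed predicate, a uint8 value of 200 would compare as smaller than 100, which is the kind of wrong result described above.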

Comment thread src/target/codegen_npuir_dev.cc Outdated

mlir::arith::CmpFPredicate GetFPredicate(std::string mode) {
  if (mode == "eq") return mlir::arith::CmpFPredicate::OEQ;
  if (mode == "ne") return mlir::arith::CmpFPredicate::UNE;

UNE means "unordered or not equal", i.e., it returns true when either operand is NaN. The NE mode of the original hivm::VCmpOp might instead mean "ordered and not equal" (ONE), which returns false when either operand is NaN. The two predicates therefore differ only for comparisons involving NaN.
Please confirm the NE semantics of VCmpOp. If it is an ordered comparison, use ONE; if unordered semantics are indeed intended, add a comment explaining why.
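
As a quick illustration of the difference, the following plain-Python sketch models the two predicates. The functions `one` and `une` are hypothetical stand-ins written from the definitions above, not calls into MLIR or HIVM, and they say nothing about what VCmpOp actually does.

```python
import math


def one(a: float, b: float) -> bool:
    """Ordered and not equal: false whenever either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return False
    return a != b


def une(a: float, b: float) -> bool:
    """Unordered or not equal: true whenever either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return True
    return a != b


nan = float("nan")
for a, b in [(1.0, 2.0), (1.0, 1.0), (nan, 1.0), (nan, nan)]:
    print(f"a={a}, b={b}: ONE={one(a, b)}, UNE={une(a, b)}")
```

The two functions agree on ordinary values and disagree only on the NaN rows. Python's own `!=` on floats gives the unordered result (`nan != nan` is true), so a reference computed with a plain `!=` would match UNE rather than ONE.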


A_full = torch.randn((M, N), dtype=torch.float16).npu()
B_full = torch.randn((M, N), dtype=torch.float16).npu()
C_full = torch.empty((M, N), dtype=torch.float16).npu()

Since the core CmpOp change involves float vs. integer dispatch logic, it would be worth adding test cases for float32 and int32/int8 types, especially end-to-end verification of integer comparisons.
