[AscendNPU-IR][A5] A5 support for floorOp and floorDivOp#950
[AscendNPU-IR][A5] A5 support for floorOp and floorDivOp#950Ismayil06 wants to merge 1 commit intotile-ai:npuir-devfrom
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
There was a problem hiding this comment.
Code Review
This pull request introduces support for floor and floordiv operations in the NPU IR backend, encompassing operator definitions, codegen implementations, and corresponding unit tests. The review identified several critical issues: the vector codegen for floor division lacks necessary broadcasting and sliced insertion logic, and the expression visitation for floating-point floor division incorrectly updates the expression map with intermediate results. Additionally, the integer floor division path still fails to match Python semantics for negative numbers, a test fixture name mismatch will cause execution failures, and docstrings in the Python wrapper need to be moved inside the function bodies.
| void CodeGenTileLangNPUIRDEV::VfloordivCodegen(const CallNode *op) { | ||
| tvm::tl::NpuirFloorDiv npuirop(op->args, this->vmap); | ||
| auto loc = builder.getUnknownLoc(); | ||
|
|
||
| Value src0 = GenExtractSliceFromRegion(npuirop.src0, npuirop.src0_range); | ||
| Value src1 = GenExtractSliceFromRegion(npuirop.src1, npuirop.src1_range); | ||
|
|
||
| auto srcType = getElementTypeOrSelf(src0.getType()); | ||
| Value result; | ||
|
|
||
| if (srcType.isa<FloatType>()) { | ||
| auto divResult = builder.create<mlir::arith::DivFOp>(loc, src0, src1); | ||
| result = builder.create<mlir::math::FloorOp>(loc, divResult); | ||
| } else { | ||
| result = builder.create<mlir::arith::DivSIOp>(loc, src0, src1); | ||
| } | ||
|
|
||
| SetVarValue(npuirop.dst, result); | ||
| } |
There was a problem hiding this comment.
The implementation of VfloordivCodegen is missing critical logic present in other vector operations:
- Broadcasting: MLIR arithmetic operations require operands to have matching shapes. If
src0andsrc1have different shapes (e.g., one is a broadcasted dimension), this will fail. You should usebroadcastOrTransposeto align operand shapes. - Sliced Insertion:
SetVarValue(npuirop.dst, result)overwrites the entire buffer variable mapping. If the operation is performed on a slice (as indicated bydst_range), it must useReshapeCastAndInsertSliceto update only the relevant portion of the tensor and maintain SSA consistency.
| auto divResult = BinaryOpCodegen<mlir::arith::DivFOp, std::nullptr_t>(op, nullptr, | ||
| lhs, rhs); | ||
| mlirVal = builder.create<mlir::math::FloorOp>(builder.getUnknownLoc(), divResult); |
There was a problem hiding this comment.
There is a correctness issue with how prim_expr_map is updated here. BinaryOpCodegen internally calls UpdatePrimExprMap(op, mlirVal), associating the FloorDivNode (op) with the intermediate arith::DivFOp result. Subsequent calls to MakeValue for the same expression will return the un-floored division result from the map. You should manually check the map, compute the result, and then update the map with the final mlirVal.
} else if (op->dtype.is_float()) {
auto result = CheckPrimExprMap(op);
if (result.first) return result.second;
auto divResult = builder.create<mlir::arith::DivFOp>(builder.getUnknownLoc(), lhs, rhs);
mlirVal = builder.create<mlir::math::FloorOp>(builder.getUnknownLoc(), divResult);
UpdatePrimExprMap(op, mlirVal);
}| ((1024, 16384), "float32"), | ||
| ] | ||
| ) | ||
| def floordiv_case(request): |
There was a problem hiding this comment.
| auto rhs = MakeValue(op->b); | ||
| // FIXME: The floor div in python is not the same as arith.divsi in negative | ||
| // scenarios. | ||
| mlir::Value mlirVal; |
There was a problem hiding this comment.
The removal of this FIXME is premature. mlir::arith::DivSIOp performs truncation towards zero, which does not match Python's floor division semantics for negative integers (e.g., -5 // 2 should be -3, but arith.divsi results in -2). While the float path now correctly uses math::FloorOp, the integer path remains incorrect for negative scenarios.
| """npuir floor at tile-level.""" | ||
| def npuir_floor(A, B): | ||
| return AscendUnaryOp("floor", A, B).buildTirCall() | ||
| """npuir floordiv at tile-level.""" | ||
| def npuir_floordiv(A, B, C): | ||
| return AscendBinaryOp("floordiv", A, B, C).buildTirCall() |
There was a problem hiding this comment.
The docstrings for npuir_floor and npuir_floordiv are placed outside the function definitions as standalone string literals. They should be moved inside the functions to be correctly recognized as docstrings.
| """npuir floor at tile-level.""" | |
| def npuir_floor(A, B): | |
| return AscendUnaryOp("floor", A, B).buildTirCall() | |
| """npuir floordiv at tile-level.""" | |
| def npuir_floordiv(A, B, C): | |
| return AscendBinaryOp("floordiv", A, B, C).buildTirCall() | |
| def npuir_floor(A, B): | |
| """npuir floor at tile-level.""" | |
| return AscendUnaryOp("floor", A, B).buildTirCall() | |
| def npuir_floordiv(A, B, C): | |
| """npuir floordiv at tile-level.""" | |
| return AscendBinaryOp("floordiv", A, B, C).buildTirCall() |
Created new npuir_floor and npuir_floordiv operation. Added new VfloordivCodegen and adding npuir_floor into UnaryVecOpCodegen.