Skip to content

[AscendNPU-IR][A5] A5 support for floorOp and floorDivOp#950

Open
Ismayil06 wants to merge 1 commit intotile-ai:npuir-devfrom
Ismayil06:a5-floordiv
Open

[AscendNPU-IR][A5] A5 support for floorOp and floorDivOp#950
Ismayil06 wants to merge 1 commit intotile-ai:npuir-devfrom
Ismayil06:a5-floordiv

Conversation

@Ismayil06
Copy link
Copy Markdown

Created new npuir_floor and npuir_floordiv operation. Added new VfloordivCodegen and adding npuir_floor into UnaryVecOpCodegen.

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for floor and floordiv operations in the NPU IR backend, encompassing operator definitions, codegen implementations, and corresponding unit tests. The review identified several critical issues: the vector codegen for floor division lacks necessary broadcasting and sliced insertion logic, and the expression visitation for floating-point floor division incorrectly updates the expression map with intermediate results. Additionally, the integer floor division path still fails to match Python semantics for negative numbers, a test fixture name mismatch will cause execution failures, and docstrings in the Python wrapper need to be moved inside the function bodies.

Comment on lines +3410 to +3428
void CodeGenTileLangNPUIRDEV::VfloordivCodegen(const CallNode *op) {
tvm::tl::NpuirFloorDiv npuirop(op->args, this->vmap);
auto loc = builder.getUnknownLoc();

Value src0 = GenExtractSliceFromRegion(npuirop.src0, npuirop.src0_range);
Value src1 = GenExtractSliceFromRegion(npuirop.src1, npuirop.src1_range);

auto srcType = getElementTypeOrSelf(src0.getType());
Value result;

if (srcType.isa<FloatType>()) {
auto divResult = builder.create<mlir::arith::DivFOp>(loc, src0, src1);
result = builder.create<mlir::math::FloorOp>(loc, divResult);
} else {
result = builder.create<mlir::arith::DivSIOp>(loc, src0, src1);
}

SetVarValue(npuirop.dst, result);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The implementation of VfloordivCodegen is missing critical logic present in other vector operations:

  1. Broadcasting: MLIR arithmetic operations require operands to have matching shapes. If src0 and src1 have different shapes (e.g., one is a broadcasted dimension), this will fail. You should use broadcastOrTranspose to align operand shapes.
  2. Sliced Insertion: SetVarValue(npuirop.dst, result) overwrites the entire buffer variable mapping. If the operation is performed on a slice (as indicated by dst_range), it must use ReshapeCastAndInsertSlice to update only the relevant portion of the tensor and maintain SSA consistency.

Comment thread src/target/codegen_npuir_dev.cc Outdated
Comment on lines +767 to +769
auto divResult = BinaryOpCodegen<mlir::arith::DivFOp, std::nullptr_t>(op, nullptr,
lhs, rhs);
mlirVal = builder.create<mlir::math::FloorOp>(builder.getUnknownLoc(), divResult);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a correctness issue with how prim_expr_map is updated here. BinaryOpCodegen internally calls UpdatePrimExprMap(op, mlirVal), associating the FloorDivNode (op) with the intermediate arith::DivFOp result. Subsequent calls to MakeValue for the same expression will return the un-floored division result from the map. You should manually check the map, compute the result, and then update the map with the final mlirVal.

  } else if (op->dtype.is_float()) {
    auto result = CheckPrimExprMap(op);
    if (result.first) return result.second;
    auto divResult = builder.create<mlir::arith::DivFOp>(builder.getUnknownLoc(), lhs, rhs);
    mlirVal = builder.create<mlir::math::FloorOp>(builder.getUnknownLoc(), divResult);
    UpdatePrimExprMap(op, mlirVal);
  }

((1024, 16384), "float32"),
]
)
def floordiv_case(request):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fixture name is incorrectly named floordiv_case in this file, but the test function test_floor_dev on line 59 expects a fixture named floor_case. This will cause the test to fail with a Fixture 'floor_case' not found error.

Suggested change
def floordiv_case(request):
def floor_case(request):

auto rhs = MakeValue(op->b);
// FIXME: The floor div in python is not the same as arith.divsi in negative
// scenarios.
mlir::Value mlirVal;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The removal of this FIXME is premature. mlir::arith::DivSIOp performs truncation towards zero, which does not match Python's floor division semantics for negative integers (e.g., -5 // 2 should be -3, but arith.divsi results in -2). While the float path now correctly uses math::FloorOp, the integer path remains incorrect for negative scenarios.

Comment on lines +230 to +235
"""npuir floor at tile-level."""
def npuir_floor(A, B):
return AscendUnaryOp("floor", A, B).buildTirCall()
"""npuir floordiv at tile-level."""
def npuir_floordiv(A, B, C):
return AscendBinaryOp("floordiv", A, B, C).buildTirCall()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstrings for npuir_floor and npuir_floordiv are placed outside the function definitions as standalone string literals. They should be moved inside the functions to be correctly recognized as docstrings.

Suggested change
"""npuir floor at tile-level."""
def npuir_floor(A, B):
return AscendUnaryOp("floor", A, B).buildTirCall()
"""npuir floordiv at tile-level."""
def npuir_floordiv(A, B, C):
return AscendBinaryOp("floordiv", A, B, C).buildTirCall()
def npuir_floor(A, B):
"""npuir floor at tile-level."""
return AscendUnaryOp("floor", A, B).buildTirCall()
def npuir_floordiv(A, B, C):
"""npuir floordiv at tile-level."""
return AscendBinaryOp("floordiv", A, B, C).buildTirCall()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant