[Fix]optimic elementwise pytest#1024
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
There was a problem hiding this comment.
Code Review
This pull request increases the vectorization factor (VEC_NUM) from 1 to 2 across several test functions in the Ascend language testing suite and modifies the index calculation in createvecindex. The reviewer identified that the change to VEC_NUM leads to zero-sized buffer allocations in existing tests where block_M is 1. Additionally, the update to createvecindex incorrectly introduces a row-dependent offset, which contradicts the expected column-based indexing logic.
| n_num = N // block_N | ||
|
|
||
| VEC_NUM = 1 | ||
| VEC_NUM = 2 |
There was a problem hiding this comment.
While changing VEC_NUM to 2 enables vectorization, it breaks the existing tests test_createvecindex_ascendc and test_createvecindex_pto. These tests use block_M=1, which results in block_M // VEC_NUM being 0. This leads to a zero-sized buffer allocation for c_ub and will cause the test to fail.
Please update the test parameters in test_createvecindex_ascendc and test_createvecindex_pto to use a block_M value that is a multiple of 2 (e.g., block_M=2). Note that you might also need to adjust M and the reference logic in run_test_createvecindex accordingly, as it seems to have issues with M > 1.
| c_ub = T.alloc_ub((block_M // VEC_NUM, block_N), dtype) | ||
|
|
||
| T.tile.createvecindex(c_ub, firstValue) | ||
| T.tile.createvecindex(c_ub, firstValue + vid * block_M // VEC_NUM) |
There was a problem hiding this comment.
The reference logic in run_test_createvecindex expects the output tensor C to be filled with column indices, meaning C[i, j] should be equal to j + firstValue. The value should not depend on the row index.
By adding vid * block_M // VEC_NUM to firstValue, you are making the generated indices dependent on the row, which contradicts the test's expectation. The firstValue for createvecindex should be the same for all vectorized parts. The original implementation was correct.
| T.tile.createvecindex(c_ub, firstValue + vid * block_M // VEC_NUM) | |
| T.tile.createvecindex(c_ub, firstValue) |
No description provided.