Skip to content

[AscendNPU-IR] Fix op document and tests.#878

Open
looo000ooong wants to merge 1 commit into
tile-ai:npuirfrom
looo000ooong:npuir
Open

[AscendNPU-IR] Fix op document and tests.#878
looo000ooong wants to merge 1 commit into
tile-ai:npuirfrom
looo000ooong:npuir

Conversation

@looo000ooong
Copy link
Copy Markdown

Fix op document and tests

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the documentation for T.sync_block_wait by replacing the existing example with a more comprehensive implementation that demonstrates synchronization between Cube and Vector scopes. A review comment identified that the variable FFTS_FLAG_THRESHOLD was undefined in the example and suggested adding a definition to make the code self-contained.

Comment on lines +38 to +40
def simple_sync(M, N, block_M, block_N, dtype="float16", inner_dtype="float32"):
m_num = M // block_M
n_num = N // block_N
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable FFTS_FLAG_THRESHOLD is used in the example code (lines 64 and 72) but is not defined within the scope of the snippet. This will cause a NameError if the code is executed. Please define this constant in the example to make it self-contained.

Suggested change
def simple_sync(M, N, block_M, block_N, dtype="float16", inner_dtype="float32"):
m_num = M // block_M
n_num = N // block_N
def simple_sync(M, N, block_M, block_N, dtype="float16", inner_dtype="float32"):
m_num = M // block_M
n_num = N // block_N
FFTS_FLAG_THRESHOLD = 10

with T.rs("PIPE_MTE2"):
T.sync_block_wait(i)
T.copy(Workspace[cid * block_m, i * block_n], cross_kernel_ub)
for i in range(0, FFTS_FLAG_THRESHOLD):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this realization will cause deadlock

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified the sync logic.

Copy link
Copy Markdown
Collaborator

@CeleNewYear CeleNewYear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants