[AscendNPU-IR] Support the Engram operator for DeepSeek V4 #951
lzp1021 wants to merge 2 commits into tile-ai:npuir
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Code Review
This pull request introduces forward and backward kernels for the Engram gate operator, implemented in TileLang for NPU targets, along with reference PyTorch implementations and test scripts. Feedback focuses on three points: fusing redundant passes in the backward kernel to reduce global memory access; removing hardcoded constants, such as head multipliers and persistent block counts, for better flexibility; and moving shared memory allocations out of loops to improve efficiency.
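To make the first and third points concrete, here is a minimal sketch in plain Python rather than TileLang. It is not the PR's kernel: `load_tile`, `TILE`, and the two reductions (a dot product and a sum) are hypothetical stand-ins chosen only to show how fusing passes and hoisting scratch buffers cut simulated global-memory reads.

```python
# Illustrative only: plain Python lists stand in for NPU global/shared memory.
# `load_tile`, TILE, and the two reductions are hypothetical, not from the PR.

TILE = 4  # hypothetical tile size


def load_tile(data, start, buf, counter):
    """Copy one tile from 'global memory' into a scratch buffer."""
    n = min(TILE, len(data) - start)
    buf[:n] = data[start:start + n]
    counter[0] += 1  # count simulated global-memory reads
    return n


def two_pass(x, g):
    """Unfused: the second reduction re-reads g; scratch is allocated per tile."""
    loads = [0]
    dot = 0.0
    for s in range(0, len(x), TILE):
        xb, gb = [0.0] * TILE, [0.0] * TILE  # allocated inside the loop
        n = load_tile(x, s, xb, loads)
        load_tile(g, s, gb, loads)
        dot += sum(xb[i] * gb[i] for i in range(n))
    gsum = 0.0
    for s in range(0, len(g), TILE):
        gb = [0.0] * TILE
        n = load_tile(g, s, gb, loads)  # g is read a second time
        gsum += sum(gb[:n])
    return dot, gsum, loads[0]


def fused(x, g):
    """Fused: both reductions share one pass; scratch is allocated once."""
    loads = [0]
    xb, gb = [0.0] * TILE, [0.0] * TILE  # hoisted out of the loop
    dot = gsum = 0.0
    for s in range(0, len(x), TILE):
        n = load_tile(x, s, xb, loads)
        load_tile(g, s, gb, loads)
        dot += sum(xb[i] * gb[i] for i in range(n))
        gsum += sum(gb[:n])
    return dot, gsum, loads[0]
```

Both functions return the same reductions, but the fused version reads `g` from simulated global memory only once per tile. Whether the same restructuring applies to the actual backward kernel depends on its data dependencies, which this sketch does not model.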
Support the Engram operators for DeepSeek V4