-
Notifications
You must be signed in to change notification settings - Fork 3.5k
KFI-203 Improve thread safety of packing in convolve_kleidiai.cpp #26575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves thread safety in the Kleidiai convolution implementation by converting cache storage from static to thread_local scope. This prevents potential data races and interference when multiple inference sessions run in parallel threads.
Key changes:
- RHS (weights) cache converted from static to thread_local
- LHS (input indirection) cache converted from static to thread_local
- Updated comments to explain the thread_local rationale
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
hariharans29
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
|
It will need this to fix the failing pipeline: #26559 |
e6f7b21
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
|
/azp run Windows GPU Doc Gen CI Pipeline |
|
No pipelines are associated with this pull request. |
* Both LHS and RHS packing utilize a cache mechanism based on a static unordered map. There's the potential for interference between parallel inference sessions. Made both structures thread_local. Signed-off-by: Colm Donelan <[email protected]>
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
Description
Making cache objects of packed data thread_local rather than static.
Motivation and Context
Both LHS and RHS packing utilize a cache mechanism based on a static unordered map. There's the potential for interference between parallel inference sessions. Made both structures thread_local.