Conversation

@Colm-in-Arm
Contributor

Description

Make the cache objects for packed data thread_local rather than static.

Motivation and Context

Both LHS and RHS packing use a cache backed by a static unordered map, so parallel inference sessions can interfere with each other through the shared map. This change makes both maps thread_local, giving each thread its own cache.
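As an illustration of the effect described above, here is a minimal sketch, not the actual ONNX Runtime code. The names `PackedBuffer`, `PackedCache`, and `InsertAndCount` are hypothetical and exist only for this example. It shows that a `thread_local` map is a separate object per thread, so an entry inserted by one thread is not visible to another.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a packed buffer; not the real ONNX Runtime type.
using PackedBuffer = std::vector<float>;

// Declared `static`, this map would be a single object shared by all threads.
// Declared `thread_local`, every thread gets its own independent map.
std::unordered_map<std::string, PackedBuffer>& PackedCache() {
    thread_local std::unordered_map<std::string, PackedBuffer> cache;
    return cache;
}

// Inserts one entry from a fresh worker thread and returns how many
// entries that worker sees in its own cache afterwards.
std::size_t InsertAndCount(const std::string& key) {
    std::size_t seen = 0;
    std::thread worker([&] {
        PackedCache()[key] = PackedBuffer(4, 1.0f);
        seen = PackedCache().size();
    });
    worker.join();  // join before reading `seen`
    return seen;
}
```

Each call to `InsertAndCount` runs on a new thread, so it observes exactly one entry regardless of earlier calls, and the calling thread's own cache stays empty.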

@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@hariharans29 requested a review from Copilot on November 14, 2025 18:13
@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves thread safety in the KleidiAI convolution implementation by changing the cache storage from static to thread_local. Each thread then works with its own cache, which prevents data races and interference when multiple inference sessions run on parallel threads.

Key changes:

  • RHS (weights) cache converted from static to thread_local
  • LHS (input indirection) cache converted from static to thread_local
  • Updated comments to explain the thread_local rationale
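One consequence of the change listed above is worth sketching: with per-thread caches, each thread fills its own cache, so the same data may be packed once per thread instead of once per process. The example below is a hedged illustration only. `GetPackedWeights`, `PackCallsFor`, and `g_pack_calls` are hypothetical names, and the "packing" is a placeholder rather than the real KleidiAI routine.

```cpp
#include <atomic>
#include <thread>
#include <unordered_map>
#include <vector>

// Counts how many times the (hypothetical) packing step actually runs.
std::atomic<int> g_pack_calls{0};

// Returns the cached packed weights for `weights_id`, packing them first
// if the calling thread has not seen this id yet.
const std::vector<float>& GetPackedWeights(int weights_id) {
    thread_local std::unordered_map<int, std::vector<float>> rhs_cache;
    auto it = rhs_cache.find(weights_id);
    if (it == rhs_cache.end()) {
        ++g_pack_calls;  // cache miss on this thread: pack and remember
        it = rhs_cache.emplace(weights_id, std::vector<float>(8, 0.5f)).first;
    }
    return it->second;
}

// Runs `num_threads` workers that each request the same weights twice
// and returns the total number of packing calls.
int PackCallsFor(int num_threads) {
    g_pack_calls = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([] {
            GetPackedWeights(42);
            GetPackedWeights(42);  // second call hits this thread's cache
        });
    }
    for (auto& w : workers) w.join();
    return g_pack_calls.load();
}
```

In this sketch the number of packing calls equals the number of threads: no locking is needed around the map, at the cost of duplicated packing work and memory per thread.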

@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@hariharans29 previously approved these changes Nov 14, 2025
Member

@hariharans29 left a comment

LGTM. Thanks.

@hariharans29
Member

It will need this to fix the failing pipeline: #26559

@edgchen1 previously approved these changes Nov 15, 2025
@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@hariharans29 enabled auto-merge (squash) November 17, 2025 21:55
auto-merge was automatically disabled November 18, 2025 20:34

Pull request was closed

@hariharans29 reopened this Nov 18, 2025
@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@hariharans29 reopened this Nov 18, 2025
@hariharans29 enabled auto-merge (squash) November 19, 2025 17:48
@hariharans29
Member

/azp run Windows GPU Doc Gen CI Pipeline

@azure-pipelines

No pipelines are associated with this pull request.

auto-merge was automatically disabled November 19, 2025 22:46

Pull request was closed

@hariharans29 reopened this Nov 19, 2025
* Both LHS and RHS packing utilize a cache mechanism based on a static
  unordered map. There's the potential for interference between parallel
  inference sessions. Made both structures thread_local.

Signed-off-by: Colm Donelan <[email protected]>
@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@hariharans29 reopened this Nov 20, 2025
@hariharans29 enabled auto-merge (squash) November 20, 2025 23:26

3 participants