Clean up some potential runtime bugs (#19447)
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19447
Note: links to docs will display an error until the docs builds have completed. ❌ 1 new failure as of commit 1cd3d16 with merge base 7127953. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@billmguo has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104615993.
This PR needs a …
@claude review
Claude finished @lucylq's task in 1m 23s. Review of PR #19447: clean up some potential runtime bugs
Overall: The changes are defensive improvements that zero-initialize memory in the static KV cache and input buffer. They address real potential issues with uninitialized and stale data. A few observations:

1. Zero cache on allocation (lines 67-68) ✅

```cpp
std::fill(cache_data_, cache_data_ + cache_data_size_, T(0));
std::fill(update_data_, update_data_ + update_data_size_, T(0));
```

Good. The …

2. Zero cache on reset (line 191) ✅

```cpp
std::fill(cache_data_, cache_data_ + cache_data_size_, T(0));
```

This makes … Nit: Should …

3. Zero padding in last prefill chunk (lines 619-622) ✅

```cpp
if (batch_len < input_len) {
  std::fill(
      input_buffer.begin() + batch_len, input_buffer.end(), TokenT(0));
}
```

Correct fix. When the last chunk of tokens is smaller than …

4. PR description vs. actual diff mismatch

The PR description mentions "Call `runner.reset()` before each prompt" (summary item 4), but …

Summary: The three changes that are present look correct and low-risk. They add O(n) zeroing overhead on allocation, reset, and the last prefill chunk, negligible compared to model inference time. The main question is whether the …
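The zero-on-allocation and zero-on-reset pattern the review approves of can be sketched as below. This is an illustrative stand-in, not the actual ExecuTorch class: it uses a `std::vector`-backed buffer instead of the runner's raw allocator, and the class name and methods are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of the PR's pattern: zero the KV cache and update
// buffers immediately after allocation, and zero the cache again on reset().
// (std::vector already value-initializes its elements; with the raw
// allocator used in the PR the explicit fills are what guarantee this.)
template <typename T>
class StaticKVCache {
 public:
  StaticKVCache(std::size_t cache_len, std::size_t update_len)
      : cache_data_(cache_len), update_data_(update_len) {
    // Freshly allocated memory may hold garbage that varies across devices;
    // zero it so the first attention pass reads deterministic values.
    std::fill(cache_data_.begin(), cache_data_.end(), T(0));
    std::fill(update_data_.begin(), update_data_.end(), T(0));
  }

  void reset() {
    pos_ = 0;
    // Resetting only the position counter would leave stale KV entries from
    // the previous prompt in memory; zero the cache contents as well.
    std::fill(cache_data_.begin(), cache_data_.end(), T(0));
  }

  void write(std::size_t i, T v) {
    cache_data_[i] = v;
    ++pos_;
  }

  const T* cache_data() const { return cache_data_.data(); }
  std::size_t pos() const { return pos_; }

 private:
  std::vector<T> cache_data_;
  std::vector<T> update_data_;
  std::size_t pos_ = 0;
};
```

After `reset()`, both the position counter and the cache contents are back to their freshly-allocated state, which is the stronger guarantee the review is checking for.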
Force-pushed 9e0d34d to a77301d.
Summary:

1. Zero cache on allocation (lines 65-66): `std::fill` on `cache_data_` and `update_data_` after `allocator_.allocate()` eliminates uninitialized memory garbage that varies across devices.
2. Zero cache on reset (line 191): `std::fill` on `cache_data_` in `reset()` ensures the stale KV cache from a previous prompt is fully cleared, not just the position counters.
3. Zero padding in last prefill chunk (lines 618-621): when `batch_len < input_len`, fill the tail of the input buffer with zeros to prevent stale tokens from a previous chunk leaking through the embedding layer.

sa_runner.cpp

4. Call `runner.reset()` before each prompt in the multi-prompt loop, stdin prompt loop, and stdin tokens loop to ensure the KV cache, masks, and `input_pos_` are fully reset between prompts.

Reviewed By: Andriyluck, derekdixu

Differential Revision: D104615993
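The tail-padding fix in summary item 3 can be sketched in isolation. `TokenT`, the buffer layout, and the function name are assumptions for illustration; the real runner writes into a preallocated buffer rather than returning one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using TokenT = std::int64_t;  // assumed token type for this sketch

// Copy the final chunk's `batch_len` tokens into a fixed-size input buffer
// of `input_len` slots, zeroing the unused tail so stale tokens from a
// previous chunk cannot leak through the embedding layer.
std::vector<TokenT> fill_last_chunk(
    const std::vector<TokenT>& tokens,
    std::size_t batch_len,
    std::size_t input_len) {
  // Simulate a reused buffer still holding stale data from the prior chunk.
  std::vector<TokenT> input_buffer(input_len, TokenT(-1));
  std::copy(tokens.begin(), tokens.begin() + batch_len, input_buffer.begin());
  if (batch_len < input_len) {
    // The fix from the PR: zero the tail past the valid tokens.
    std::fill(input_buffer.begin() + batch_len, input_buffer.end(), TokenT(0));
  }
  return input_buffer;
}
```

Between prompts the same defensive idea applies: summary item 4 has the runner call `reset()` before each prompt so cache, masks, and position state cannot carry over either.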
Force-pushed a77301d to 82a6b89.
Force-pushed 82a6b89 to 1fbd4e4.
Force-pushed 1fbd4e4 to 1cd3d16.
Summary:
Reviewed By: Andriyluck, derekdixu
Differential Revision: D104615993