[action] [PR:4064] Fixing state_db not having delete_field attribute causing a crash when DPUs in bad state #246
What I did
Fixed the AttributeError raised because the StateDBHelper class had no delete_field method. During DPU state transitions, the code calls state_db.delete_field() to remove the 'transition_start_time' field from the database. Because the method did not exist, the call crashed whenever a DPU was in a bad state.
How I did it
Added the missing delete_field method to the StateDBHelper class that properly removes fields from Redis using the hdel command
Maintained the existing logic in set_state_transition_in_progress that removes 'transition_start_time' from both local state and database when transitioning to 'False'
Ensured consistency between local dictionary state and database state by properly implementing field deletion
The fix addresses the root cause of the AttributeError while preserving the intended behavior of cleaning up transition timestamps when state transitions complete.
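The cleanup flow described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the RecordingStateDB stand-in, its set_field method, the 'state_transition_in_progress' field name, and the function signature are all assumptions made for the example. Only the 'transition_start_time' field, the 'False' value, and the delete_field(table, key, field) call come from the description.

```python
from datetime import datetime, timezone


class RecordingStateDB:
    """Hypothetical in-memory stand-in for the state DB helper."""

    def __init__(self):
        # Maps (table, key) to a dict of field -> value.
        self.tables = {}

    def set_field(self, table, key, field, value):
        # Assumed helper method; not named in the PR description.
        self.tables.setdefault((table, key), {})[field] = value

    def delete_field(self, table, key, field):
        # Removing an absent field is a no-op.
        self.tables.get((table, key), {}).pop(field, None)


def set_state_transition_in_progress(state_db, local_state, table, key, value):
    """Update the transition flag locally and in the DB, keeping both in sync."""
    # 'state_transition_in_progress' is an assumed field name.
    local_state["state_transition_in_progress"] = value
    state_db.set_field(table, key, "state_transition_in_progress", value)
    if value == "True":
        started = datetime.now(timezone.utc).isoformat()
        local_state["transition_start_time"] = started
        state_db.set_field(table, key, "transition_start_time", started)
    else:
        # Transition finished: drop the timestamp from both copies.
        local_state.pop("transition_start_time", None)
        state_db.delete_field(table, key, "transition_start_time")
```

Without a working delete_field on the database object, the else branch is where the AttributeError would surface.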
Code Changes:
Added delete_field(self, table, key, field) method to StateDBHelper class
Method uses client.hdel(redis_key, field) to properly delete fields from Redis database
Existing call to state_db.delete_field() on line 85 now works correctly
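A rough sketch of what the added method might look like, assuming the helper holds a Redis-style client and builds keys as table, separator, key. The constructor, the "|" separator, and the FakeRedisClient used to keep the example self-contained are assumptions; only the delete_field(self, table, key, field) signature and the client.hdel(redis_key, field) call come from the description above.

```python
class FakeRedisClient:
    """In-memory stand-in for a Redis client (hset/hdel/hgetall only)."""

    def __init__(self):
        self.hashes = {}

    def hset(self, name, field, value):
        self.hashes.setdefault(name, {})[field] = value

    def hdel(self, name, *fields):
        # Like Redis HDEL: return how many fields were actually removed.
        stored = self.hashes.get(name, {})
        removed = 0
        for field in fields:
            if field in stored:
                del stored[field]
                removed += 1
        return removed

    def hgetall(self, name):
        return dict(self.hashes.get(name, {}))


class StateDBHelper:
    """Sketch of the helper; constructor and separator are assumed."""

    def __init__(self, client, separator="|"):
        self.client = client
        self.separator = separator

    def delete_field(self, table, key, field):
        # Build the hash key, then remove a single field from that hash.
        redis_key = f"{table}{self.separator}{key}"
        return self.client.hdel(redis_key, field)


# Usage with a placeholder table name:
client = FakeRedisClient()
client.hset("EXAMPLE_TABLE|DPU0", "transition_start_time", "2025-01-01T00:00:00")
helper = StateDBHelper(client)
helper.delete_field("EXAMPLE_TABLE", "DPU0", "transition_start_time")
```

HDEL on a missing field returns 0 rather than raising, so calling delete_field when the timestamp was never written is harmless.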
How to verify it
Make the DPU midplane unreachable for one of the DPUs
Toggle the DPU ON/OFF state a couple of times using config chassis modules startup/shutdown DPUx
Verify that the commands complete without AttributeError crashes
Confirm that 'transition_start_time' field is properly removed from STATE_DB when transitions complete
Check that both local state and database state remain synchronized
Why this approach vs. removing the line:
This fix maintains the original intended behavior of cleaning up transition timestamps, ensuring database consistency and preventing stale data accumulation, while properly implementing the missing functionality that was causing the crash.
Which release branch to backport (provide reason below if selected)
[x] 202505
[x] 202506