Add a new binary to help debug stress test failure. #14177

xingbowang · 2025-12-05T22:51:55Z

Summary:

Add a new db_stress_replay binary to help debug stress test failures.
Integrate db_stress_replay so that it runs automatically after a stress test failure to check whether the issue is consistently reproducible.

Test Plan:

Sample output from an existing test failure:

./db_stress_replay --db=/home/xbw/workspace/sandcastle/iter_diverged/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --op_logs="S 000000000000038D000000000000012B0000000000000299 *N*N*N* SFP 000000000000038D000000000000012B00000000000002A2 *P*P*P* SFP 000000000000038D000000000000012B000000000000029E *" --column_family="3"

Parsing op_logs...
Parsed 18 operations

Opening database: /home/xbw/workspace/sandcastle/iter_diverged/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox
Attempting to load options from database...
Successfully loaded options from database
Using 10 column families from OPTIONS file

Database opened successfully (read-only mode)

Using column family: 3

=== Starting operation replay (18 operations) ===
ReadOptions: allow_unprepared_value=0, total_order_seek=1

[  0] Seek(000000000000038D000000000000012B0000000000000299)
      -> Valid: key=000000000000038D000000000000012B0000000000000299, value_size=32
[  1] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B0000000000000299, value_size=32
[  2] Next()
      -> Valid: key=000000000000038D000000000000012B000000000000029A, value_size=96
[  3] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B000000000000029A, value_size=96
[  4] Next()
      -> Valid: key=000000000000038D000000000000012B000000000000029B, value_size=32
[  5] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B000000000000029B, value_size=32
[  6] Next()
      -> Valid: key=000000000000038D000000000000012B00000000000002A2, value_size=96
[  7] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B00000000000002A2, value_size=96
[  8] SeekForPrev(000000000000038D000000000000012B00000000000002A2)
      -> Valid: key=000000000000038D000000000000012B00000000000002A2, value_size=96
[  9] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B00000000000002A2, value_size=96
[ 10] Prev()
      -> Valid: key=000000000000038D000000000000012B000000000000029B, value_size=32
[ 11] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B000000000000029B, value_size=32
[ 12] Prev()
      -> Valid: key=000000000000038D000000000000012B000000000000029A, value_size=96
[ 13] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B000000000000029A, value_size=96
[ 14] Prev()
      -> Valid: key=000000000000038D000000000000012B0000000000000299, value_size=32
[ 15] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B0000000000000299, value_size=32
[ 16] SeekForPrev(000000000000038D000000000000012B000000000000029E)
      -> Valid: key=000000000000038D000000000000012B000000000000029E, value_size=96
[ 17] PrepareValue()
      -> Valid: key=000000000000038D000000000000012B000000000000029E, value_size=96

=== Replay completed ===
Total operations: 18
Errors encountered: 0

Final iterator state:
  Valid: true
  Key: 000000000000038D000000000000012B000000000000029E
  Value size: 96
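
For readers who want to decode an op_logs string by hand, here is a minimal Python sketch of the tokenization. The token meanings (`S` = Seek, `SFP` = SeekForPrev, `N` = Next, `P` = Prev, `*` = PrepareValue) are inferred from the sample output above, not taken from the db_stress_replay source, and `parse_op_logs` is a hypothetical helper name.

```python
# Assumption: token meanings are inferred from the sample replay output,
# not from the actual db_stress_replay implementation.
OP_NAMES = {
    "S": "Seek",
    "SFP": "SeekForPrev",
    "N": "Next",
    "P": "Prev",
    "*": "PrepareValue",
}


def parse_op_logs(op_logs):
    """Return a list of (operation, key) pairs; key is None for keyless ops."""
    ops = []
    tokens = op_logs.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("S", "SFP"):
            # Seek-style ops are followed by a hex key as the next token.
            if i + 1 >= len(tokens):
                raise ValueError(f"{tok} is missing its key")
            ops.append((OP_NAMES[tok], tokens[i + 1]))
            i += 2
        else:
            # Any other token is a run of single-character ops, e.g. "*N*N*N*".
            for ch in tok:
                if ch not in ("N", "P", "*"):
                    raise ValueError(f"unknown op {ch!r} in token {tok!r}")
                ops.append((OP_NAMES[ch], None))
            i += 1
    return ops


sample = (
    "S 000000000000038D000000000000012B0000000000000299 *N*N*N* "
    "SFP 000000000000038D000000000000012B00000000000002A2 *P*P*P* "
    "SFP 000000000000038D000000000000012B000000000000029E *"
)
print(len(parse_op_logs(sample)))  # 18, matching "Parsed 18 operations"
```

Under these assumptions the sketch yields the same 18-step sequence as the replay above: a Seek, three Next steps, a SeekForPrev, three Prev steps, and a final SeekForPrev, each followed by PrepareValue.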

Reviewers:

Subscribers:

Tasks:

Tags:

Summary: Add a new db_stress_replay binary to help debug stress test failure Test Plan: Unit test Reviewers: Subscribers: Tasks: Tags:

meta-codesync · 2025-12-05T22:52:27Z

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D88525321.

meta-codesync · 2025-12-06T14:18:12Z

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D88525321.

hx235 · 2025-12-08T19:12:10Z

While I do see the point of lowering the mental barrier by providing an off-the-shelf tool and making info extraction easier for debugging, we need to be careful that the tool is well implemented so people don't end up debugging the debugging tool. One concern is code redundancy between the parsing logic in the tool and the printing logic in the test; we should have one ground truth. The other is keeping the read options used for scanning in sync between your tool and the stress test. If the tool can reuse what's already used in your debugging process, e.g., ldb, dump, etc., which are correct by themselves, that will be better.

hx235

Let's discuss more before adding tools to debug stress tests, just to ensure the tools come with minimum maintenance cost and maximum helpfulness.
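
To illustrate the "one ground truth" point for read options, here is a minimal Python sketch in which the line the test prints and the line the tool parses are both derived from a single field list. The real stress test and tool are C++. The line format is copied from the `ReadOptions:` line in the sample output, and `ReplayReadOptions`, `format_read_options`, and `parse_read_options` are hypothetical names, not part of RocksDB.

```python
from dataclasses import dataclass, fields


# Hypothetical container: the single place where replayed options are listed.
# Only the two fields visible in the sample output are included.
@dataclass(frozen=True)
class ReplayReadOptions:
    allow_unprepared_value: bool = False
    total_order_seek: bool = False


PREFIX = "ReadOptions: "


def format_read_options(opts):
    """Printer side (the test): emit every field, in declaration order."""
    body = ", ".join(f"{f.name}={int(getattr(opts, f.name))}" for f in fields(opts))
    return PREFIX + body


def parse_read_options(line):
    """Parser side (the tool): rebuild the options from a printed line."""
    if not line.startswith(PREFIX):
        raise ValueError("not a ReadOptions line")
    values = {}
    for item in line[len(PREFIX):].split(", "):
        name, _, raw = item.partition("=")
        values[name] = bool(int(raw))
    # An unknown field name raises TypeError here, so drift is caught early.
    return ReplayReadOptions(**values)


line = "ReadOptions: allow_unprepared_value=0, total_order_seek=1"
opts = parse_read_options(line)
print(format_read_options(opts) == line)  # True
```

Because both directions iterate over the same dataclass fields, adding an option in one place updates the printer and the parser together, and a line containing a field the parser does not know fails loudly instead of being silently ignored.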

Add a new binary to help debug stress test failure

f55d01f


meta-cla bot added the CLA Signed label Dec 5, 2025

fix build

c486199

hx235 requested changes Dec 8, 2025
