Skip to content

Commit

Permalink
Support parallel read and write/delete to same key in NonBatchedOpsSt…
Browse files Browse the repository at this point in the history
…ressTest (facebook#11058)

Summary:
**Context:**
Current `NonBatchedOpsStressTest` does not allow multi-thread read (i.e, Get, Iterator) and write (i.e, Put, Merge) or delete to the same key. Every read or write/delete operation will acquire lock (`GetLocksForKeyRange`) on the target key to gain exclusive access to it. This does not align with RocksDB's nature of allowing multi-thread read and write/delete to the same key, that is concurrent threads can issue read/write/delete to RocksDB without external locking. Therefore this is a gap in our testing coverage.

To close the gap, biggest challenge remains in verifying db value against expected state in presence of parallel read and write/delete. The challenge is due to read/write/delete to the db and read/write to expected state is not within one atomic operation. Therefore we may not know the exact expected state of a certain db read, as by the time we read the expected state for that db read, another write to expected state for another db write to the same key might have changed the expected state.

**Summary:**
Credited to ajkr's idea, we now solve this challenge by breaking the 32-bits expected value of a key into different parts that can be read and write to in parallel.

Basically we divide the 32-bits expected value into `value_base` (corresponding to the previous whole 32 bits but now with some shrinking in the value base range we allow), `pending_write` (i.e, whether there is an ongoing concurrent write), `del_counter` (i.e, number of times a value has been deleted, analogous to value_base for write), `pending_delete` (similar to pending_write) and `deleted` (i.e whether a key is deleted).

Also, we need to use incremental `value_base` instead of random value base as before because we want to control the range of value base a correct db read result can possibly be in presence of parallel read and write. In that way, we can verify the correctness of the read against expected state more easily. This is at the cost of reducing the randomness of the value generated in NonBatchedOpsStressTest we are willing to accept.

(For detailed algorithm of how to use these parts to infer expected state of a key, see the PR)

Misc: hide value_base detail from callers of ExpectedState by abstracting related logics into ExpectedValue class

Pull Request resolved: facebook#11058

Test Plan:
- Manual test of small number of keys (i.e, high chances of parallel read and write/delete to same key) with equally distributed read/write/deleted for 30 min
```
python3 tools/db_crashtest.py --simple {blackbox|whitebox} --sync_fault_injection=1 --skip_verifydb=0 --continuous_verification_interval=1000 --clear_column_family_one_in=0 --max_key=10 --column_families=1 --threads=32 --readpercent=25 --writepercent=25 --nooverwritepercent=0 --iterpercent=25 --verify_iterator_with_expected_state_one_in=1 --num_iterations=5 --delpercent=15 --delrangepercent=10 --range_deletion_width=5 --use_merge={0|1} --use_put_entity_one_in=0 --use_txn=0 --verify_before_write=0 --user_timestamp_size=0 --compact_files_one_in=1000 --compact_range_one_in=1000 --flush_one_in=1000 --get_property_one_in=1000 --ingest_external_file_one_in=100 --backup_one_in=100 --checkpoint_one_in=100 --approximate_size_one_in=0 --acquire_snapshot_one_in=100 --use_multiget=0 --prefixpercent=0 --get_live_files_one_in=1000 --manual_wal_flush_one_in=1000 --pause_background_one_in=1000 --target_file_size_base=524288 --write_buffer_size=524288 --verify_checksum_one_in=1000 --verify_db_one_in=1000
```
- Rehearsal stress test for normal parameter and aggressive parameter to see if such change can find what existing stress test can find (i.e, no regression in testing capability)
- [Ongoing]Try to find new bugs with this change that are not found by current NonBatchedOpsStressTest with no parallel read and write/delete to same key

Reviewed By: ajkr

Differential Revision: D42257258

Pulled By: hx235

fbshipit-source-id: e6fdc18f1fad3753e5ac91731483a644d9b5b6eb
  • Loading branch information
hx235 authored and facebook-github-bot committed May 15, 2023
1 parent fb636f2 commit 5fc57ee
Show file tree
Hide file tree
Showing 6 changed files with 674 additions and 231 deletions.
3 changes: 1 addition & 2 deletions db_stress_tool/batched_ops_stress.cc
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,7 @@ class BatchedOpsStressTest : public StressTest {

const std::string key_body = Key(rand_keys[0]);

const uint32_t value_base =
thread->rand.Next() % thread->shared->UNKNOWN_SENTINEL;
const uint32_t value_base = thread->rand.Next();
const size_t sz = GenerateValue(value_base, value, sizeof(value));
const std::string value_body = Slice(value, sz).ToString();

Expand Down
76 changes: 43 additions & 33 deletions db_stress_tool/db_stress_shared_state.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,6 @@ class StressTest;
// State shared by all concurrent executions of the same benchmark.
class SharedState {
public:
// indicates a key may have any value (or not be present) as an operation on
// it is incomplete.
static constexpr uint32_t UNKNOWN_SENTINEL = 0xfffffffe;
// indicates a key should definitely be deleted
static constexpr uint32_t DELETION_SENTINEL = 0xffffffff;

// Errors when reading filter blocks are ignored, so we use a thread
// local variable updated via sync points to keep track of errors injected
// while reading filter blocks in order to ignore the Get/MultiGet result
Expand Down Expand Up @@ -254,54 +248,70 @@ class SharedState {
return expected_state_manager_->ClearColumnFamily(cf);
}

// @param pending True if the update may have started but is not yet
// guaranteed finished. This is useful for crash-recovery testing when the
// process may crash before updating the expected values array.
// Prepare a Put that will be started but not finish yet
// This is useful for crash-recovery testing when the process may crash
// before updating the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
void Put(int cf, int64_t key, uint32_t value_base, bool pending) {
return expected_state_manager_->Put(cf, key, value_base, pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PreparePut(int cf, int64_t key) {
return expected_state_manager_->PreparePut(cf, key);
}

// Requires external locking covering `key` in `cf`.
uint32_t Get(int cf, int64_t key) const {
// Does not requires external locking.
ExpectedValue Get(int cf, int64_t key) {
return expected_state_manager_->Get(cf, key);
}

// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
// Prepare a Delete that will be started but not finish yet
// This is useful for crash-recovery testing when the process may crash
// before updating the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
bool Delete(int cf, int64_t key, bool pending) {
return expected_state_manager_->Delete(cf, key, pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PrepareDelete(int cf, int64_t key) {
return expected_state_manager_->PrepareDelete(cf, key);
}

// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
//
// Requires external locking covering `key` in `cf`.
bool SingleDelete(int cf, int64_t key, bool pending) {
return expected_state_manager_->Delete(cf, key, pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PrepareSingleDelete(int cf, int64_t key) {
return expected_state_manager_->PrepareSingleDelete(cf, key);
}

// @param pending See comment above Put()
// Returns number of keys deleted by the call.
//
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`.
int DeleteRange(int cf, int64_t begin_key, int64_t end_key, bool pending) {
return expected_state_manager_->DeleteRange(cf, begin_key, end_key,
pending);
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`
// to prevent concurrent write or delete to the same `key`.
std::vector<PendingExpectedValue> PrepareDeleteRange(int cf,
int64_t begin_key,
int64_t end_key) {
return expected_state_manager_->PrepareDeleteRange(cf, begin_key, end_key);
}

bool AllowsOverwrite(int64_t key) const {
return no_overwrite_ids_.find(key) == no_overwrite_ids_.end();
}

// Requires external locking covering `key` in `cf`.
// Requires external locking covering `key` in `cf` to prevent concurrent
// delete to the same `key`.
bool Exists(int cf, int64_t key) {
return expected_state_manager_->Exists(cf, key);
}

// Sync the `value_base` to the corresponding expected value
void SyncPut(int cf, int64_t key, uint32_t value_base) {
return expected_state_manager_->SyncPut(cf, key, value_base);
}

// Sync the corresponding expected value to be pending Put
void SyncPendingPut(int cf, int64_t key) {
return expected_state_manager_->SyncPendingPut(cf, key);
}

// Sync the corresponding expected value to be deleted
void SyncDelete(int cf, int64_t key) {
return expected_state_manager_->SyncDelete(cf, key);
}

uint32_t GetSeed() const { return seed_; }

void SetShouldStopBgThread() { should_stop_bg_thread_ = true; }
Expand Down
10 changes: 5 additions & 5 deletions db_stress_tool/db_stress_test_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -483,12 +483,13 @@ void StressTest::PreloadDbAndReopenAsReadOnly(int64_t number_of_keys,
for (int64_t k = 0; k != number_of_keys; ++k) {
const std::string key = Key(k);

constexpr uint32_t value_base = 0;
PendingExpectedValue pending_expected_value =
shared->PreparePut(cf_idx, k);
const uint32_t value_base = pending_expected_value.GetFinalValueBase();
const size_t sz = GenerateValue(value_base, value, sizeof(value));

const Slice v(value, sz);

shared->Put(cf_idx, k, value_base, true /* pending */);

std::string ts;
if (FLAGS_user_timestamp_size > 0) {
Expand Down Expand Up @@ -534,7 +535,7 @@ void StressTest::PreloadDbAndReopenAsReadOnly(int64_t number_of_keys,
}
}

shared->Put(cf_idx, k, value_base, false /* pending */);
pending_expected_value.Commit();
if (!s.ok()) {
break;
}
Expand Down Expand Up @@ -614,8 +615,7 @@ void StressTest::ProcessRecoveredPreparedTxnsHelper(Transaction* txn,
for (wbwi_iter->SeekToFirst(); wbwi_iter->Valid(); wbwi_iter->Next()) {
uint64_t key_val;
if (GetIntVal(wbwi_iter->Entry().key.ToString(), &key_val)) {
shared->Put(static_cast<int>(i) /* cf_idx */, key_val,
0 /* value_base */, true /* pending */);
shared->SyncPendingPut(static_cast<int>(i) /* cf_idx */, key_val);
}
}
}
Expand Down
Loading

0 comments on commit 5fc57ee

Please sign in to comment.