Derive cache keys from SST unique IDs (facebook#10394)
Summary:
... so that cache keys can be derived from DB manifest data before reading
the file from storage, allowing every part of the file to potentially go in
a persistent cache.

See updated comments in cache_key.cc for technical details. Importantly,
the new cache key encoding uses some fancy but efficient math to pack
data into the cache key without depending on the sizes of the various
pieces. This simplifies some existing code creating cache keys, like
cache warming before the file size is known.

This should provide us with an essentially permanent mapping between SST
unique IDs and base cache keys, with the ability to "upgrade" SST unique
IDs (and thus cache keys) with new SST format_versions.

These cache keys are of similar, perhaps indistinguishable, quality to those
of the previous generation. Before this change (see "corrected" days between
collisions):

```
./cache_bench -stress_cache_key -sck_keep_bits=43
18 collisions after 2 x 90 days, est 10 days between (1.15292e+19 corrected)
```

After this change (keeping 43 bits, then up through 50 bits, to validate that
the "trajectory" of "corrected" days between collisions is OK):
```
19 collisions after 3 x 90 days, est 14.2105 days between (1.63836e+19 corrected)
16 collisions after 5 x 90 days, est 28.125 days between (1.6213e+19 corrected)
15 collisions after 7 x 90 days, est 42 days between (1.21057e+19 corrected)
15 collisions after 17 x 90 days, est 102 days between (1.46997e+19 corrected)
15 collisions after 49 x 90 days, est 294 days between (2.11849e+19 corrected)
15 collisions after 62 x 90 days, est 372 days between (1.34027e+19 corrected)
15 collisions after 53 x 90 days, est 318 days between (5.72858e+18 corrected)
15 collisions after 309 x 90 days, est 1854 days between (1.66994e+19 corrected)
```

However, the change does modify (probably weaken) the "guaranteed unique" promise from this

> SST files generated in a single process are guaranteed to have unique cache keys, unless/until number session ids * max file number = 2**86

to this (see facebook#10388)

> With the DB id limitation, we only have nice guaranteed unique cache keys for files generated in a single process until biggest session_id_counter and offset_in_file reach combined 64 bits

I don't think this is a practical concern, though.

Pull Request resolved: facebook#10394

Test Plan: unit tests updated, see simulation results above

Reviewed By: jay-zhuang

Differential Revision: D38667529

Pulled By: pdillinger

fbshipit-source-id: 49af3fe7f47e5b61162809a78b76c769fd519fba
pdillinger authored and facebook-github-bot committed Aug 12, 2022
1 parent 9fa5c14 commit 86a1e3e
Showing 20 changed files with 673 additions and 525 deletions.
3 changes: 3 additions & 0 deletions HISTORY.md
```diff
@@ -39,6 +39,9 @@
 * Improve read performance by avoiding dynamic memory allocation.
 * When using iterators with the integrated BlobDB implementation, blob cache handles are now released immediately when the iterator's position changes.
 
+## Behavior Change
+* Block cache keys have changed, which will cause any persistent caches to miss between versions.
+
 ## 7.5.0 (07/15/2022)
 ### New Features
 * Mempurge option flag `experimental_mempurge_threshold` is now a ColumnFamilyOptions and can now be dynamically configured using `SetOptions()`.
```
10 changes: 3 additions & 7 deletions cache/cache_bench_tool.cc
```diff
@@ -806,7 +806,6 @@ class StressCacheKey {
 
     uint64_t max_file_count =
         uint64_t{FLAGS_sck_files_per_day} * FLAGS_sck_days_per_run;
-    uint64_t file_size = FLAGS_sck_file_size_mb * uint64_t{1024} * 1024U;
     uint32_t report_count = 0;
     uint32_t collisions_this_run = 0;
     size_t db_i = 0;
@@ -834,8 +833,7 @@ class StressCacheKey {
      }
      bool is_stable;
      BlockBasedTable::SetupBaseCacheKey(&dbs_[db_i], /* ignored */ "",
-                                        /* ignored */ 42, file_size, &ock,
-                                        &is_stable);
+                                        /* ignored */ 42, &ock, &is_stable);
      assert(is_stable);
      // Get a representative cache key, which later we analytically generalize
      // to a range.
@@ -845,13 +843,11 @@ class StressCacheKey {
        reduced_key = GetSliceHash64(ck.AsSlice()) >> shift_away;
      } else if (FLAGS_sck_footer_unique_id) {
        // Special case: keep only file number, not session counter
-       uint32_t a = DecodeFixed32(ck.AsSlice().data() + 4) >> shift_away_a;
-       uint32_t b = DecodeFixed32(ck.AsSlice().data() + 12) >> shift_away_b;
-       reduced_key = (uint64_t{a} << 32) + b;
+       reduced_key = DecodeFixed64(ck.AsSlice().data()) >> shift_away;
      } else {
        // Try to keep file number and session counter (shift away other bits)
        uint32_t a = DecodeFixed32(ck.AsSlice().data()) << shift_away_a;
-       uint32_t b = DecodeFixed32(ck.AsSlice().data() + 12) >> shift_away_b;
+       uint32_t b = DecodeFixed32(ck.AsSlice().data() + 4) >> shift_away_b;
        reduced_key = (uint64_t{a} << 32) + b;
      }
      if (reduced_key == 0) {
```
348 changes: 184 additions & 164 deletions cache/cache_key.cc

Large diffs are not rendered by default.

67 changes: 38 additions & 29 deletions cache/cache_key.h
```diff
@@ -9,6 +9,7 @@
 
 #include "rocksdb/rocksdb_namespace.h"
 #include "rocksdb/slice.h"
+#include "table/unique_id_impl.h"
 
 namespace ROCKSDB_NAMESPACE {
 
@@ -33,10 +34,10 @@ class CacheKey {
  public:
   // For convenience, constructs an "empty" cache key that is never returned
   // by other means.
-  inline CacheKey() : session_etc64_(), offset_etc64_() {}
+  inline CacheKey() : file_num_etc64_(), offset_etc64_() {}
 
   inline bool IsEmpty() const {
-    return (session_etc64_ == 0) & (offset_etc64_ == 0);
+    return (file_num_etc64_ == 0) & (offset_etc64_ == 0);
   }
 
   // Use this cache key as a Slice (byte order is endianness-dependent)
@@ -59,9 +60,9 @@ class CacheKey {
 
  protected:
   friend class OffsetableCacheKey;
-  CacheKey(uint64_t session_etc64, uint64_t offset_etc64)
-      : session_etc64_(session_etc64), offset_etc64_(offset_etc64) {}
-  uint64_t session_etc64_;
+  CacheKey(uint64_t file_num_etc64, uint64_t offset_etc64)
+      : file_num_etc64_(file_num_etc64), offset_etc64_(offset_etc64) {}
+  uint64_t file_num_etc64_;
   uint64_t offset_etc64_;
 };
 
@@ -85,50 +86,58 @@ class OffsetableCacheKey : private CacheKey {
   inline OffsetableCacheKey() : CacheKey() {}
 
   // Constructs an OffsetableCacheKey with the given information about a file.
-  // max_offset is based on file size (see WithOffset) and is required here to
-  // choose an appropriate (sub-)encoding. This constructor never generates an
-  // "empty" base key.
+  // This constructor never generates an "empty" base key.
   OffsetableCacheKey(const std::string &db_id, const std::string &db_session_id,
-                     uint64_t file_number, uint64_t max_offset);
+                     uint64_t file_number);
 
+  // Creates an OffsetableCacheKey from an SST unique ID, so that cache keys
+  // can be derived from DB manifest data before reading the file from
+  // storage--so that every part of the file can potentially go in a persistent
+  // cache.
+  //
+  // Calling GetSstInternalUniqueId() on a db_id, db_session_id, and
+  // file_number and passing the result to this function produces the same
+  // base cache key as feeding those inputs directly to the constructor.
+  //
+  // This is a bijective transformation assuming either id is empty or
+  // lower 64 bits is non-zero:
+  // * Empty (all zeros) input -> empty (all zeros) output
+  // * Lower 64 input is non-zero -> lower 64 output (file_num_etc64_) is
+  //   non-zero
+  static OffsetableCacheKey FromInternalUniqueId(UniqueIdPtr id);
+
+  // This is the inverse transformation to the above, assuming either empty
+  // or lower 64 bits (file_num_etc64_) is non-zero. Perhaps only useful for
+  // testing.
+  UniqueId64x2 ToInternalUniqueId();
+
   inline bool IsEmpty() const {
-    bool result = session_etc64_ == 0;
+    bool result = file_num_etc64_ == 0;
     assert(!(offset_etc64_ > 0 && result));
     return result;
   }
 
-  // Construct a CacheKey for an offset within a file, which must be
-  // <= max_offset provided in constructor. An offset is not necessarily a
-  // byte offset if a smaller unique identifier of keyable offsets is used.
+  // Construct a CacheKey for an offset within a file. An offset is not
+  // necessarily a byte offset if a smaller unique identifier of keyable
+  // offsets is used.
   //
   // This class was designed to make this hot code extremely fast.
   inline CacheKey WithOffset(uint64_t offset) const {
     assert(!IsEmpty());
-    assert(offset <= max_offset_);
-    return CacheKey(session_etc64_, offset_etc64_ ^ offset);
+    return CacheKey(file_num_etc64_, offset_etc64_ ^ offset);
   }
 
-  // The "common prefix" is a shared prefix for all the returned CacheKeys,
-  // that also happens to usually be the same among many files in the same DB,
-  // so is efficient and highly accurate (not perfectly) for DB-specific cache
-  // dump selection (but not file-specific).
+  // The "common prefix" is a shared prefix for all the returned CacheKeys.
+  // It is specific to the file but the same for all offsets within the file.
   static constexpr size_t kCommonPrefixSize = 8;
   inline Slice CommonPrefixSlice() const {
-    static_assert(sizeof(session_etc64_) == kCommonPrefixSize,
+    static_assert(sizeof(file_num_etc64_) == kCommonPrefixSize,
                   "8 byte common prefix expected");
     assert(!IsEmpty());
-    assert(&this->session_etc64_ == static_cast<const void *>(this));
+    assert(&this->file_num_etc64_ == static_cast<const void *>(this));
 
     return Slice(reinterpret_cast<const char *>(this), kCommonPrefixSize);
   }
-
-  // For any max_offset <= this value, the same encoding scheme is guaranteed.
-  static constexpr uint64_t kMaxOffsetStandardEncoding = 0xffffffffffU;
-
- private:
-#ifndef NDEBUG
-  uint64_t max_offset_ = 0;
-#endif
 };
 
 }  // namespace ROCKSDB_NAMESPACE
```
8 changes: 4 additions & 4 deletions cache/cache_reservation_manager_test.cc
```diff
@@ -48,13 +48,13 @@ TEST_F(CacheReservationManagerTest, GenerateCacheKey) {
   // Next unique Cache key
   CacheKey ckey = CacheKey::CreateUniqueForCacheLifetime(cache.get());
   // Get to the underlying values
-  using PairU64 = std::array<uint64_t, 2>;
-  auto& ckey_pair = *reinterpret_cast<PairU64*>(&ckey);
+  uint64_t* ckey_data = reinterpret_cast<uint64_t*>(&ckey);
   // Back it up to the one used by CRM (using CacheKey implementation details)
-  ckey_pair[1]--;
+  ckey_data[1]--;
 
   // Specific key (subject to implementation details)
-  EXPECT_EQ(ckey_pair, PairU64({0, 2}));
+  EXPECT_EQ(ckey_data[0], 0);
+  EXPECT_EQ(ckey_data[1], 2);
 
   Cache::Handle* handle = cache->Lookup(ckey.AsSlice());
   EXPECT_NE(handle, nullptr)
```