-
Notifications
You must be signed in to change notification settings - Fork 1.4k
FdbDecode memory issues fix #12456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FdbDecode memory issues fix #12456
Conversation
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about the fix.
static std::unique_ptr<BlobStats> blobStats; | ||
static Future<Void> statsLogger; | ||
std::unique_ptr<BlobStats> blobStats; | ||
Future<Void> statsLogger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the idea is to have a singleton BlobStats
and logger. Does this fix changes the behavior to have multiple BlobStats
and loggers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These stats are per S3BlobStoreEndPoint now. I checked the code and felt we have only 1 S3BlobStoreEndPoint. So assumed we should be fine. Correct me if I'm wrong.
If we have more than 1 S3BlobStoreEndPoint, we have to maintain multiple of such blobState and logger and can sometimes OOM too (if we cannot accommodate so much in memory). In that case I can retain them as static and can make changes in the destructors with additional checks for safe deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in the fdbdecode
case, only one S3BlobStoreEndPoint
is created via IBackupContainer::openContainer(params->container_url, params->proxy, {});
.
I noticed openContainer()
is supposed to cache IBackupContainer
object. However, the code after the following block creates a new r
but not saving it to m_cache
. So if this is fixed, then opening the same URL multiple times will share the same IBackupContainer
.
Reference<IBackupContainer> IBackupContainer::openContainer(const std::string& url,
const Optional<std::string>& proxy,
const Optional<std::string>& encryptionKeyFileName) {
static std::map<std::string, Reference<IBackupContainer>> m_cache;
Reference<IBackupContainer>& r = m_cache[url];
if (r)
return r;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across this bit of code and thought too that the new 'r' was not being added to the cache after creation but I think it is.... 'r' is a reference to the cache entry so later when we do this 'r = makeReference(...' we are inserting the new 'r' into the cache. What you two think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I missed Reference<IBackupContainer>&
part.
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
9133b8b
to
5ef949b
Compare
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
FdbDecode memory issues fix.
The fdbdecode command was throwing memory-related errors such as:
double free or corruption (!prev), free(): invalid pointer, munmap_chunk(): invalid pointer, Segmentation fault
These errors occurred only during the program’s shutdown phase, after all decoding work was completed. They did not affect the correctness of the decoded key–value output.
Root Cause
Valgrind analysis revealed that the crashes were caused by static object destruction order issues, leading to use-after-free and double-free situations.
Issue 1: EventCacheHolder
A static EventCacheHolder instance invoked clear() during its destruction, which accessed a LatestEventCache object that had already been destroyed.
Issue 2: BlobStats
Another static variable, BlobStats, owned an EventCacheHolder instance. During shutdown, its destruction triggered the same invalid access pattern described above.
100k Correctness passing:
20251022-184501-neethuhaneeshabingi-c75f24d2a60bc088 compressed=True data_size=51345406 duration=4751108 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:21:46 sanity=False started=100000 stopped=20251022-200647 submitted=20251022-184501 timeout=5400 username=neethuhaneeshabingi
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch
ormain
if this is the youngest branch)