feat: compute tbs entry size before writing and flush to avoid limit error #12120
Conversation
This pull request does not have a backport label. Could you fix it @kruskall? 🙏
err := rw.txn.Commit()
rw.txn = rw.s.db.NewTransaction(true)
rw.pendingWrites = 0
rw.pendingSize = defaultPendingSize
Like in ReadWriter.writeEntry, this should be coordinated across all ReadWriters.
I've changed this a bit.
IIUC, when flushing we're just committing the current ReadWriter's transaction, which is independent from the other ReadWriters, so we need to track both the total underlying storage pendingSize (the sum of all the ReadWriters' pendingSizes) and the per-ReadWriter pendingSize.
When a ReadWriter flushes, we reset its current pendingSize and subtract it from the storage pendingSize.
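To make the bookkeeping concrete, here is a minimal Go sketch of the scheme described above, using simplified stand-in types and hypothetical helper names rather than the actual apm-server implementation:

package eventstoragesketch

import "sync/atomic"

// Storage stands in for the shared backing store: its pendingSize is the sum
// of the pending (not yet reconciled) bytes across all ReadWriters.
type Storage struct {
    pendingSize atomic.Int64
}

// ReadWriter stands in for a single reader/writer with its own transaction;
// it tracks only its own share of the pending bytes.
type ReadWriter struct {
    s           *Storage
    pendingSize int64
}

// addPending accounts for a new entry in both counters before it is written.
func (rw *ReadWriter) addPending(entrySize int64) {
    rw.s.pendingSize.Add(entrySize)
    rw.pendingSize += entrySize
}

// flushPending is called after the ReadWriter commits its transaction: the
// committed bytes are no longer pending, so its share is subtracted from the
// shared counter and its local counter is reset.
func (rw *ReadWriter) flushPending() {
    rw.s.pendingSize.Add(-rw.pendingSize)
    rw.pendingSize = 0
}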
txn: s.db.NewTransaction(true),
s: s,
txn: s.db.NewTransaction(true),
pendingSize: defaultPendingSize,
Instead of defaulting to 1024, should we perhaps use s.db.Size()? And rather than calculating pendingSize, maybe we could track the remaining capacity, i.e. StorageLimitInBytes - Size()?
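For context, a rough sketch of the alternative being suggested here, assuming Badger v2's DB.Size() (which reports the LSM tree and value log sizes, refreshed roughly once a minute); remainingCapacity is a hypothetical helper, not part of the PR:

package eventstoragesketch

import badger "github.com/dgraph-io/badger/v2"

// remainingCapacity computes how many bytes are left before the configured
// storage limit is reached, based on the sizes Badger reports. Because Badger
// refreshes these sizes periodically rather than on every write, the result
// can lag behind the true on-disk usage.
func remainingCapacity(db *badger.DB, storageLimitInBytes int64) int64 {
    lsm, vlog := db.Size()
    return storageLimitInBytes - (lsm + vlog)
}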
I've removed the default 1024. For tracking I'd prefer to use pendingSize, but I have no strong opinion on it.
Co-authored-by: Vishal Raj <[email protected]>
@kruskall I moved this to
The code correctness is becoming harder to reason about. Maybe refactor it a bit?
I've written a test for the error scenario:
func TestStorageLimitWritesPersisted(t *testing.T) {
    // Ensure that if WriteTraceEvent does not return an error,
    // the event is eventually written to storage.
    const storageLimitInBytes = int64(10000)

    tempdir := t.TempDir()
    opts := func() badger.Options {
        opts := badgerOptions()
        opts = opts.WithInMemory(false)
        opts = opts.WithDir(tempdir).WithValueDir(tempdir)
        return opts
    }

    // Open and close the database to create a non-empty value log file,
    // which will cause writes below to fail due to the storage limit being
    // exceeded. We would otherwise have to rely on Badger's one minute
    // timer to refresh the size.
    db := newBadgerDB(t, opts)
    db.Close()
    db = newBadgerDB(t, opts)

    store := eventstorage.New(db, eventstorage.ProtobufCodec{})
    rw1 := store.NewReadWriter()
    defer rw1.Close()
    rw2 := store.NewReadWriter()
    defer rw2.Close()

    write := func(rw *eventstorage.ReadWriter) error {
        traceID := uuid.Must(uuid.NewV4()).String()
        transactionID := uuid.Must(uuid.NewV4()).String()
        transaction := modelpb.APMEvent{Transaction: &modelpb.Transaction{Id: transactionID}}
        err := rw.WriteTraceEvent(traceID, transactionID, &transaction, eventstorage.WriterOpts{
            TTL:                 time.Minute,
            StorageLimitInBytes: storageLimitInBytes,
        })
        return err
    }

    var (
        rw1Written, rw2Written int
        wg                     sync.WaitGroup
    )
    wg.Add(2)
    go func() {
        for {
            err := write(rw1)
            if err != nil {
                assert.ErrorIs(t, err, eventstorage.ErrLimitReached)
                break
            }
            rw1Written++
        }
        wg.Done()
    }()
    go func() {
        for {
            err := write(rw2)
            if err != nil {
                assert.ErrorIs(t, err, eventstorage.ErrLimitReached)
                break
            }
            rw2Written++
        }
        wg.Done()
    }()
    wg.Wait()

    getEventCount := func() (count int) {
        opts := badger.DefaultIteratorOptions
        txn := db.NewTransaction(true)
        iter := txn.NewIterator(opts)
        defer iter.Close()
        for iter.Rewind(); iter.Valid(); iter.Next() {
            item := iter.Item()
            if item.IsDeletedOrExpired() {
                continue
            }
            switch item.UserMeta() {
            case 'e':
                count++
            default:
                // Unknown entry meta: ignore.
                continue
            }
        }
        return
    }
    assert.Equal(t, rw1Written+rw2Written, getEventCount())
}
This will fail before the PR and pass with the fix:
=== RUN TestStorageLimitWritesPersisted
storage_test.go:405:
Error Trace: /home/carson/projects/apm-server/x-pack/apm-server/sampling/eventstorage/storage_test.go:405
Error: Not equal:
expected: 5256598
actual : 4925758
Test: TestStorageLimitWritesPersisted
--- FAIL: TestStorageLimitWritesPersisted (68.69s)
=== RUN TestStorageLimitWritesPersisted
--- PASS: TestStorageLimitWritesPersisted (0.04s)
PASS
@carsonip I updated your test to print the error returned by
Before this PR:
After this PR:
Note: because we are counting the entrySize before writing it, the "true" current usage size is actually
With true-current:
I've added log lines to show what is going on:
rw2 has a true-current size of 10063, which happens because the two ReadWriters can race when writing an entry: the size is an atomic int and each ReadWriter can have one entry in flight in parallel, so the size can be off by one entry: 9926+137=10063. I think this is an acceptable tradeoff for two reasons:
Co-authored-by: Carson Ip <[email protected]>
The main reason is that the badger size reconciliation happens per-minute. So that will end up being the greatest source of error (as in inaccuracy) in our storage limit check.
Agreed. We are definitely not making things worse here; a critical bug is being fixed. Initially I did not think about the overuse of storage, but when I tested it out last week I realized it was dangerous.
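To make the overshoot being discussed concrete, here is a simplified sketch of the check order (the names, error value, and exact write path are assumptions for illustration, not the actual implementation): each ReadWriter reserves the estimated entry size in the shared atomic counter before writing, so two ReadWriters can both pass the limit check concurrently and exceed the limit by at most one in-flight entry each.

package eventstoragesketch

import (
    "errors"
    "sync/atomic"

    badger "github.com/dgraph-io/badger/v2"
)

var errLimitReachedSketch = errors.New("configured storage limit reached")

// checkAndReserve adds the estimated entry size to the shared pending counter
// before the write, then checks the limit against Badger's reported size plus
// the pending bytes. If the limit would be exceeded, the reservation is undone
// and an error is returned; otherwise the caller proceeds to set the entry on
// its transaction. Two goroutines can each pass this check before either
// commits, which is the off-by-one-entry overshoot described above.
func checkAndReserve(db *badger.DB, pending *atomic.Int64, entrySize, limit int64) error {
    current := pending.Add(entrySize)
    lsm, vlog := db.Size()
    if limit != 0 && lsm+vlog+current > limit {
        pending.Add(-entrySize) // undo the reservation
        return errLimitReachedSketch
    }
    return nil
}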
Looks good. If you could update the PR description and respond to the comments, I think it should be good to go. @axw do you want to glance over it?
@kruskall one last nit from my side: please update the "How to test these changes" section, as it will be crucial information for the tester, especially if we end up releasing this as a patch release.
…error (#12120)
* feat: compute tbs entry size before writing and flush to avoid limit error
* fix: move pendingSize to storage and make it atomic
* refactor: remove default pending size
* fix: track readwriters pending size and resolve race conditions
* lint: remove unused method
* fix: add base transaction size and add more comments
* test: fix storage limit test
* lint: remove unused ErrLimitReached
* fix: do not add entrySize twice
Co-authored-by: Vishal Raj <[email protected]>
* docs: add pendingSize comment
* lint: fix linter issues
* fix: respect storage limit
* fix: handle 0 storage limit (unlimited)
* fix: flush what we have before discarding the transaction
* fix: do not discard txn twice
* test: fix error message typo
* fix: update pendingSize after flush and refactor for clarity
* docs: update test comment
Co-authored-by: Carson Ip <[email protected]>
* docs: readd db.size comment
---------
Co-authored-by: Vishal Raj <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit c8b7108)
…error (#12120) (#12452)
* feat: compute tbs entry size before writing and flush to avoid limit error
* fix: move pendingSize to storage and make it atomic
* refactor: remove default pending size
* fix: track readwriters pending size and resolve race conditions
* lint: remove unused method
* fix: add base transaction size and add more comments
* test: fix storage limit test
* lint: remove unused ErrLimitReached
* fix: do not add entrySize twice
Co-authored-by: Vishal Raj <[email protected]>
* docs: add pendingSize comment
* lint: fix linter issues
* fix: respect storage limit
* fix: handle 0 storage limit (unlimited)
* fix: flush what we have before discarding the transaction
* fix: do not discard txn twice
* test: fix error message typo
* fix: update pendingSize after flush and refactor for clarity
* docs: update test comment
Co-authored-by: Carson Ip <[email protected]>
* docs: readd db.size comment
---------
Co-authored-by: Vishal Raj <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit c8b7108)
Co-authored-by: kruskall <[email protected]>
Testing notes
✔️ test-plan-ok
Tested with sampling rate 0.1 and a 1KB TBS size limit under 2 ESS versions: 8.12.0 (affected by bug) and 8.12.1 (with bugfix).
Code snippet: Go agent code
Expected outcome
Traces should be complete like this:
8.12.0 (affected by bug)
Traces are sometimes incomplete under high throughput.
8.12.1 (with TBS bug fix)
Did not find any incomplete traces after scrolling through 50+ traces.
Motivation/summary
Keep track of pending writes in ReadWriters and use an atomic counter for the overall size of pending writes in the backing storage.
Catch storage issues earlier and avoid losing events.
Estimate the size of an entry by hardcoding values from Badger's internal code (a rough illustrative sketch follows).
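A hedged sketch of what such an estimate can look like; the fixed per-entry overhead below is an assumption chosen for illustration, whereas the PR hardcodes the actual values used by Badger's internal entry encoding:

package eventstoragesketch

// perEntryOverheadBytes is an illustrative stand-in for the metadata bytes
// Badger adds to each entry (key/value headers, user meta, etc.).
const perEntryOverheadBytes = 16

// estimateEntrySize approximates the on-disk size of an entry as the key and
// value lengths plus a fixed overhead, so the pending size can be accounted
// for before the entry is actually written.
func estimateEntrySize(key, value []byte) int64 {
    return int64(len(key)+len(value)) + perEntryOverheadBytes
}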
How to test these changes
Related issues
Closes #11857