feat: compute tbs entry size before writing and flush to avoid limit error #12120
Conversation
This pull request does not have a backport label. Could you fix it @kruskall? 🙏
err := rw.txn.Commit()
rw.txn = rw.s.db.NewTransaction(true)
rw.pendingWrites = 0
rw.pendingSize = defaultPendingSize
Like in ReadWriter.writeEntry, this should be coordinated across all ReadWriters.
I've changed this a bit.
IIUC, when flushing we're just committing the current ReadWriter's transaction, which is independent from the other ReadWriters, so we need to track both the total underlying storage pendingSize (the sum of all the ReadWriters' pendingSizes) and the per-ReadWriter pendingSize.
When a ReadWriter flushes, we reset its current pendingSize and subtract it from the storage pendingSize.
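To make the bookkeeping concrete, here is a minimal Go sketch of the scheme described above, using simplified stand-in types and hypothetical helper names rather than the actual apm-server implementation:

package eventstoragesketch

import "sync/atomic"

// Storage stands in for the shared backing store: its pendingSize is the sum
// of the pending (not yet reconciled) bytes across all ReadWriters.
type Storage struct {
    pendingSize atomic.Int64
}

// ReadWriter stands in for a single reader/writer with its own transaction;
// it tracks only its own share of the pending bytes.
type ReadWriter struct {
    s           *Storage
    pendingSize int64
}

// addPending accounts for a new entry in both counters before it is written.
func (rw *ReadWriter) addPending(entrySize int64) {
    rw.s.pendingSize.Add(entrySize)
    rw.pendingSize += entrySize
}

// flushPending is called after the ReadWriter commits its transaction: the
// committed bytes are no longer pending, so its share is subtracted from the
// shared counter and its local counter is reset.
func (rw *ReadWriter) flushPending() {
    rw.s.pendingSize.Add(-rw.pendingSize)
    rw.pendingSize = 0
}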
txn: s.db.NewTransaction(true),
s: s,
txn: s.db.NewTransaction(true),
pendingSize: defaultPendingSize,
Instead of defaulting to 1024, should we perhaps use s.db.Size()? And rather than calculating pendingSize, maybe we could track the remaining capacity, i.e. StorageLimitInBytes - Size()?
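For context, a rough sketch of the alternative being suggested here, assuming Badger v2's DB.Size() (which reports the LSM tree and value log sizes, refreshed roughly once a minute); remainingCapacity is a hypothetical helper, not part of the PR:

package eventstoragesketch

import badger "github.com/dgraph-io/badger/v2"

// remainingCapacity computes how many bytes are left before the configured
// storage limit is reached, based on the sizes Badger reports. Because Badger
// refreshes these sizes periodically rather than on every write, the result
// can lag behind the true on-disk usage.
func remainingCapacity(db *badger.DB, storageLimitInBytes int64) int64 {
    lsm, vlog := db.Size()
    return storageLimitInBytes - (lsm + vlog)
}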
I've removed the default 1024. For tracking I'd prefer to use pendingSize, but I have no strong opinion on it.
Co-authored-by: Vishal Raj <[email protected]>
@kruskall I moved this to
The code correctness is becoming harder to reason about. Maybe refactor it a bit?
I've written a test for the error scenario:
func TestStorageLimitWritesPersisted(t *testing.T) {
    // Ensure that if WriteTraceEvent does not return an error,
    // the event is eventually written to storage.
    const storageLimitInBytes = int64(10000)

    tempdir := t.TempDir()
    opts := func() badger.Options {
        opts := badgerOptions()
        opts = opts.WithInMemory(false)
        opts = opts.WithDir(tempdir).WithValueDir(tempdir)
        return opts
    }

    // Open and close the database to create a non-empty value log file,
    // which will cause writes below to fail due to the storage limit being
    // exceeded. We would otherwise have to rely on Badger's one minute
    // timer to refresh the size.
    db := newBadgerDB(t, opts)
    db.Close()
    db = newBadgerDB(t, opts)

    store := eventstorage.New(db, eventstorage.ProtobufCodec{})
    rw1 := store.NewReadWriter()
    defer rw1.Close()
    rw2 := store.NewReadWriter()
    defer rw2.Close()

    write := func(rw *eventstorage.ReadWriter) error {
        traceID := uuid.Must(uuid.NewV4()).String()
        transactionID := uuid.Must(uuid.NewV4()).String()
        transaction := modelpb.APMEvent{Transaction: &modelpb.Transaction{Id: transactionID}}
        err := rw.WriteTraceEvent(traceID, transactionID, &transaction, eventstorage.WriterOpts{
            TTL:                 time.Minute,
            StorageLimitInBytes: storageLimitInBytes,
        })
        return err
    }

    var (
        rw1Written, rw2Written int
        wg                     sync.WaitGroup
    )
    wg.Add(2)
    go func() {
        for {
            err := write(rw1)
            if err != nil {
                assert.ErrorIs(t, err, eventstorage.ErrLimitReached)
                break
            }
            rw1Written++
        }
        wg.Done()
    }()
    go func() {
        for {
            err := write(rw2)
            if err != nil {
                assert.ErrorIs(t, err, eventstorage.ErrLimitReached)
                break
            }
            rw2Written++
        }
        wg.Done()
    }()
    wg.Wait()

    getEventCount := func() (count int) {
        opts := badger.DefaultIteratorOptions
        txn := db.NewTransaction(true)
        iter := txn.NewIterator(opts)
        defer iter.Close()
        for iter.Rewind(); iter.Valid(); iter.Next() {
            item := iter.Item()
            if item.IsDeletedOrExpired() {
                continue
            }
            switch item.UserMeta() {
            case 'e':
                count++
            default:
                // Unknown entry meta: ignore.
                continue
            }
        }
        return
    }
    assert.Equal(t, rw1Written+rw2Written, getEventCount())
}
This will fail before the PR and pass with the fix:
=== RUN TestStorageLimitWritesPersisted
storage_test.go:405:
Error Trace: /home/carson/projects/apm-server/x-pack/apm-server/sampling/eventstorage/storage_test.go:405
Error: Not equal:
expected: 5256598
actual : 4925758
Test: TestStorageLimitWritesPersisted
--- FAIL: TestStorageLimitWritesPersisted (68.69s)
=== RUN TestStorageLimitWritesPersisted
--- PASS: TestStorageLimitWritesPersisted (0.04s)
PASS
@carsonip I updated your test to print the error returned by
Before this PR:
After this PR:
Note: because we are counting the entrySize before writing it, the "true" current usage size is actually
With true-current:
I've added log lines to show what is going on:
rw2 has a true-current size of 10063, which happens because the two ReadWriters can race when writing an entry: the size is an atomic int and each ReadWriter can have one entry in flight in parallel, so the size can be off by one entry: 9926+137=10063. I think this is an acceptable tradeoff for two reasons:
Co-authored-by: Carson Ip <[email protected]>
The main reason is that the badger size reconciliation happens per-minute. So that will end up being the greatest source of error (as in inaccuracy) in our storage limit check.
Agreed. We are definitely not making things worse here; a critical bug is being fixed. Initially I did not think about the overuse of storage, but when I tested it out last week I realized it was dangerous.
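To make the overshoot being discussed concrete, here is a simplified sketch of the check order (the names, error value, and exact write path are assumptions for illustration, not the actual implementation): each ReadWriter reserves the estimated entry size in the shared atomic counter before writing, so two ReadWriters can both pass the limit check concurrently and exceed the limit by at most one in-flight entry each.

package eventstoragesketch

import (
    "errors"
    "sync/atomic"

    badger "github.com/dgraph-io/badger/v2"
)

var errLimitReachedSketch = errors.New("configured storage limit reached")

// checkAndReserve adds the estimated entry size to the shared pending counter
// before the write, then checks the limit against Badger's reported size plus
// the pending bytes. If the limit would be exceeded, the reservation is undone
// and an error is returned; otherwise the caller proceeds to set the entry on
// its transaction. Two goroutines can each pass this check before either
// commits, which is the off-by-one-entry overshoot described above.
func checkAndReserve(db *badger.DB, pending *atomic.Int64, entrySize, limit int64) error {
    current := pending.Add(entrySize)
    lsm, vlog := db.Size()
    if limit != 0 && lsm+vlog+current > limit {
        pending.Add(-entrySize) // undo the reservation
        return errLimitReachedSketch
    }
    return nil
}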
Looks good. If you could update the PR description and respond to the comments, I think it should be good to go. @axw do you want to glance over it?
@kruskall one last nit from my side: please update the "How to test these changes" section, as it will be crucial information for the tester, especially if we end up releasing this as a patch release.
…error (#12120)
* feat: compute tbs entry size before writing and flush to avoid limit error
* fix: move pendingSize to storage and make it atomic
* refactor: remove default pending size
* fix: track readwriters pending size and resolve race conditions
* lint: remove unused method
* fix: add base transaction size and add more comments
* test: fix storage limit test
* lint: remove unused ErrLimitReached
* fix: do not add entrySize twice
Co-authored-by: Vishal Raj <[email protected]>
* docs: add pendingSize comment
* lint: fix linter issues
* fix: respect storage limit
* fix: handle 0 storage limit (unlimited)
* fix: flush what we have before discarding the transaction
* fix: do not discard txn twice
* test: fix error message typo
* fix: update pendingSize after flush and refactor for clarity
* docs: update test comment
Co-authored-by: Carson Ip <[email protected]>
* docs: readd db.size comment
---------
Co-authored-by: Vishal Raj <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit c8b7108)
…error (#12120) (#12452)
* feat: compute tbs entry size before writing and flush to avoid limit error
* fix: move pendingSize to storage and make it atomic
* refactor: remove default pending size
* fix: track readwriters pending size and resolve race conditions
* lint: remove unused method
* fix: add base transaction size and add more comments
* test: fix storage limit test
* lint: remove unused ErrLimitReached
* fix: do not add entrySize twice
Co-authored-by: Vishal Raj <[email protected]>
* docs: add pendingSize comment
* lint: fix linter issues
* fix: respect storage limit
* fix: handle 0 storage limit (unlimited)
* fix: flush what we have before discarding the transaction
* fix: do not discard txn twice
* test: fix error message typo
* fix: update pendingSize after flush and refactor for clarity
* docs: update test comment
Co-authored-by: Carson Ip <[email protected]>
* docs: readd db.size comment
---------
Co-authored-by: Vishal Raj <[email protected]>
Co-authored-by: Carson Ip <[email protected]>
(cherry picked from commit c8b7108)
Co-authored-by: kruskall <[email protected]>
Testing notes
✔️ test-plan-ok
Tested with sampling rate 0.1 and a 1KB TBS size limit under 2 ESS versions: 8.12.0 (affected by bug) and 8.12.1 (with bugfix).
Code snippet: Go agent code
Expected outcome
Traces should be complete like this:
8.12.0 (affected by bug)
Traces are sometimes incomplete under high throughput.
8.12.1 (with TBS bug fix)
Did not find any incomplete traces after scrolling through 50+ traces.
Motivation/summary
Keep track of pending writes in ReadWriters and use an atomic counter for the overall size of pending writes in the backing storage.
Catch storage issues earlier and avoid losing events.
Estimate the size of an entry by hardcoding values from Badger's internal code (a rough illustrative sketch follows).
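A hedged sketch of what such an estimate can look like; the fixed per-entry overhead below is an assumption chosen for illustration, whereas the PR hardcodes the actual values used by Badger's internal entry encoding:

package eventstoragesketch

// perEntryOverheadBytes is an illustrative stand-in for the metadata bytes
// Badger adds to each entry (key/value headers, user meta, etc.).
const perEntryOverheadBytes = 16

// estimateEntrySize approximates the on-disk size of an entry as the key and
// value lengths plus a fixed overhead, so the pending size can be accounted
// for before the entry is actually written.
func estimateEntrySize(key, value []byte) int64 {
    return int64(len(key)+len(value)) + perEntryOverheadBytes
}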
How to test these changes
Related issues
Closes #11857