Limit badger gc concurrency to 1 to avoid panic #14340

Merged on Oct 21, 2024 (10 commits)
1 change: 1 addition & 0 deletions changelogs/head.asciidoc
@@ -7,6 +7,7 @@ https://github.com/elastic/apm-server/compare/8.15\...main[View commits]
==== Bug fixes

- Track all bulk request response status codes {pull}13574[13574]
- Tail-based sampling: Fix rare GC thread failure after Elastic Agent hot reload, which caused unbounded storage size growth {pull}14340[14340]

[float]
==== Breaking Changes
7 changes: 7 additions & 0 deletions x-pack/apm-server/sampling/processor.go
@@ -40,6 +40,11 @@ const (
shutdownGracePeriod = 5 * time.Second
)

var (
// gcMutex is a global mutex that prevents Badger GC from running concurrently when two TBS processors are active during a hot reload.
gcMutex sync.Mutex
)

// Processor is a tail-sampling event processor.
type Processor struct {
config Config
@@ -386,6 +391,8 @@ func (p *Processor) Run() error {
}
})
g.Go(func() error {
gcMutex.Lock()
defer gcMutex.Unlock()
// This goroutine is responsible for periodically garbage
// collecting the Badger value log, using the recommended
// discard ratio of 0.5.