
Conversation

@nsrip-dd
Contributor

@nsrip-dd nsrip-dd commented Oct 21, 2025

The zstd compression library uses ~8MiB per compressor by default,
primarily for the back-reference window. See this parameter:
https://pkg.go.dev/github.com/klauspost/compress/zstd#WithWindowSize
Since we have an encoder per profile type, this leads to a noticeable
increase in memory usage after switching to zstd by default. We can
make the window smaller, but this can negatively affect the compression
ratio. Instead, we can just use a single encoder and share it between
the profile types.
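
For context, the rejected alternative of shrinking the window would amount to something like the following sketch, using the WithWindowSize option linked above (the 1MiB value and the newSmallWindowEncoder name are illustrative, not from this PR):

```go
// Hypothetical alternative, not what this PR does: cap each encoder's
// back-reference window at 1MiB instead of the ~8MiB default, trading
// some compression ratio for lower per-encoder memory.
package profiler

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

func newSmallWindowEncoder(w io.Writer) (*zstd.Encoder, error) {
	return zstd.NewWriter(w, zstd.WithWindowSize(1<<20))
}
```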

This PR does the bare minimum to implement a single encoder. Using a
separate global lock to guard access to the encoder is a bit kludgy, but
plumbing the synchronization through in a more encapsulated way would
require a bigger refactor.
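
A minimal sketch of the shape of that change, with hypothetical names (encoderMu, sharedEncoder, compressTo) standing in for whatever the actual code uses:

```go
package profiler

import (
	"io"
	"sync"

	"github.com/klauspost/compress/zstd"
)

var (
	encoderMu     sync.Mutex
	sharedEncoder *zstd.Encoder // created once, reused for every profile type
)

// compressTo zstd-compresses src into dst using the single shared encoder.
func compressTo(dst io.Writer, src io.Reader) error {
	encoderMu.Lock()
	defer encoderMu.Unlock()
	if sharedEncoder == nil {
		enc, err := zstd.NewWriter(dst)
		if err != nil {
			return err
		}
		sharedEncoder = enc
	} else {
		// Reset points the reusable encoder at the new destination without
		// allocating another ~8MiB window.
		sharedEncoder.Reset(dst)
	}
	if _, err := io.Copy(sharedEncoder, src); err != nil {
		return err
	}
	// Close flushes the current zstd frame; the encoder stays reusable
	// after a later Reset.
	return sharedEncoder.Close()
}
```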

This will probably make our cycle time slightly longer, since we now wait
for all the processing to complete serially before advancing to the next
profile cycle. It's hard to quantify exactly how much since it depends on
how much profiling data the program produces.

Also worth noting: the execution tracer and CPU profile APIs take a
writer when they start, rather than when we read the data. The tracer in
particular periodically writes out data as it's running. The CPU profiler
technically only writes data to the writer when it's stopped. Either way,
we don't want to hold the global lock in a way that would block either
of these things from completing. So this PR collects this data in a separate
buffer and (re)compresses with the lock held after stopping collection.
We should still come out ahead on memory usage by not paying ~8MiB
per profile type for compression.
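
A rough sketch of that buffering flow for the CPU profile, assuming the compressTo helper sketched above (collectCPUProfile and its signature are illustrative, not the actual dd-trace-go code):

```go
// Hypothetical sketch: the CPU profiler writes into a plain in-memory
// buffer while it runs, and the shared zstd encoder only runs (under the
// global lock, inside compressTo) after collection has stopped.
package profiler

import (
	"bytes"
	"runtime/pprof"
	"time"
)

func collectCPUProfile(d time.Duration) ([]byte, error) {
	var raw bytes.Buffer
	if err := pprof.StartCPUProfile(&raw); err != nil {
		return nil, err
	}
	time.Sleep(d)          // collect for the profiling period
	pprof.StopCPUProfile() // pprof flushes its output into raw here

	// Recompress the collected bytes with the shared encoder; the lock is
	// only held here, so it never blocks the profiler while it runs.
	var out bytes.Buffer
	if err := compressTo(&out, &raw); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}
```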

@pr-commenter

pr-commenter bot commented Oct 21, 2025

Benchmarks

Benchmark execution time: 2025-10-28 12:14:12

Comparing candidate commit a37f42c in PR branch PROF-12641-shared-zstd-encoder with baseline commit 399a58b in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 24 metrics, 0 unstable metrics.

@datadog-official
Contributor

datadog-official bot commented Oct 21, 2025

⚠️ Tests

⚠️ Warnings

❄️ 1 New flaky test detected

TestWrapConsumerGroupHandler from github.com/DataDog/dd-trace-go/contrib/IBM/sarama/v2 (Datadog)
Failed

=== RUN   TestWrapConsumerGroupHandler
2025/10/28 12:02:02 Sarama consumer up and running!...
    consumer_group_test.go:146: Message claimed: value = test 1, timestamp = 2025-10-28 12:02:02.066 +0000 UTC, topic = IBM_sarama_TestWrapConsumerGroupHandler
    consumer_group_test.go:88: 
        	Error Trace:	/home/runner/work/dd-trace-go/dd-trace-go/contrib/IBM/sarama/consumer_group_test.go:88
        	Error:      	"[
        	            	name: kafka.produce
        	            	tags: map[string]interface {}{"_dd.base_service":"", "_dd.p.tid":"6900b0ba00000000", "_dd.profiling.enabled":0, "_dd.top_level":1, "component":"IBM/sarama", "language":"go", "messaging.destination.name":"IBM_sarama_TestWrapConsumerGroupHandler", "messaging.kafka.partition":0, "messaging.system":"kafka", "offset":0, "resource.name":"Produce Topic IBM_sarama_TestWrapConsumerGroupHandler", "service.name":"kafka", "span.kind":"producer", "span.name":"kafka.produce", "span.type":"queue"}
...

ℹ️ Info

🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: a37f42c

@nsrip-dd nsrip-dd force-pushed the PROF-12641-shared-zstd-encoder branch from cfe3f71 to 6e78e2e on October 24, 2025 12:33
@nsrip-dd nsrip-dd force-pushed the PROF-12641-shared-zstd-encoder branch from 6e78e2e to a6be279 on October 27, 2025 15:56
@nsrip-dd nsrip-dd marked this pull request as ready for review October 27, 2025 16:06
@nsrip-dd nsrip-dd requested a review from a team as a code owner October 27, 2025 16:06
@nsrip-dd
Contributor Author

/merge

@dd-devflow-routing-codex

dd-devflow-routing-codex bot commented Oct 28, 2025

View all feedbacks in Devflow UI.

2025-10-28 12:47:03 UTC ℹ️ Start processing command /merge


2025-10-28 12:47:08 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 18m (p90).


2025-10-28 13:03:08 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 6854674 into main Oct 28, 2025
267 of 273 checks passed
@dd-mergequeue dd-mergequeue bot deleted the PROF-12641-shared-zstd-encoder branch October 28, 2025 13:03