
Uses forked pkg/util/interner for dogstatsd workload #19990

Closed
wants to merge 7 commits into from

Conversation

scottopell
Contributor

What does this PR do?

Replaces the dogstatsd string interner with a gc-friendlier version that doesn't require dumping the cache.

Motivation

Testing out differences in string interning.

Additional Notes

This will not be merged as-is, as it copy-pastes most of pkg/util/intern instead of using it directly.
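For context, the "gc-friendlier" behavior comes from tying cache entries to object lifetime rather than periodically dumping the whole map. Below is a minimal, hypothetical sketch of the finalizer-based pattern used by packages like go4.org/intern (which pkg/util/intern derives from); the names and structure are illustrative, not the forked interner's actual code, and it ignores the resurrection race a production implementation must handle:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Value wraps an interned string. When the last *Value reference is
// dropped, its finalizer removes the cache entry, so the GC shrinks
// the cache naturally instead of the interner resetting it wholesale.
type Value struct{ s string }

func (v *Value) Get() string { return v.s }

var (
	mu    sync.Mutex
	cache = map[string]*Value{}
)

// Intern returns a canonical *Value for s. Heavily simplified sketch:
// a real implementation must handle the window where a finalizer fires
// just after the entry has been handed out again.
func Intern(s string) *Value {
	mu.Lock()
	defer mu.Unlock()
	if v, ok := cache[s]; ok {
		return v
	}
	v := &Value{s: s}
	runtime.SetFinalizer(v, func(v *Value) {
		mu.Lock()
		defer mu.Unlock()
		delete(cache, v.s)
	})
	cache[s] = v
	return v
}

func main() {
	a, b := Intern("service:web"), Intern("service:web")
	fmt.Println(a == b) // equal strings intern to the same pointer: true
}
```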

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change has a major impact on the code base, impacts multiple teams, or changes important, well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a releasenote.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label has been applied, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@pr-commenter

pr-commenter bot commented Oct 8, 2023

Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: 573b1dcc-4c51-4a14-ab1c-3f79bcea7dc6
Baseline: d60b9cf
Comparison: 95663e2
Total datadog-agent CPUs: 7

Explanation

A regression test is an integrated performance test for datadog-agent in a repeatable rig, with varying configuration for datadog-agent. What follows is a statistical summary of a brief datadog-agent run for each configuration across the SHAs given above. The goal of these tests is to determine quickly whether datadog-agent performance is changed, and to what degree, by a pull request.

Because a target's optimization goal performance in each experiment will vary somewhat each time it is run, we can only estimate mean differences in optimization goal relative to the baseline target. We express these differences as a percentage change relative to the baseline target, denoted "Δ mean %". These estimates are made to a precision that balances accuracy and cost control. We represent this precision as a 90.00% confidence interval denoted "Δ mean % CI": there is a 90.00% chance that the true value of "Δ mean %" is in that interval.

We decide whether a change in performance is a "regression" -- a change worth investigating further -- if both of the following two criteria are true:

  1. The estimated |Δ mean %| ≥ 5.00%. This criterion intends to answer the question "Does the estimated change in mean optimization goal performance have a meaningful impact on your customers?". We assume that when |Δ mean %| < 5.00%, the impact on your customers is not meaningful. We also assume that a performance change in optimization goal is worth investigating whether it is an increase or decrease, so long as the magnitude of the change is sufficiently large.

  2. Zero is not in the 90.00% confidence interval "Δ mean % CI" about "Δ mean %". This statement is equivalent to saying that there is at least a 90.00% chance that the mean difference in optimization goal is not zero. This criterion intends to answer the question, "Is there a statistically significant difference in mean optimization goal performance?". It also means there is no more than a 10.00% chance this criterion reports a statistically significant difference when the true difference in mean optimization goal is zero -- a "false positive". We assume you are willing to accept a 10.00% chance of inaccurately detecting a change in performance when no true difference exists.

The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values of "Δ mean %" mean that baseline is faster, whereas positive values of "Δ mean %" mean that comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed.

No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
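The two criteria above can be expressed as a small predicate. This is an illustrative sketch, not the Regression Detector's actual code; the function name and signature are assumptions:

```go
package main

import "fmt"

// isRegression applies the two criteria described above: the estimated
// change must be at least 5% in magnitude, and zero must lie outside
// the 90% confidence interval. Names here are illustrative.
func isRegression(deltaMeanPct, ciLow, ciHigh float64) bool {
	largeEnough := deltaMeanPct >= 5.0 || deltaMeanPct <= -5.0
	significant := ciLow > 0 || ciHigh < 0 // zero is outside the CI
	return largeEnough && significant
}

func main() {
	// tcp_syslog_to_blackhole: +0.20 [+0.06, +0.33] is statistically
	// significant but too small to matter.
	fmt.Println(isRegression(0.20, 0.06, 0.33)) // false
	// A hypothetical -7.1 [-9.0, -5.2] change would flag.
	fmt.Println(isRegression(-7.1, -9.0, -5.2)) // true
}
```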

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
|---|---|---|---|---|
| otel_to_otel_logs | ingress throughput | +0.51 | [-1.09, +2.11] | 39.86% |
| file_tree | egress throughput | +0.44 | [-1.88, +2.75] | 24.24% |
| tcp_syslog_to_blackhole | ingress throughput | +0.20 | [+0.06, +0.33] | 98.25% |
| dogstatsd_string_interner_8MiB_10k | ingress throughput | +0.02 | [-0.05, +0.09] | 42.17% |
| uds_dogstatsd_to_api | ingress throughput | +0.02 | [-0.21, +0.25] | 10.39% |
| trace_agent_json | ingress throughput | +0.02 | [-0.12, +0.15] | 16.00% |
| dogstatsd_string_interner_8MiB_50k | ingress throughput | +0.01 | [-0.03, +0.05] | 24.91% |
| dogstatsd_string_interner_128MiB_100 | ingress throughput | +0.00 | [-0.14, +0.14] | 0.77% |
| dogstatsd_string_interner_64MiB_100 | ingress throughput | +0.00 | [-0.14, +0.14] | 0.38% |
| dogstatsd_string_interner_64MiB_1k | ingress throughput | +0.00 | [-0.13, +0.13] | 0.22% |
| dogstatsd_string_interner_8MiB_100k | ingress throughput | +0.00 | [-0.07, +0.07] | 0.00% |
| dogstatsd_string_interner_8MiB_1k | ingress throughput | -0.00 | [-0.10, +0.10] | 0.44% |
| dogstatsd_string_interner_8MiB_100 | ingress throughput | -0.00 | [-0.13, +0.13] | 1.34% |
| idle | egress throughput | -0.01 | [-2.97, +2.96] | 0.26% |
| tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.09, +0.07] | 16.66% |
| trace_agent_msgpack | ingress throughput | -0.03 | [-0.16, +0.10] | 32.07% |
| dogstatsd_string_interner_128MiB_1k | ingress throughput | -0.03 | [-0.34, +0.27] | 14.83% |
| file_to_blackhole | egress throughput | -0.22 | [-0.67, +0.22] | 59.39% |

blt added a commit that referenced this pull request Oct 10, 2023
This commit extracts experiments from #19882 and #19990, expanding the
throughput parameter across the range of concern. Note that at the lower end of
the throughput range the full set of contexts will not be explored, meaning that
throughput per second is the dominant metric here. Contexts per second we will
need to derive from captured target telemetry.

Intended to back ongoing experiments with the stringInterner.

REF SMPTNG-12

Signed-off-by: Brian L. Troutwine <[email protected]>
@blt
Contributor

blt commented Oct 10, 2023

I've introduced #20036 to give us a broader range of experimental throughput. This analysis will concern itself with run-id 0e3112e1-e30a-4211-9b28-915b909d5219.

| Experiment | CPU time | Lifetime Allocations | Heap Live Size | Mutex Time |
|---|---|---|---|---|
| uds_dogstatsd_to_api_nodist_1KiB | ~0 | ~ -15k | 0 MB | 0 ms |
| uds_dogstatsd_to_api_nodist_100MiB | -27 sec | -50M | +30 MB | ~ -1 sec |
| uds_dogstatsd_to_api_nodist_200MiB | -28 sec | -60M | -106 MB | +4 sec |

Compared to raising the default size, this PR sees a similar reduction in lifetime allocations and a curious rise in heap live size in only one experiment; otherwise the drop in _200MiB heap live size is improved. The capture data shows throughput improved below threshold in the _200MiB case, while in the _100MiB case it remains constant as before. This implies that although this change does allow more data through, the Agent's throughput in these experiments is stable at or below 100MiB/sec and becomes unstable somewhere around 120MiB/sec, or 125MiB/sec if this PR is merged. CPU utilization is improved in a manner similar to the default-size-raise PR.

I would like to see this run repeated with #20036 in its history. I see the results here as positive, an improvement over main.

blt added a commit that referenced this pull request Oct 10, 2023
@scottopell scottopell force-pushed the sopell/forked-string-interner branch from a9d5266 to 3185ebc Compare October 10, 2023 20:28
blt added a commit that referenced this pull request Oct 11, 2023
@blt
Contributor

blt commented Oct 11, 2023

This analysis will concern itself with run-id 96f38e65-f7ae-4073-8fab-dbb569b4e089.

| Experiment | CPU time | Lifetime Allocations | Heap Live Size | Mutex Time |
|---|---|---|---|---|
| uds_dogstatsd_to_api_nodist_1KiB | ~0 | ~0 | 0 MB | -8 ms |
| uds_dogstatsd_to_api_nodist_1MiB | ~0 | ~ -100k | -2 MB | ~0 ms |
| uds_dogstatsd_to_api_nodist_16MiB | -2 sec | -8M | +18 MB | +100 ms |
| uds_dogstatsd_to_api_nodist_32MiB | -3 sec | -16M | -11 MB | +1 sec |
| uds_dogstatsd_to_api_nodist_100MiB | -29 sec | -49M | -59 MB | +1 sec |
| uds_dogstatsd_to_api_nodist_200MiB | -30 sec | -59M | -146 MB | +1 sec |

A consistent result: the low end is not pessimized and the high end is improved. In the 16MiB case the capture data does bear out the higher memory consumption. Throughput is improved; the profile suggests the bulk of the larger heap is in the UDS listener and only partially in the string interner. I don't have a hypothesis for that result.

Compared to the capture results for #19882, overall memory consumption, throughput gains, and CPU reduction appear to be on par with raising the default size of the main version of the cache.

blt added a commit that referenced this pull request Oct 12, 2023
This commit updates the environment flags on our experiments to ensure that
profiling runs are fine grained through the whole of the run. Updated with
reference to the ongoing investigation in #19990

REF SMPTNG-12

Signed-off-by: Brian L. Troutwine <[email protected]>
blt added a commit that referenced this pull request Oct 12, 2023
* Ensure profiling is fine-grained in the Regression Detector

This commit updates the environment flags on our experiments to ensure that
profiling runs are fine grained through the whole of the run. Updated with
reference to the ongoing investigation in #19990

REF SMPTNG-12

Signed-off-by: Brian L. Troutwine <[email protected]>

* CPUDURATION -> CPU_DURATION

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
Co-authored-by: George Hahn <[email protected]>
@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 3185ebc to 0e4392d Compare October 12, 2023 20:56
@remeh
Contributor

remeh commented Oct 16, 2023

@blt hijacking the thread a bit, sorry, but for a related question.

I was thinking about the Heap Live Size, which appears to be slightly higher in one scenario in the current PR (your comment right above), but also in this comment on #19882 increasing the string interner size.

AFAIR, both implementations rely on the GC via the equivalent of weak pointers/references, where the GC may collect entries and that's OK. Wouldn't that mean the moment the GC runs has an impact on the Heap Live Size result, that is, the moment the data is collected also has an impact? If this value is often sampled at a time when the GC hasn't intervened, you would possibly see differences between scenarios. Should we consider scenarios where the GC is manually kicked off before collecting samples?

@blt
Contributor

blt commented Oct 16, 2023

> @blt hijacking the thread a bit, sorry, but for a related question.

It's an interesting question, and apropos, I think.

> I was thinking about the Heap Live Size, which appears to be slightly higher in one scenario in the current PR (your comment right above), but also in this comment on #19882 increasing the string interner size.
>
> AFAIR, both implementations rely on the GC via the equivalent of weak pointers/references, where the GC may collect entries and that's OK. Wouldn't that mean the moment the GC runs has an impact on the Heap Live Size result, that is, the moment the data is collected also has an impact?

It would have an impact, yeah.

> If this value is often sampled at a time when the GC hasn't intervened, you would possibly see differences between scenarios. Should we consider scenarios where the GC is manually kicked off before collecting samples?

I'd be vaguely against that. Which is to say, I think the profiler is not quite the right tool for this, and I could have called that out a little better. Because of the sampling problem, as you point out, we might sample the Agent before GC has kicked in, but that is memory use the Agent's user would actually see, so should we elide it by manually triggering GC before sampling? I think that would give us a different view, but it may be a skewed one. The right tool here is to examine the capture data, which is itself sampled, but at 1 Hz.

I guess my argument would be that a freshly GC'ed Agent isn't any less representative than one where the GC has yet to run. Both states would be observable to an Agent user and need to be considered when deciding resource allocation, etc. I continue to have a hunch that focusing on lifetime allocations is probably the more valuable output of the profiler, but as @scottopell has pointed out, we want to be careful not to pessimize observable RSS consumption, or at least to do so only with the understanding of better gains elsewhere.
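The timing sensitivity under discussion can be seen directly with Go's runtime statistics. A minimal sketch, illustrative only and unrelated to the Regression Detector's capture pipeline: the pre-GC reading can include garbage not yet swept, which is exactly why when the sample is taken matters.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // global sink keeps allocations from being optimized away

// liveHeap reads the runtime's live-heap figure, optionally forcing a
// collection first. Without the runtime.GC() call, the number may
// include unswept garbage, depending on where the GC cycle happens to
// be when the sample is taken.
func liveHeap(forceGC bool) uint64 {
	if forceGC {
		runtime.GC()
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	// Each assignment orphans the previous slice, creating garbage.
	for i := 0; i < 100_000; i++ {
		sink = make([]byte, 1024)
	}
	fmt.Println("pre-GC  HeapAlloc:", liveHeap(false))
	fmt.Println("post-GC HeapAlloc:", liveHeap(true))
}
```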

@blt
Contributor

blt commented Oct 16, 2023

Which is an interesting alternative view. Check this out:

The CPU numbers with the new interner are consistently on par or better, especially as throughput increases, but the memory consumption data is less rosy on the low end. It might be that, given the way sampling works, the live heap size ought to be dropped from consideration and only referenced from capture data.

@blt
Contributor

blt commented Oct 16, 2023

Another view:

I think -- see DataDog/lading#702 and #19993 -- we should have a clearer notion of the number of contexts rolling through the Agent in these experiments, since the implementation is sensitive to that number, but these are neat results.

@scottopell
Contributor Author

IIRC the GC is run before the heap state is captured for the profiler.

Along the same lines, a concern that periodic tasks (i.e., flush intervals, garbage collections) give an unrepresentative view is what eventually led to #20081. I believe this is more of a concern for CPU profiles than heap profiles, due to the way heap profiling works in Go (continuously accumulated) vs. CPU profiling (run on demand).

I'll rebase and kick off another run of this to include the changes from #20081.

@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 0e4392d to 98115f5 Compare October 17, 2023 21:02
blt added a commit that referenced this pull request Oct 18, 2023
This commit replaces the _nodist variants of the `uds_dogstatsd_to_api`
experiment with a set of parameterized string interner experiments, varying both
total contexts and throughput along two scales. The goal here is to understand
how, at 8MiB/sec throughput, the variation of contexts impacts the
stringInterner hotspot and along throughput to achieve coverage on a small range
of contexts.

REF SMPTNG-12
REF #19852
REF #19882
REF #19990

Signed-off-by: Brian L. Troutwine <[email protected]>
blt added a commit that referenced this pull request Oct 18, 2023
* Add experiments focused on dogstatsd's string interner

This commit replaces the _nodist variants of the `uds_dogstatsd_to_api`
experiment with a set of parameterized string interner experiments, varying both
total contexts and throughput along two scales. The goal here is to understand
how, at 8MiB/sec throughput, the variation of contexts impacts the
stringInterner hotspot and along throughput to achieve coverage on a small range
of contexts.

REF SMPTNG-12
REF #19852
REF #19882
REF #19990

Signed-off-by: Brian L. Troutwine <[email protected]>

* correct target metrics path for Agent

Signed-off-by: Brian L. Troutwine <[email protected]>

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
```go
tlmSIRSize = telemetry.NewSimpleGauge("dogstatsd", "string_interner_entries",
	"Number of entries in the string interner")
tlmSIRBytes = telemetry.NewSimpleGauge("dogstatsd", "string_interner_bytes",
	"Number of bytes stored in the string interner")
```
Contributor

This is really, really valuable information; can we try to keep it?

Contributor Author

Yup, don't worry, I'll make sure all existing telemetry is covered in the final version of this PR.

@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 98115f5 to 04f36b5 Compare October 23, 2023 16:03
@scottopell
Contributor Author

Exploring a slightly different architecture in #20335

@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 04f36b5 to 95663e2 Compare October 23, 2023 17:25
@scottopell scottopell closed this Mar 11, 2024
@dd-devflow dd-devflow bot deleted the sopell/forked-string-interner branch April 24, 2024 00:00