
Uses forked pkg/util/interner for dogstatsd workload #19990

Closed
wants to merge 7 commits into from

Conversation

scottopell
Contributor

What does this PR do?

Replaces the dogstatsd string interner with a gc-friendlier version that doesn't require dumping the cache.

Motivation

Testing out differences in string interning.

Additional Notes

This will not be merged as-is, as it copy-pastes most of pkg/util/intern instead of using it directly.
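For context, the "gc-friendlier" behavior comes from tying cache entries to object lifetime rather than periodically dumping the whole map. Below is a minimal, hypothetical sketch of the finalizer-based pattern used by packages like go4.org/intern (which pkg/util/intern derives from); the names and structure are illustrative, not the forked interner's actual code, and it ignores the resurrection race a production implementation must handle:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Value wraps an interned string. When the last *Value reference is
// dropped, its finalizer removes the cache entry, so the GC shrinks
// the cache naturally instead of the interner resetting it wholesale.
type Value struct{ s string }

func (v *Value) Get() string { return v.s }

var (
	mu    sync.Mutex
	cache = map[string]*Value{}
)

// Intern returns a canonical *Value for s. Heavily simplified sketch:
// a real implementation must handle the window where a finalizer fires
// just after the entry has been handed out again.
func Intern(s string) *Value {
	mu.Lock()
	defer mu.Unlock()
	if v, ok := cache[s]; ok {
		return v
	}
	v := &Value{s: s}
	runtime.SetFinalizer(v, func(v *Value) {
		mu.Lock()
		defer mu.Unlock()
		delete(cache, v.s)
	})
	cache[s] = v
	return v
}

func main() {
	a, b := Intern("service:web"), Intern("service:web")
	fmt.Println(a == b) // equal strings intern to the same pointer: true
}
```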

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change has a major impact on the code base, impacts multiple teams, or changes important, well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a releasenote.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label has been applied, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@pr-commenter

pr-commenter bot commented Oct 8, 2023

Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: 573b1dcc-4c51-4a14-ab1c-3f79bcea7dc6
Baseline: d60b9cf
Comparison: 95663e2
Total datadog-agent CPUs: 7

Explanation

A regression test is an integrated performance test for datadog-agent in a repeatable rig, with varying configuration for datadog-agent. What follows is a statistical summary of a brief datadog-agent run for each configuration across the SHAs given above. The goal of these tests is to determine quickly whether datadog-agent performance is changed, and to what degree, by a pull request.

Because a target's optimization goal performance in each experiment will vary somewhat each time it is run, we can only estimate mean differences in optimization goal relative to the baseline target. We express these differences as a percentage change relative to the baseline target, denoted "Δ mean %". These estimates are made to a precision that balances accuracy and cost control. We represent this precision as a 90.00% confidence interval denoted "Δ mean % CI": there is a 90.00% chance that the true value of "Δ mean %" is in that interval.

We decide whether a change in performance is a "regression" -- a change worth investigating further -- if both of the following two criteria are true:

  1. The estimated |Δ mean %| ≥ 5.00%. This criterion intends to answer the question "Does the estimated change in mean optimization goal performance have a meaningful impact on your customers?". We assume that when |Δ mean %| < 5.00%, the impact on your customers is not meaningful. We also assume that a performance change in optimization goal is worth investigating whether it is an increase or decrease, so long as the magnitude of the change is sufficiently large.

  2. Zero is not in the 90.00% confidence interval "Δ mean % CI" about "Δ mean %". This statement is equivalent to saying that there is at least a 90.00% chance that the mean difference in optimization goal is not zero. This criterion intends to answer the question, "Is there a statistically significant difference in mean optimization goal performance?". It also means there is no more than a 10.00% chance this criterion reports a statistically significant difference when the true difference in mean optimization goal is zero -- a "false positive". We assume you are willing to accept a 10.00% chance of inaccurately detecting a change in performance when no true difference exists.

The table below, if present, lists those experiments that have experienced a statistically significant change in mean optimization goal performance between baseline and comparison SHAs with 90.00% confidence OR have been detected as newly erratic. Negative values of "Δ mean %" mean that baseline is faster, whereas positive values of "Δ mean %" mean that comparison is faster. Results that do not exhibit more than a ±5.00% change in their mean optimization goal are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting change is observed.

No interesting changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%.
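The two criteria above can be expressed as a small predicate. This is an illustrative sketch, not the Regression Detector's actual code; the function name and signature are assumptions:

```go
package main

import "fmt"

// isRegression applies the two criteria described above: the estimated
// change must be at least 5% in magnitude, and zero must lie outside
// the 90% confidence interval. Names here are illustrative.
func isRegression(deltaMeanPct, ciLow, ciHigh float64) bool {
	largeEnough := deltaMeanPct >= 5.0 || deltaMeanPct <= -5.0
	significant := ciLow > 0 || ciHigh < 0 // zero is outside the CI
	return largeEnough && significant
}

func main() {
	// tcp_syslog_to_blackhole: +0.20 [+0.06, +0.33] is statistically
	// significant but too small to matter.
	fmt.Println(isRegression(0.20, 0.06, 0.33)) // false
	// A hypothetical -7.1 [-9.0, -5.2] change would flag.
	fmt.Println(isRegression(-7.1, -9.0, -5.2)) // true
}
```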

Fine details of change detection per experiment.
| experiment | goal | Δ mean % | Δ mean % CI | confidence |
|---|---|---|---|---|
| otel_to_otel_logs | ingress throughput | +0.51 | [-1.09, +2.11] | 39.86% |
| file_tree | egress throughput | +0.44 | [-1.88, +2.75] | 24.24% |
| tcp_syslog_to_blackhole | ingress throughput | +0.20 | [+0.06, +0.33] | 98.25% |
| dogstatsd_string_interner_8MiB_10k | ingress throughput | +0.02 | [-0.05, +0.09] | 42.17% |
| uds_dogstatsd_to_api | ingress throughput | +0.02 | [-0.21, +0.25] | 10.39% |
| trace_agent_json | ingress throughput | +0.02 | [-0.12, +0.15] | 16.00% |
| dogstatsd_string_interner_8MiB_50k | ingress throughput | +0.01 | [-0.03, +0.05] | 24.91% |
| dogstatsd_string_interner_128MiB_100 | ingress throughput | +0.00 | [-0.14, +0.14] | 0.77% |
| dogstatsd_string_interner_64MiB_100 | ingress throughput | +0.00 | [-0.14, +0.14] | 0.38% |
| dogstatsd_string_interner_64MiB_1k | ingress throughput | +0.00 | [-0.13, +0.13] | 0.22% |
| dogstatsd_string_interner_8MiB_100k | ingress throughput | +0.00 | [-0.07, +0.07] | 0.00% |
| dogstatsd_string_interner_8MiB_1k | ingress throughput | -0.00 | [-0.10, +0.10] | 0.44% |
| dogstatsd_string_interner_8MiB_100 | ingress throughput | -0.00 | [-0.13, +0.13] | 1.34% |
| idle | egress throughput | -0.01 | [-2.97, +2.96] | 0.26% |
| tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.09, +0.07] | 16.66% |
| trace_agent_msgpack | ingress throughput | -0.03 | [-0.16, +0.10] | 32.07% |
| dogstatsd_string_interner_128MiB_1k | ingress throughput | -0.03 | [-0.34, +0.27] | 14.83% |
| file_to_blackhole | egress throughput | -0.22 | [-0.67, +0.22] | 59.39% |

blt added a commit that referenced this pull request Oct 10, 2023
This commit extracts experiments from #19882 and #19990, expanding the
throughput parameter across the range of concern. Note that at the lower end of
the throughput range the full set of contexts will not be explored, meaning that
throughput per second is the dominant metric here. Contexts per second we will
need to derive from captured target telemetry.

Intended to back ongoing experiments with the stringInterner.

REF SMPTNG-12

Signed-off-by: Brian L. Troutwine <[email protected]>
@blt
Contributor

blt commented Oct 10, 2023

I've introduced #20036 to give us a broader range of experimental throughput. This analysis will concern itself with run-id 0e3112e1-e30a-4211-9b28-915b909d5219.

| Experiment | CPU time | Lifetime Allocations | Heap Live Size | Mutex Time |
|---|---|---|---|---|
| uds_dogstatsd_to_api_nodist_1KiB | ~0 | ~ -15k | 0 MB | 0 ms |
| uds_dogstatsd_to_api_nodist_100MiB | -27 sec | -50M | +30 MB | ~ -1 sec |
| uds_dogstatsd_to_api_nodist_200MiB | -28 sec | -60M | -106 MB | +4 sec |

Compared to raising the default size, this PR sees a similar reduction in lifetime allocations and a curious rise in heap live size in only one experiment; otherwise the drop in _200MiB heap live size is improved. The capture data shows throughput improved below threshold in the _200MiB case, while in the _100MiB case it remains constant as before. This implies that although this change does allow more data through, the Agent's throughput in these experiments is stable at or below 100MiB/sec and becomes unstable somewhere around 120MiB/sec, or 125MiB/sec if this PR is merged. CPU utilization is improved in a manner similar to the default-size-raise PR.

I would like to see this run repeated with #20036 in its history. I see the results here as positive, an improvement over main.

blt added a commit that referenced this pull request Oct 10, 2023
@scottopell scottopell force-pushed the sopell/forked-string-interner branch from a9d5266 to 3185ebc Compare October 10, 2023 20:28
blt added a commit that referenced this pull request Oct 11, 2023
@blt
Contributor

blt commented Oct 11, 2023

This analysis will concern itself with run-id 96f38e65-f7ae-4073-8fab-dbb569b4e089.

| Experiment | CPU time | Lifetime Allocations | Heap Live Size | Mutex Time |
|---|---|---|---|---|
| uds_dogstatsd_to_api_nodist_1KiB | ~0 | ~0 | 0 MB | -8 ms |
| uds_dogstatsd_to_api_nodist_1MiB | ~0 | ~ -100k | -2 MB | ~0 ms |
| uds_dogstatsd_to_api_nodist_16MiB | -2 sec | -8M | +18 MB | +100 ms |
| uds_dogstatsd_to_api_nodist_32MiB | -3 sec | -16M | -11 MB | +1 sec |
| uds_dogstatsd_to_api_nodist_100MiB | -29 sec | -49M | -59 MB | +1 sec |
| uds_dogstatsd_to_api_nodist_200MiB | -30 sec | -59M | -146 MB | +1 sec |

A consistent result: the low end is not pessimized and the high end is improved. In the 16MiB case the capture data does bear out the higher memory consumption. Throughput is improved; the profile suggests the bulk of the larger heap is in the UDS listener and only partially in the string interner. I don't have a hypothesis for that result.

Compared to the capture results for #19882, overall memory consumption, throughput gains, and CPU reduction appear to be on par with raising the default size of the main version of the cache.

blt added a commit that referenced this pull request Oct 12, 2023
This commit updates the environment flags on our experiments to ensure that
profiling runs are fine grained through the whole of the run. Updated with
reference to the ongoing investigation in #19990

REF SMPTNG-12

Signed-off-by: Brian L. Troutwine <[email protected]>
blt added a commit that referenced this pull request Oct 12, 2023
* Ensure profiling is fine-grained in the Regression Detector

This commit updates the environment flags on our experiments to ensure that
profiling runs are fine grained through the whole of the run. Updated with
reference to the ongoing investigation in #19990

REF SMPTNG-12

Signed-off-by: Brian L. Troutwine <[email protected]>

* CPUDURATION -> CPU_DURATION

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
Co-authored-by: George Hahn <[email protected]>
@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 3185ebc to 0e4392d Compare October 12, 2023 20:56
@remeh
Contributor

remeh commented Oct 16, 2023

@blt hijacking the thread a bit, sorry, but for a related question.

I was thinking about the Heap Live Size, which appears to be slightly higher in one scenario in the current PR (your comment right above), but also in this comment on #19882 increasing the string interner size.

AFAIR, both implementations rely on the GC via the equivalent of weak pointers/references, where the GC may collect entries and that's OK. Wouldn't that mean the moment the GC runs has an impact on the Heap Live Size result, that is, the moment the data is collected also has an impact? If this value is often sampled at a time when the GC hasn't intervened, you would possibly see differences between scenarios. Should we consider scenarios where the GC is manually kicked off before collecting samples?

@blt
Contributor

blt commented Oct 16, 2023

> @blt hijacking the thread a bit, sorry, but for a related question.

It's an interesting question, and apropos, I think.

> I was thinking about the Heap Live Size, which appears to be slightly higher in one scenario in the current PR (your comment right above), but also in this comment on #19882 increasing the string interner size.
>
> AFAIR, both implementations rely on the GC via the equivalent of weak pointers/references, where the GC may collect entries and that's OK. Wouldn't that mean the moment the GC runs has an impact on the Heap Live Size result, that is, the moment the data is collected also has an impact?

It would have an impact, yeah.

> If this value is often sampled at a time when the GC hasn't intervened, you would possibly see differences between scenarios. Should we consider scenarios where the GC is manually kicked off before collecting samples?

I'd be vaguely against that. Which is to say, I think the profiler is not quite the right tool for this, and I could have called that out a little better. Because of the sampling problem, as you point out, we might sample the Agent before GC has kicked in, but that is memory use the Agent's user would actually see, so should we elide it by manually triggering GC before sampling? I think that would give us a different view, but it may be a skewed one. The right tool here is to examine the capture data, which is itself sampled, but at 1 Hz.

I guess my argument would be that a freshly GC'ed Agent isn't any less representative than one where the GC has yet to run. Both states would be observable to an Agent user and need to be considered when deciding resource allocation, etc. I continue to have a hunch that focusing on lifetime allocations is probably the more valuable output of the profiler, but as @scottopell has pointed out, we want to be careful not to pessimize observable RSS consumption, or at least to do so only with the understanding of better gains elsewhere.
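The timing sensitivity under discussion can be seen directly with Go's runtime statistics. A minimal sketch, illustrative only and unrelated to the Regression Detector's capture pipeline: the pre-GC reading can include garbage not yet swept, which is exactly why when the sample is taken matters.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // global sink keeps allocations from being optimized away

// liveHeap reads the runtime's live-heap figure, optionally forcing a
// collection first. Without the runtime.GC() call, the number may
// include unswept garbage, depending on where the GC cycle happens to
// be when the sample is taken.
func liveHeap(forceGC bool) uint64 {
	if forceGC {
		runtime.GC()
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	// Each assignment orphans the previous slice, creating garbage.
	for i := 0; i < 100_000; i++ {
		sink = make([]byte, 1024)
	}
	fmt.Println("pre-GC  HeapAlloc:", liveHeap(false))
	fmt.Println("post-GC HeapAlloc:", liveHeap(true))
}
```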

@blt
Contributor

blt commented Oct 16, 2023

Which is an interesting alternative view. Check this out:

The CPU numbers with the new interner are consistently on par or better, especially as throughput increases, but the memory consumption data is less rosy on the low end. It might be that, given the way sampling works, the live heap size ought to be dropped from consideration and only referenced from capture data.

@blt
Contributor

blt commented Oct 16, 2023

Another view:

I think -- see DataDog/lading#702 and #19993 -- we should have a clearer notion of the number of contexts rolling through the Agent in these experiments, since the implementation is sensitive to that number, but these are neat results.

@scottopell
Contributor Author

IIRC the GC is run before the heap state is captured for the profiler.

Along the same lines, a concern that periodic tasks (i.e., flush intervals, garbage collections) give an unrepresentative view is what eventually led to #20081. I believe this is more of a concern for CPU profiles than heap profiles, due to the way heap profiling works in Go (continuously accumulated) vs. CPU profiling (run on demand).

I'll rebase and kick off another run of this to include the changes from #20081.

@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 0e4392d to 98115f5 Compare October 17, 2023 21:02
blt added a commit that referenced this pull request Oct 18, 2023
This commit replaces the _nodist variants of the `uds_dogstatsd_to_api`
experiment with a set of parameterized string interner experiments, varying both
total contexts and throughput along two scales. The goal here is to understand
how, at 8MiB/sec throughput, the variation of contexts impacts the
stringInterner hotspot and along throughput to achieve coverage on a small range
of contexts.

REF SMPTNG-12
REF #19852
REF #19882
REF #19990

Signed-off-by: Brian L. Troutwine <[email protected]>
blt added a commit that referenced this pull request Oct 18, 2023
* Add experiments focused on dogstatsd's string interner

This commit replaces the _nodist variants of the `uds_dogstatsd_to_api`
experiment with a set of parameterized string interner experiments, varying both
total contexts and throughput along two scales. The goal here is to understand
how, at 8MiB/sec throughput, the variation of contexts impacts the
stringInterner hotspot and along throughput to achieve coverage on a small range
of contexts.

REF SMPTNG-12
REF #19852
REF #19882
REF #19990

Signed-off-by: Brian L. Troutwine <[email protected]>

* correct target metrics path for Agent

Signed-off-by: Brian L. Troutwine <[email protected]>

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
```go
tlmSIRSize = telemetry.NewSimpleGauge("dogstatsd", "string_interner_entries",
	"Number of entries in the string interner")
tlmSIRBytes = telemetry.NewSimpleGauge("dogstatsd", "string_interner_bytes",
	"Number of bytes stored in the string interner")
```
Contributor

This is really, really valuable information; can we try to keep it?

Contributor Author

Yup, don't worry, I'll make sure all existing telemetry is covered in the final version of this PR.

@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 98115f5 to 04f36b5 Compare October 23, 2023 16:03
@scottopell
Contributor Author

Exploring a slightly different architecture in #20335

@scottopell scottopell force-pushed the sopell/forked-string-interner branch from 04f36b5 to 95663e2 Compare October 23, 2023 17:25
@scottopell scottopell closed this Mar 11, 2024
@dd-devflow dd-devflow bot deleted the sopell/forked-string-interner branch April 24, 2024 00:00