Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only retrieve updated query statement metrics #19321

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

sethsamuel
Copy link
Contributor

@sethsamuel sethsamuel commented Dec 27, 2024

What does this PR do?

Updates MySQL statement collection to only gather statements that have executed since the last collection. This functionality is behind a flag to allow for slow rollout and testing before hopefully making it standard.

Four variant builds were compared:

  • latest, a control with current live behavior
  • digest-only, query for the same records as latest but only get the digest, then perform a second query for the digest text if it is not already cached
  • last-seen, query only for digests of statements that have executed since the last collection, then query for digest text if not cached
  • last-seen-with-text, query for statements executed since last collection but with digest text.

Testing was performed on orders app with 1 minute bursts of high cardinality queries every 5 minutes.
Screenshot 2024-12-27 at 11 38 46 AM

Somewhat surprisingly, querying for the digest and then for digest text made little difference in performance, even with large random statements (~1kb) executed during bursts. When querying for only updated rows, the much smaller row count allows for fetching the text as well, avoiding a second database query.

CPU usage is greatly reduced during normal load, and somewhat reduced during high cardinality bursts.
Screenshot 2024-12-27 at 11 39 33 AM

Query time and overall statement metrics collection time (note that y-axis is in seconds, not ms) are greatly reduced.
Screenshot 2024-12-27 at 11 40 07 AM
Screenshot 2024-12-27 at 11 39 48 AM

Motivation

Customers have complained about CPU usage by the MySQL check. This change uses a similar technique to the other database integrations to minimize the amount of unnecessary data retrieved for each statement collection.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Copy link

codecov bot commented Dec 27, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.65%. Comparing base (aa9bd4e) to head (bdacf12).
Report is 12 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
hive ?
hivemq ?
ignite ?
jboss_wildfly ?
kafka ?
mysql 89.55% <95.00%> (+0.02%) ⬆️
presto ?
solr ?

Flags with carried forward coverage won't be shown. Click here to find out more.

@sethsamuel sethsamuel changed the title Cache digest text for MySQL Only retrieve updated query statement metrics Dec 27, 2024
@sethsamuel sethsamuel added the qa/skip-qa Automatically skip this PR for the next QA label Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant