Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize vulnerability host counts #24914

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

mostlikelee
Copy link
Contributor

@mostlikelee mostlikelee commented Dec 19, 2024

#22364

Batching selects aggregating host counts based on CVE and adding concurrency.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.
  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)
  • Added/updated tests
  • Manual QA for all new/changed functionality

@mostlikelee mostlikelee requested a review from a team as a code owner December 19, 2024 18:10
Copy link

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 81.16883% with 29 lines in your changes missing coverage. Please review.

Project coverage is 63.80%. Comparing base (e78bf6e) to head (0d43ea6).
Report is 102 commits behind head on main.

Files with missing lines Patch % Lines
server/datastore/mysql/vulnerabilities.go 82.06% 19 Missing and 7 partials ⚠️
cmd/fleet/cron.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #24914      +/-   ##
==========================================
+ Coverage   63.56%   63.80%   +0.23%     
==========================================
  Files        1602     1608       +6     
  Lines      151820   153090    +1270     
  Branches     3952     3952              
==========================================
+ Hits        96511    97673    +1162     
- Misses      47624    47625       +1     
- Partials     7685     7792     +107     
Flag Coverage Δ
backend 64.61% <81.16%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@getvictor
Copy link
Member

In my opinion, we should not use concurrent batches for this DB access because it will put an extra load on the DB reader.

We know that the vulnerability cron uses a lot of CPU/DB resources, so the goal should be to smooth out the performance spikes. One way to do that is to pause for some time (500ms?) between each batch.

@iansltx
Copy link
Member

iansltx commented Dec 19, 2024

Counterpoint to the above: smaller batches should help DB load massively, and we can expose an override for concurrency as an env var (with a default that we've confirmed as working in load test) so customers can tune how hard the tool hits the DB. My current guess is that this will be significantly lighter on the DB in load test, even concurrently, than the old massive temp table method (and I think we'll get useful info from loadtest here), and with the env var in place we can tune things easily enough once this hits production workloads.

@iansltx
Copy link
Member

iansltx commented Dec 19, 2024

Also, given that we're talking about 5 concurrent sets of queries, if someone is going over prepared statement maximums based on these changes:

  1. They were running too hot to begin with
  2. They can adjust concurrency down via the proposed env var

@iansltx
Copy link
Member

iansltx commented Dec 20, 2024

Started a Slack thread about which data sets to use to test this.

@mostlikelee mostlikelee marked this pull request as draft December 23, 2024 18:11
@mostlikelee mostlikelee marked this pull request as ready for review December 23, 2024 20:53
Copy link
Member

@iansltx iansltx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only feedback here is on the routines naming. I also see where the host count is bugged but I'll fix that in a PR stacked on top of this one.

@@ -1257,6 +1258,11 @@ func (man Manager) addConfigs() {
false,
"Don't sync installed Windows updates nor perform Windows OS vulnerability processing.",
)
man.addConfigInt(
"vulnerabilities.max_routines",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"vulnerabilities.max_routines",
"vulnerabilities.max_concurrency",

Seems like "concurrency" is more self-evident here. Guessing you cycled through that as a naming idea here, so it'd be useful to understand why this naming convention won.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants