Validate autoupdate_config and autoupdate_agent_rollout #50181

Merged 1 commit into master from hugo/autoupdate_config-validation on Dec 17, 2024

hugoShaka
Contributor

Part of: RFD-184

Goal (internal): https://github.com/gravitational/cloud/issues/10289

This PR removes the restrictions of the autoupdate_agent_rollout and autoupdate_config schedules but adds groups validation.

It also adds some optional server-side validation that should not be enforced at the resource level.
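The split described above, between checks that belong to the resource itself and optional checks applied only by the server, could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `Group` type, both function names, and the specific checks (non-empty unique names, hour range, non-negative wait) are assumptions made for the example. Only the idea of a maximum group count comes from the reviewed snippet.

```go
package main

import "fmt"

// Group is a hypothetical, simplified stand-in for one schedule group.
// It is not the real proto message.
type Group struct {
	Name      string
	StartHour int32
	WaitHours int32
}

// validateGroups holds resource-level checks (assumed for this sketch):
// they should hold for any well-formed resource, whatever the deployment.
func validateGroups(groups []Group) error {
	seen := make(map[string]struct{}, len(groups))
	for i, g := range groups {
		if g.Name == "" {
			return fmt.Errorf("group #%d: name must not be empty", i)
		}
		if _, dup := seen[g.Name]; dup {
			return fmt.Errorf("group #%d: duplicate group name %q", i, g.Name)
		}
		seen[g.Name] = struct{}{}
		if g.StartHour < 0 || g.StartHour > 23 {
			return fmt.Errorf("group %q: start hour %d is outside 0-23", g.Name, g.StartHour)
		}
		if g.WaitHours < 0 {
			return fmt.Errorf("group %q: wait hours must not be negative", g.Name)
		}
	}
	return nil
}

// validateServerSide holds an optional policy check that the server may
// apply on top of validateGroups. The limit is passed in by the caller so
// the resource definition itself stays unrestricted.
func validateServerSide(groups []Group, maxGroups int) error {
	if len(groups) > maxGroups {
		return fmt.Errorf("%d groups configured, at most %d allowed", len(groups), maxGroups)
	}
	return nil
}
```

Keeping the limit out of `validateGroups` means a resource that is valid in one deployment is never rejected as malformed in another; only the server-side policy differs.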

@hugoShaka hugoShaka changed the title from "Hugo/autoupdate config validation" to "Validate autoupdate_config and autoupdate_agent_rollout" Dec 12, 2024
@hugoShaka hugoShaka marked this pull request as ready for review December 13, 2024 16:07
}

var maxGroups int
isCloud := modules.GetModules().Features().Cloud
Member

We may want to relax these restrictions for some Cloud customers on a case-by-case basis. Would it be tricky to add an "unrestricted" feature? (Could be a separate PR)

Contributor Author

@hugoShaka hugoShaka Dec 13, 2024

We will likely want to rely on entitlements. I'm not familiar with this mechanism and will have to ask cloud devs how to do this. If it's not easy, I'll just glue an ugly environment variable so we are not blocked.
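For illustration only, the environment-variable fallback mentioned in this reply might look like the sketch below. Everything named here is hypothetical: the variable name `EXAMPLE_AUTOUPDATE_MAX_GROUPS`, both default limits, and the `maxGroups` helper are invented for the example and do not come from the PR.

```go
package main

import "strconv"

const (
	// All three values are hypothetical placeholders for this sketch.
	maxGroupsEnvVar            = "EXAMPLE_AUTOUPDATE_MAX_GROUPS"
	defaultMaxGroupsCloud      = 4
	defaultMaxGroupsSelfHosted = 20
)

// maxGroups returns the group limit to enforce. A positive integer in the
// override variable wins; otherwise the default depends on isCloud.
// getenv is injected so the function can be tested without touching the
// process environment.
func maxGroups(isCloud bool, getenv func(string) string) int {
	if raw := getenv(maxGroupsEnvVar); raw != "" {
		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
			return n
		}
		// Unparseable or non-positive values fall through to the defaults.
	}
	if isCloud {
		return defaultMaxGroupsCloud
	}
	return defaultMaxGroupsSelfHosted
}
```

In a real binary the caller would pass `os.Getenv` as the second argument. An entitlement-based lookup, as suggested in the reply, could later replace the `getenv` path without changing the callers.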

@hugoShaka hugoShaka force-pushed the hugo/autoupdate_config-validation branch from f7451c1 to 3a91f1b Compare December 13, 2024 21:27
@hugoShaka hugoShaka force-pushed the hugo/autoupdate_config-validation branch from 3a91f1b to 2966633 Compare December 16, 2024 21:05
@hugoShaka
Contributor Author

@sclevine and @vapopov I revamped the PR as we changed from wait_days to wait_hours. Could you do a fresh review?

Contributor

@vapopov vapopov left a comment

LGTM, seems like you need to adjust unit tests (nil struct in map)

@hugoShaka hugoShaka force-pushed the hugo/autoupdate_config-validation branch 3 times, most recently from 50d6973 to b41f57e Compare December 17, 2024 16:01
@hugoShaka hugoShaka added the no-changelog label (indicates that a PR does not require a changelog entry) Dec 17, 2024
This commit removes the restrictions of the autoupdate_agent_rollout and autoupdate_config schedules but adds groups validation.

It also adds some optional server-side validation that should not be enforced at the resource level.
@hugoShaka hugoShaka force-pushed the hugo/autoupdate_config-validation branch from b41f57e to 0fb7fd6 Compare December 17, 2024 16:12
Collaborator

@r0mant r0mant left a comment

Bot.

@hugoShaka hugoShaka enabled auto-merge December 17, 2024 16:55
@hugoShaka hugoShaka added this pull request to the merge queue Dec 17, 2024
Merged via the queue into master with commit 89a3d2a Dec 17, 2024
42 checks passed
@hugoShaka hugoShaka deleted the hugo/autoupdate_config-validation branch December 17, 2024 17:31
hugoShaka added a commit that referenced this pull request Feb 18, 2025
hugoShaka added a commit that referenced this pull request Feb 18, 2025
carloscastrojumo pushed a commit to carloscastrojumo/teleport that referenced this pull request Feb 19, 2025
…ational#50181)

hugoShaka added a commit that referenced this pull request Feb 26, 2025
github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2025
* Implement immediate schedule support for automatic updates (#47920)

* Implement immediate schedule support

* expose edition, fips, and ensure ping endpoint answers

* fix after rebase

* fix cache tests

* introduce webclient.ReusableClient (#49296)

* Move autoupdate code in proxy to make more sense (#49484)

* Move autoupdate code in proxy to make more sense

* lint + godoc

* Start `autoupdate_agent_rollout` controller in auth service (#49101)

* run autoupdate_agent_rollout controller

* Recover from panics inside the controller

* Address tim's feedback

Co-authored-by: rosstimothy <[email protected]>

---------

Co-authored-by: rosstimothy <[email protected]>

* kube-agent-updater: add RFD-184 trigger and version getter (#49297)

* add proxy version getter and maintenance trigger

* add failover trigger and versionGetter

* lint

* Apply suggestions from code review

Co-authored-by: Marco Dinis <[email protected]>

* address marco's feedback

* licensing

---------

Co-authored-by: Marco Dinis <[email protected]>

* Rename lib/kubernetestoken to lib/kube/token (#49554)

* Rename lib/kubernetestoken to lib/kube/token

* Lint

* Make the proxy read from autoupdate_agent_rollout (#49380)

* Add autoupdate_agent_rollout support

* fix ping proxy tests

* address creack's feedback

* Address sclevine's feedback

Co-authored-by: Stephen Levine <[email protected]>

* fix panic in tests

---------

Co-authored-by: Stephen Levine <[email protected]>

* Fix flaky TestAutoUpdateAgentShouldUpdate (#49883)

* Fix flaky TestAutoUpdateAgentShouldUpdate

* Update lib/web/apiserver_ping_test.go

* Update lib/web/autoupdate_common_test.go

* autoupdate: reconcile rollout status and add strategy interface (#49735)

* autoupdate: reconcile rollout status and add strategy interface

* fix missing constants + add license

* lint

* fix proto field id

* Fix flaky TestAgentRolloutController (#49886)

* Fix flaky TestAgentRolloutController

* switch to real clock + increase Eventually timeout

* Make reconciliation period a parameter + add TELEPORT_UNSTABLE env var

* Update lib/service/service_test.go

Co-authored-by: Alan Parra <[email protected]>

* Apply suggestions from code review

Co-authored-by: Alan Parra <[email protected]>

* Remove env var

* lint

---------

Co-authored-by: Alan Parra <[email protected]>

* Compute global rollout state (#49945)

* Compute global rollout state

* Simplify + missing wrong proto message description

* lint

* simplify

* for edoardo

* fix compute status test

* autoupdate: implement time-based strategy (#49736)

This commit implements the time-based rollout strategy described in
RFD 184. The autoupdate_agent_rollout controller will make the groups
active based on their start days, start hour, and maintenance duration.
Once the maintenance window is over, the group becomes DONE.
In the DONE state, new agents will install the target version but
existing agents will no longer be told to actively update.

* Use CMC as default config when set (#50039)

* autoupdate: Use CMC as default config when set

Part of: [RFD-184](#47126)

This commit implements backward compatibility when CMC is specified.
After this PR, if the user has no `autoupdate_config` resource but a
`cluster_maintenance_config` resource from RFD 109, we will use the CMC
to generate the config (update hour and update days) and craft the
`autoupdate_agent_rollout`.

* Update lib/autoupdate/rollout/client_test.go

Co-authored-by: Edoardo Spadolini <[email protected]>

* address feedback

* lint

---------

Co-authored-by: Edoardo Spadolini <[email protected]>

* Change autoupdate proto messages (#50234)

* Change autoupdate proto messages

This commit makes three changes:
- reflect the maintenance duration on the rollout in a new spec field
- add a rollout start time field in its status
- change wait_days into wait_hours

* int64 -> int32 for consistency with other fields

* Add autoupdate_config and autoupdate_agent_rollout validation (#50181)

This commit removes the restrictions of the autoupdate_agent_rollout and autoupdate_config schedules but adds groups validation.

It also adds some optional server-side validation that should not be enforced at the resource level.

* autoupdate: implement halt-on-error strategy (#49737)

* autoupdate: implement halt-on-error strategy

* rewrite wait_days logic into wait_hours

* Apply suggestions from code review

Co-authored-by: Stephen Levine <[email protected]>

---------

Co-authored-by: Stephen Levine <[email protected]>

* add tctl create/get/edit support for autoupdate_agent_rollout (#50393)

* add tctl create/get/edit support for autoupdate_agent_rollout

* fix bad copy paste

* set rollout start date and don't start updating if rollout just changed (#50365)

This commit makes two changes:
- the controller now sets the rollout start time when resetting the
  rollout
- the controller will not start a group if the rollout changed during
  the maintenance window (checks if the rollout start time is in the
  window)

* Reduce clock usage + add time and period override in rollout controller (#50634)

* Enable strategies in the autoupdate rollout controller (#50635)

* autoupdate rollout: honour the maintenance window duration (#50745)

* autoupdate rollout: honour the maintenance window duration

* Update lib/autoupdate/rollout/reconciler.go

Co-authored-by: Bartosz Leper <[email protected]>

* Address feedback

* Update lib/autoupdate/rollout/strategy.go

---------

Co-authored-by: Bartosz Leper <[email protected]>

* Fix proto resource 153 marshalling for autoupdate_* resources (#50688)

* Fix proto resource 153 marshalling

* Update tool/tctl/common/collection_test.go

Co-authored-by: Alan Parra <[email protected]>

* Update tool/tctl/common/collection_test.go

Co-authored-by: Alan Parra <[email protected]>

* Address feedback

- Change from Resource153AdapterV2 to ProtoResource153Adapter
- fix test failures and unmarshal proto resources properly
- add a failing round-trip proto 153 test case
- bonus: fix the table test resource create that did not support
  running a single row

* Apply suggestions from code review

Co-authored-by: Alan Parra <[email protected]>

* lint

---------

Co-authored-by: Alan Parra <[email protected]>

* Add autoupdate controller metrics (#50807)

* Add autoupdate controller metrics

* Do not panic in case of error conflict

* kube-agent-update: Use the RFD-184 webapi proxy update protocol by default when possible (#50464)

* kube-agent-update: Use the RFD-184 webapi proxy update protocol by default when possible

* Update integrations/kube-agent-updater/cmd/teleport-kube-agent-updater/main.go

Co-authored-by: Tiago Silva <[email protected]>

* log update group

---------

Co-authored-by: Tiago Silva <[email protected]>

* Add 'tctl autoupdate agents status' (#51079)

* Ensure proxy version getter adds the leading 'v' (#51687)

* Always create debug socket and expose health endpoints (#51616)

* Always create debug socket and expose health endpoints

* Consolidate the diagnostic multiplexers in a single function

* Fix tests

* Apply suggestions from code review

Co-authored-by: Edoardo Spadolini <[email protected]>

---------

Co-authored-by: Edoardo Spadolini <[email protected]>

* Fix autoupdate rollout controller metrics (#51803)

* kube-agent-updater pre-release builds trust the staging repo + insecure validator private repo fix (#51815)

* Fix insecure resolver in private repos + trust pre-release builds

* fixup! Fix insecure resolver in private repos + trust pre-release builds

* Use new autoupdate APIs in discovery service (#51758)

* Remove name parameter from proxy version getter

* Use autoupdate_agent_rollout as a source of version in scripts and integrations

* Fix tests

* Gracefully handle the absence of a proxy in kube discovery service

* Update lib/srv/discovery/kube_integration_watcher.go

Co-authored-by: Tiago Silva <[email protected]>

* Address marco's feedback

* Address marco's feedback pt.2

* Gracefully handle if we can't get autoupdate version

* fixup! Update lib/srv/discovery/kube_integration_watcher.go

---------

Co-authored-by: Tiago Silva <[email protected]>

* Autoupdate changelog entry in v17.3

* Fix tests after rebase, pt.1

* Update front preset fixtures since the preset role changed

* Add install script using teleport-update and oneoff.sh (#52155)

* Refactor node-join script to take safer options and reuse install option logic (#52196)

* Add install script using teleport-update and oneoff.sh

* Refactor node-join script to take safer options and reuse install option logic

* GoDoc + make functions private

* Address edoardo's feedback

* Allow prerelease Teleport to install official artifacts (#52444)

* Accept installing CE when running an AGPL build for backward compat

* Bump e to fix build (oneoff args change)

* Make node install scripts install Teleport via teleport-update (#52226)

* Make the node install script use teleport-update

* Apply suggestions from code review

Co-authored-by: Edoardo Spadolini <[email protected]>

* Fix curl args + address bash exec comments

---------

Co-authored-by: Edoardo Spadolini <[email protected]>

* Use install.sh in discovery's default installer (#52368)

* Use install.sh in discovery's default installer

* fixup! Use install.sh in discovery's default installer

* Address marco's feedback

* Update lib/auth/grpcserver.go

Co-authored-by: Marco Dinis <[email protected]>

* Update lib/srv/server/installer/defaultinstallers.go

* apply edoardo's feedback + write script to file

* Execute the downloaded shell script

* Add snapshot tests

* fixup! Add snapshot tests

---------

Co-authored-by: Marco Dinis <[email protected]>

* Fix error after rebase

* Fix test after rebase

---------

Co-authored-by: rosstimothy <[email protected]>
Co-authored-by: Marco Dinis <[email protected]>
Co-authored-by: Stephen Levine <[email protected]>
Co-authored-by: Alan Parra <[email protected]>
Co-authored-by: Edoardo Spadolini <[email protected]>
Co-authored-by: Bartosz Leper <[email protected]>
Co-authored-by: Tiago Silva <[email protected]>
Labels
no-changelog (indicates that a PR does not require a changelog entry), size/lg

4 participants