[v17] RFD 184: automatic updates, server-side logic (#52275)
* Implement immediate schedule support for automatic updates (#47920)

* Implement immediate schedule support

* expose edition, fips, and ensure ping endpoint answers

* fix after rebase

* fix cache tests

* introduce webclient.ReusableClient (#49296)

* Move autoupdate code in proxy to make more sense (#49484)

* Move autoupdate code in proxy to make more sense

* lint + godoc

* Start `autoupdate_agent_rollout` controller in auth service (#49101)

* run autoupdate_agent_rollout controller

* Recover from panics inside the controller

* Address tim's feedback

Co-authored-by: rosstimothy <[email protected]>

---------

Co-authored-by: rosstimothy <[email protected]>

* kube-agent-updater: add RFD-184 trigger and version getter (#49297)

* add proxy version getter and maintenance trigger

* add failover trigger and versionGetter

* lint

* Apply suggestions from code review

Co-authored-by: Marco Dinis <[email protected]>

* address marco's feedback

* licensing

---------

Co-authored-by: Marco Dinis <[email protected]>

* Rename lib/kubernetestoken to lib/kube/token (#49554)

* Rename lib/kubernetestoken to lib/kube/token

* Lint

* Make the proxy read from autoupdate_agent_rollout (#49380)

* Add autoupdate_agent_rollout support

* fix ping proxy tests

* address creack's feedback

* Address sclevine's feedback

Co-authored-by: Stephen Levine <[email protected]>

* fix panic in tests

---------

Co-authored-by: Stephen Levine <[email protected]>

* Fix flaky TestAutoUpdateAgentShouldUpdate (#49883)

* Fix flaky TestAutoUpdateAgentShouldUpdate

* Update lib/web/apiserver_ping_test.go

* Update lib/web/autoupdate_common_test.go

* autoupdate: reconcile rollout status and add strategy interface (#49735)

* autoupdate: reconcile rollout status and add strategy interface

* fix missing constants + add license

* lint

* fix proto field id

* Fix flaky TestAgentRolloutController (#49886)

* Fix flaky TestAgentRolloutController

* switch to real clock + increase Eventually timeout

* Make reconciliation period a parameter + add TELEPORT_UNSTABLE env var

* Update lib/service/service_test.go

Co-authored-by: Alan Parra <[email protected]>

* Apply suggestions from code review

Co-authored-by: Alan Parra <[email protected]>

* Remove env var

* lint

---------

Co-authored-by: Alan Parra <[email protected]>

* Compute global rollout state (#49945)

* Compute global rollout state

* Simplify + missing wrong proto message description

* lint

* simplify

* for edoardo

* fix compute status test

* autoupdate: implement time-based strategy (#49736)

This commit implements the time-based rollout strategy described in
RFD 184. The autoupdate_agent_rollout controller will make the groups
active based on their start days, start hour, and maintenance duration.
Once the maintenance window is over, the group becomes DONE.
In the DONE state, new agents will install the target version but
existing agents will no longer be told to actively update.

* Use CMC as default config when set (#50039)

* autoupdate: Use CMC as default config when set

Part of: [RFD-184](#47126)

This commit implements backward compatibility when CMC is specified.
After this PR, if the user has no `autoupdate_config` resource but a
`cluster_maintenance_config` resource from RFD 109, we will use the CMC
to generate the config (update hour and update days) and craft the
`autoupdate_agent_rollout`.

* Update lib/autoupdate/rollout/client_test.go

Co-authored-by: Edoardo Spadolini <[email protected]>

* address feedback

* lint

---------

Co-authored-by: Edoardo Spadolini <[email protected]>

* Change autoupdate proto messages (#50234)

* Change autoupdate proto messages

This commit makes 3 changes:
- reflect the maintenance duration on the rollout in a new spec field
- add a rollout start time field in its status
- change wait_days into wait_hours

* int64 -> int32 for consistency with other fields

* Add autoupdate_config and autoupdate_agent_rollout validation (#50181)

This commit removes the restrictions of the autoupdate_agent_rollout and autoupdate_config schedules but adds groups validation.

It also adds some optional server-side validation that should not be enforced at the resource level.

* autoupdate: implement halt-on-error strategy (#49737)

* autoupdate: implement halt-on-error strategy

* rewrite wait_days logic into wait_hours

* Apply suggestions from code review

Co-authored-by: Stephen Levine <[email protected]>

---------

Co-authored-by: Stephen Levine <[email protected]>

* add tctl create/get/edit support for autoupdate_agent_rollout (#50393)

* add tctl create/get/edit support for autoupdate_agent_rollout

* fix bad copy paste

* set rollout start date and don't start updating if rollout just changed (#50365)

This commit makes two changes:
- the controller now sets the rollout start time when resetting the
  rollout
- the controller will not start a group if the rollout changed during
  the maintenance window (checks if the rollout start time is in the
  window)

* Reduce clock usage + add time and period override in rollout controller (#50634)

* Enable strategies in the autoupdate rollout controller (#50635)

* autoupdate rollout: honour the maintenance window duration (#50745)

* autoupdate rollout: honour the maintenance window duration

* Update lib/autoupdate/rollout/reconciler.go

Co-authored-by: Bartosz Leper <[email protected]>

* Address feedback

* Update lib/autoupdate/rollout/strategy.go

---------

Co-authored-by: Bartosz Leper <[email protected]>

* Fix proto resource 153 marshalling for autoupdate_* resources (#50688)

* Fix proto resource 153 marshalling

* Update tool/tctl/common/collection_test.go

Co-authored-by: Alan Parra <[email protected]>

* Update tool/tctl/common/collection_test.go

Co-authored-by: Alan Parra <[email protected]>

* Address feedback

- Change from Resource153AdapterV2 to ProtoResource153Adapter
- fix test failures and unmarshal proto resources properly
- add a failing round-trip proto 153 test case
- bonus: fix the table test for resource create, which did not support
  running a single row

* Apply suggestions from code review

Co-authored-by: Alan Parra <[email protected]>

* lint

---------

Co-authored-by: Alan Parra <[email protected]>

* Add autoupdate controller metrics (#50807)

* Add autoupdate controller metrics

* Do not panic in case of error conflict

* kube-agent-update: Use the RFD-184 webapi proxy update protocol by default when possible (#50464)

* kube-agent-update: Use the RFD-184 webapi proxy update protocol by default when possible

* Update integrations/kube-agent-updater/cmd/teleport-kube-agent-updater/main.go

Co-authored-by: Tiago Silva <[email protected]>

* log update group

---------

Co-authored-by: Tiago Silva <[email protected]>

* Add 'tctl autoupdate agents status' (#51079)

* Ensure proxy version getter adds the leading 'v' (#51687)

* Always create debug socket and expose health endpoints (#51616)

* Always create debug socket and expose health endpoints

* Consolidate the diagnostic multiplexers in a single function

* Fix tests

* Apply suggestions from code review

Co-authored-by: Edoardo Spadolini <[email protected]>

---------

Co-authored-by: Edoardo Spadolini <[email protected]>

* Fix autoupdate rollout controller metrics (#51803)

* kube-agent-updater pre-release builds trust the staging repo + insecure validator private repo fix (#51815)

* Fix insecure resolver in private repos + trust pre-release builds

* fixup! Fix insecure resolver in private repos + trust pre-release builds

* Use new autoupdate APIs in discovery service (#51758)

* Remove name parameter from proxy version getter

* Use autoupdate_agent_rollout as a source of version in scripts and integrations

* Fix tests

* Gracefully handle the absence of a proxy in kube discovery service

* Update lib/srv/discovery/kube_integration_watcher.go

Co-authored-by: Tiago Silva <[email protected]>

* Address marco's feedback

* Address marco's feedback pt.2

* Gracefully handle if we can't get autoupdate version

* fixup! Update lib/srv/discovery/kube_integration_watcher.go

---------

Co-authored-by: Tiago Silva <[email protected]>

* Autoupdate changelog entry in v17.3

* Fix tests after rebase, pt.1

* Update front preset fixtures since the preset role changed

* Add install script using teleport-update and oneoff.sh (#52155)

* Refactor node-join script to take safer options and reuse install option logic (#52196)

* Add install script using teleport-update and oneoff.sh

* Refactor node-join script to take safer options and reuse install option logic

* GoDoc + make functions private

* Address edoardo's feedback

* Allow prerelease Teleport to install official artifacts (#52444)

* Accept installing CE when running an AGPL build for backward compat

* Bump e to fix build (oneoff args change)

* Make node install scripts install Teleport via teleport-update (#52226)

* Make the node install script use teleport-update

* Apply suggestions from code review

Co-authored-by: Edoardo Spadolini <[email protected]>

* Fix curl args + address bash exec comments

---------

Co-authored-by: Edoardo Spadolini <[email protected]>

* Use install.sh in discovery's default installer (#52368)

* Use install.sh in discovery's default installer

* fixup! Use install.sh in discovery's default installer

* Address marco's feedback

* Update lib/auth/grpcserver.go

Co-authored-by: Marco Dinis <[email protected]>

* Update lib/srv/server/installer/defaultinstallers.go

* apply edoardo's feedback + write script to file

* Execute the downloaded shell script

* Add snapshot tests

* fixup! Add snapshot tests

---------

Co-authored-by: Marco Dinis <[email protected]>

* Fix error after rebase

* Fix test after rebase

---------

Co-authored-by: rosstimothy <[email protected]>
Co-authored-by: Marco Dinis <[email protected]>
Co-authored-by: Stephen Levine <[email protected]>
Co-authored-by: Alan Parra <[email protected]>
Co-authored-by: Edoardo Spadolini <[email protected]>
Co-authored-by: Bartosz Leper <[email protected]>
Co-authored-by: Tiago Silva <[email protected]>
8 people authored Feb 27, 2025
1 parent f51e4ea commit adc7067
Showing 106 changed files with 9,952 additions and 1,654 deletions.
27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,32 @@
# Changelog

## 17.3.0

### Automatic Updates

17.3 introduces a new automatic update mechanism for system administrators to control which Teleport version their
agents are running. You can now configure the agent update schedule and desired agent version via the `autoupdate_config`
and `autoupdate_version` resources.

Updates are performed by the new `teleport-update` binary.
This new system is package manager-agnostic and opt-in. Existing agents won't be automatically enrolled; you can enroll
existing 17.3+ agents by running `teleport-update enable`.

`teleport-update` will become the new standard way of installing Teleport as it always picks the appropriate Teleport
edition (Community vs Enterprise), the cluster's desired version, and the correct Teleport variant (e.g. FIPS-compliant
cryptography).

You can find more information about the feature in [our documentation]().

### Package layout changes

Starting with 17.3.0, the Teleport DEB and RPM packages, notably used by the `apt`, `yum`, `dnf` and `zypper` package
managers, will place the Teleport binaries in `/opt/teleport` instead of `/usr/local/bin`.

The binaries will be symlinked to their previous location, so no change should be required in your scripts or systemd units.

This change allows us to do automatic updates without conflicting with the package manager.

## 17.2.9 (02/25/25)

* Updated go-jose/v4 to v4.0.5 (addresses CVE-2025-27144). [#52467](https://github.com/gravitational/teleport/pull/52467)
118 changes: 113 additions & 5 deletions api/client/webclient/webclient.go
@@ -47,6 +47,15 @@ import (
"github.com/gravitational/teleport/api/utils/keys"
)

const (
// AgentUpdateGroupParameter is the parameter used to specify the updater
// group when doing a Ping() or Find() query.
// The proxy server will modulate the auto_update part of the PingResponse
// based on the specified group. e.g. some groups might need to update
// before others.
AgentUpdateGroupParameter = "group"
)

// Config specifies information when building requests with the
// webclient.
type Config struct {
@@ -68,6 +77,9 @@ type Config struct {
Timeout time.Duration
// TraceProvider is used to retrieve a Tracer for creating spans
TraceProvider oteltrace.TracerProvider
// UpdateGroup is used to vary the webapi response based on the
// client's auto-update group.
UpdateGroup string
}

// CheckAndSetDefaults checks and sets defaults
@@ -166,12 +178,25 @@ func Find(cfg *Config) (*PingResponse, error) {
 	}
 	defer clt.CloseIdleConnections()

+	return findWithClient(cfg, clt)
+}
+
+func findWithClient(cfg *Config, clt *http.Client) (*PingResponse, error) {
 	ctx, span := cfg.TraceProvider.Tracer("webclient").Start(cfg.Context, "webclient/Find")
 	defer span.End()

-	endpoint := fmt.Sprintf("https://%s/webapi/find", cfg.ProxyAddr)
+	endpoint := &url.URL{
+		Scheme: "https",
+		Host: cfg.ProxyAddr,
+		Path: "/webapi/find",
+	}
+	if cfg.UpdateGroup != "" {
+		endpoint.RawQuery = url.Values{
+			AgentUpdateGroupParameter: []string{cfg.UpdateGroup},
+		}.Encode()
+	}

-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint.String(), nil)
 	if err != nil {
 		return nil, trace.Wrap(err)
 	}
@@ -202,15 +227,29 @@ func Ping(cfg *Config) (*PingResponse, error) {
 	}
 	defer clt.CloseIdleConnections()

+	return pingWithClient(cfg, clt)
+}
+
+func pingWithClient(cfg *Config, clt *http.Client) (*PingResponse, error) {
 	ctx, span := cfg.TraceProvider.Tracer("webclient").Start(cfg.Context, "webclient/Ping")
 	defer span.End()

-	endpoint := fmt.Sprintf("https://%s/webapi/ping", cfg.ProxyAddr)
+	endpoint := &url.URL{
+		Scheme: "https",
+		Host: cfg.ProxyAddr,
+		Path: "/webapi/ping",
+	}
+	if cfg.UpdateGroup != "" {
+		endpoint.RawQuery = url.Values{
+			AgentUpdateGroupParameter: []string{cfg.UpdateGroup},
+		}.Encode()
+	}
+
 	if cfg.ConnectorName != "" {
-		endpoint = fmt.Sprintf("%s/%s", endpoint, cfg.ConnectorName)
+		endpoint = endpoint.JoinPath(cfg.ConnectorName)
 	}

-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint.String(), nil)
 	if err != nil {
 		return nil, trace.Wrap(err)
 	}
@@ -246,13 +285,18 @@ func Ping(cfg *Config) (*PingResponse, error) {
return pr, nil
}

// GetMOTD retrieves the Message Of The Day from the web proxy.
func GetMOTD(cfg *Config) (*MotD, error) {
clt, err := newWebClient(cfg)
if err != nil {
return nil, trace.Wrap(err)
}
defer clt.CloseIdleConnections()

return getMOTDWithClient(cfg, clt)
}

func getMOTDWithClient(cfg *Config, clt *http.Client) (*MotD, error) {
ctx, span := cfg.TraceProvider.Tracer("webclient").Start(cfg.Context, "webclient/GetMOTD")
defer span.End()

@@ -281,6 +325,60 @@ func GetMOTD(cfg *Config) (*MotD, error) {
return motd, nil
}

// NewReusableClient creates a reusable webproxy client. If you need to do a single call,
// use the webclient.Ping or webclient.Find functions instead.
func NewReusableClient(cfg *Config) (*ReusableClient, error) {
// no need to check and set config defaults, this happens in newWebClient
client, err := newWebClient(cfg)
if err != nil {
return nil, trace.Wrap(err, "building new web client")
}

return &ReusableClient{
client: client,
config: cfg,
}, nil
}

// ReusableClient is a webproxy client that allows the caller to make multiple calls
// without having to build a new HTTP client each time.
// Before retiring the client, you must make sure no calls are still in-flight, then call
// ReusableClient.CloseIdleConnections().
type ReusableClient struct {
client *http.Client
config *Config
}

// Find fetches discovery data by connecting to the given web proxy address.
// It is designed to fetch proxy public addresses without any inefficiencies.
func (c *ReusableClient) Find() (*PingResponse, error) {
return findWithClient(c.config, c.client)
}

// Ping serves two purposes. The first is to validate the HTTP endpoint of a
// Teleport proxy. This leads to better user experience: users get connection
// errors before being asked for passwords. The second is to return the form
// of authentication that the server supports. This also leads to better user
// experience: users only get prompted for the type of authentication the server supports.
func (c *ReusableClient) Ping() (*PingResponse, error) {
return pingWithClient(c.config, c.client)
}

// GetMOTD retrieves the Message Of The Day from the web proxy.
func (c *ReusableClient) GetMOTD() (*MotD, error) {
return getMOTDWithClient(c.config, c.client)
}

// CloseIdleConnections closes any connections on its [Transport] which
// were previously connected from previous requests but are now
// sitting idle in a "keep-alive" state. It does not interrupt any
// connections currently in use.
//
// This must be run before retiring the ReusableClient.
func (c *ReusableClient) CloseIdleConnections() {
c.client.CloseIdleConnections()
}

// MotD holds data about the current message of the day.
type MotD struct {
Text string
@@ -305,6 +403,10 @@ type PingResponse struct {
// reserved: license_warnings ([]string)
// AutomaticUpgrades describes whether agents should automatically upgrade.
AutomaticUpgrades bool `json:"automatic_upgrades"`
// Edition represents the Teleport edition. Possible values are "oss", "ent", and "community".
Edition string `json:"edition"`
// FIPS represents if Teleport is using FIPS-compliant cryptography.
FIPS bool `json:"fips"`
}

// PingErrorResponse contains the error from /webapi/ping.
@@ -336,6 +438,12 @@ type AutoUpdateSettings struct {
ToolsVersion string `json:"tools_version"`
// ToolsAutoUpdate indicates if the requesting tools client should be updated.
ToolsAutoUpdate bool `json:"tools_auto_update"`
// AgentVersion defines the version of teleport that agents enrolled into autoupdates should run.
AgentVersion string `json:"agent_version"`
// AgentAutoUpdate indicates if the requesting agent should attempt to update now.
AgentAutoUpdate bool `json:"agent_auto_update"`
// AgentUpdateJitterSeconds defines the jitter time an agent should wait before updating.
AgentUpdateJitterSeconds int `json:"agent_update_jitter_seconds"`
}

// KubeProxySettings is kubernetes proxy settings
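
To see how the `group` query parameter and the new agent fields in this diff fit together, here is a minimal standalone sketch that uses only the Go standard library rather than the `webclient` package. The helper names `findURL` and `parseFind`, the trimmed-down response types, the example proxy address, and the sample JSON body are illustrative assumptions; in particular, the `auto_update` JSON key is inferred from the comment on `AgentUpdateGroupParameter` and is not shown in the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// autoUpdateSettings mirrors only the agent fields added in this diff.
type autoUpdateSettings struct {
	AgentVersion             string `json:"agent_version"`
	AgentAutoUpdate          bool   `json:"agent_auto_update"`
	AgentUpdateJitterSeconds int    `json:"agent_update_jitter_seconds"`
}

// findResponse is a hypothetical, trimmed-down response type.
// Assumption: the settings are nested under the "auto_update" key.
type findResponse struct {
	AutoUpdate autoUpdateSettings `json:"auto_update"`
}

// findURL builds the /webapi/find URL the same way the diff does,
// adding the "group" query parameter only when a group is set.
func findURL(proxyAddr, group string) string {
	u := &url.URL{Scheme: "https", Host: proxyAddr, Path: "/webapi/find"}
	if group != "" {
		u.RawQuery = url.Values{"group": []string{group}}.Encode()
	}
	return u.String()
}

// parseFind decodes a response body into the trimmed-down type.
func parseFind(body []byte) (findResponse, error) {
	var r findResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	fmt.Println(findURL("proxy.example.com:443", "staging"))

	// Made-up response body, for illustration only.
	body := []byte(`{"auto_update":{"agent_version":"17.3.0","agent_auto_update":true,"agent_update_jitter_seconds":60}}`)
	resp, err := parseFind(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", resp.AutoUpdate)
}
```

Building the URL with `url.URL` and `url.Values` instead of `fmt.Sprintf`, as the diff does, keeps the query string correctly escaped whatever the group name contains.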