New update strategy #204

thesharp · 2020-01-21T10:02:59Z

Feature Request

Desired Feature

New update strategy which would download updates, but leave the actual reboot process to the user.

Example Usage

This update strategy would be useful on single-node clusters to avoid unscheduled downtime.

Other Information

The user would need a way to know that an update is ready and waiting on a reboot to be applied. Metrics seem to be the appropriate way to deliver such notifications.

jlebon · 2020-01-21T15:00:17Z

There are probably two separate but related modes:

wait on reboot: anything that reboots the system (including the user) will update the machine
wait on permission AND reboot: sysadmin has to "release" the update -- without this, even rebooting won't update the system

The first can be done by just not using the finalization API (coreos/rpm-ostree#1814). The second can be done by using the finalization API, but stopping short of calling rpm-ostree finalize-deployment (which actually is a hidden command right now).

I can see the usefulness of this, and we could do it. Though anything that's not automatic is kinda counter to the mission :) I think just getting the periodic update strategy in and the fleet lock stuff fleshed out would be more beneficial.

lucab · 2020-01-22T14:41:42Z

Thanks all for the feedback, I'll reply cumulatively here.

The "wait on reboot" method is what Container Linux does, and it's full of corner cases resulting in unplanned/accidental upgrades. IMHO that flow should only be used in manual operating mode, i.e. by stopping/disabling Zincati first. I am not planning to have an update strategy that works that way, as there are too many inherent corner cases/races.

Regarding the "wait on permission" case, we have a bit more space for design. Current plan is to check for permission with one of these strategies:

always permitted (immediate)
after a successful HTTP request (fleet_lock)
within maintenance windows defined on a weekly basis (periodic), once #34 ("strategy: add weekly maintenance windows mode, configuration and logic") is done

@thesharp would one of the last two suit you, perhaps with some homemade helper (e.g. a local HTTP container with custom logic to decide when finalization is allowed)?

If not, I'd still try to come up with a flow which does not completely bypass Zincati finalization (for example, giving permission to reboot only if a specific filepath exists) and which does not require SSHing to each node.
Assuming you end up with a fleet of >100 nodes to upgrade your way, how were you planning to automate it? Something like: getting an alert based on the metrics, and then scheduling an SSH task on each affected node to reboot?
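The two ideas above (a local HTTP helper with custom logic, and permission tied to the existence of a filepath) could be combined roughly as follows. This is only a sketch, not part of Zincati: the flag path, port, function names, and error `kind` strings are all made up for illustration. The endpoint paths, the `fleet-lock-protocol` header, and the JSON error shape follow my understanding of the FleetLock protocol, and it is assumed that any non-2xx response is treated by the client as "permission denied".

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical flag file: an operator creates it to release a pending reboot.
ALLOW_FILE = Path("/run/reboot-helper/allow")


def decide(path: str, headers: Dict[str, str], flag: Path = ALLOW_FILE) -> Tuple[int, dict]:
    """Map a request to (HTTP status, JSON body) without touching the network."""
    if headers.get("fleet-lock-protocol") != "true":
        return 400, {"kind": "bad_request", "value": "missing fleet-lock-protocol header"}
    if path == "/v1/steady-state":
        # No lock state is kept in this sketch, so there is nothing to release.
        return 200, {}
    if path == "/v1/pre-reboot":
        # Custom logic: permit finalization only while the flag file exists.
        if flag.exists():
            return 200, {}
        return 409, {"kind": "not_released", "value": "reboot not released by operator"}
    return 404, {"kind": "not_found", "value": path}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body; client parameters are ignored in this sketch.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        headers = {k.lower(): v for k, v in self.headers.items()}
        status, body = decide(self.path, headers)
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the helper until interrupted (host and port are arbitrary choices)."""
    HTTPServer((host, port), Handler).serve_forever()
```

With something like this running on the node, creating the flag file would release the next reboot and deleting it would hold updates back again, without bypassing Zincati's own finalization logic.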

thesharp · 2020-01-22T15:42:54Z

@lucab I think periodic will suit my needs. I would be able to schedule an update window at a suitable time/day.
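To make the "weekly maintenance window" idea concrete, here is a minimal sketch of the check such a strategy would need: given the current time, is it inside any configured window? The window format, field names, and the example window are assumptions for illustration only and do not reflect Zincati's actual configuration, which is still being designed in #34. Times are naive datetimes; the caller is assumed to pass them in one consistent timezone.

```python
from datetime import datetime, time, timedelta

# Hypothetical window list. Weekdays follow datetime.weekday() (Monday == 0).
WINDOWS = [
    {"weekday": 5, "start": time(23, 30), "length_minutes": 60},  # Saturday 23:30, 1 hour
]


def in_window(now: datetime, windows=WINDOWS) -> bool:
    """Return True if `now` falls inside any weekly window (end is exclusive)."""
    for w in windows:
        # Find the most recent occurrence of this window's start at or before `now`.
        days_back = (now.weekday() - w["weekday"]) % 7
        start = datetime.combine(now.date() - timedelta(days=days_back), w["start"])
        if start > now:
            # Same weekday, but the window has not started yet: use last week's.
            start -= timedelta(days=7)
        end = start + timedelta(minutes=w["length_minutes"])
        if start <= now < end:
            return True
    return False
```

Anchoring on the most recent start time means a window that crosses midnight (as the example does) is handled without special cases, as long as no window is longer than a week.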

But it also would be nice to have some way of knowing that update/reboot was done w/o monitoring server's uptime. I'm thinking maybe some sort of webhook triggering before the reboot?

> Assuming you end up with a fleet of >100 nodes to upgrade your way

That case would be only suitable in something like a home lab. It isn't suitable for large installations.

lucab · 2020-01-22T15:56:41Z

@thesharp ack, then I'll close this one and try to get #34 done sometime soon.

> But it also would be nice to have some way of knowing that update/reboot was done w/o monitoring server's uptime. I'm thinking maybe some sort of webhook triggering before the reboot?

This is exposed as a metric with an info label. It is a bit better than a webhook, as it is always correct in spite of failed upgrades or rollbacks. The result is the graph you see in the README.
For further details, I covered all of this recently at a Prometheus meetup: https://www.youtube.com/watch?v=TJAFktlhQi4 + https://speakerdeck.com/lucab/prometheus-metrics-from-host-local-services.
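As a rough illustration of why an info-style metric works for this: the booted version travels as a label value, so comparing the label between two scrapes shows whether the node actually changed version, whatever happened in between. The sketch below parses Prometheus text exposition output in a simplified way (escaped characters inside label values are not unescaped). The metric name `example_os_info`, the label `os_version`, and the version strings are placeholders, not Zincati's real metric names.

```python
import re
from typing import Dict, Optional

# Matches label pairs such as key="value" (escaped quotes tolerated, not unescaped).
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def info_labels(exposition: str, metric: str) -> Optional[Dict[str, str]]:
    """Return the labels of the first sample of `metric`, or None if it is absent."""
    for line in exposition.splitlines():
        line = line.strip()
        # Skip comments (# HELP / # TYPE) and samples of other metrics.
        if line.startswith("#") or not line.startswith(metric + "{"):
            continue
        body = line[len(metric) + 1 : line.rindex("}")]
        return dict(_LABEL.findall(body))
    return None


def version_changed(before: str, after: str, metric: str, label: str) -> bool:
    """Compare one label of an info metric across two scrapes."""
    old = info_labels(before, metric) or {}
    new = info_labels(after, metric) or {}
    return old.get(label) != new.get(label)


# Two made-up scrapes, taken before and after a reboot.
scrape_before = '# TYPE example_os_info gauge\nexample_os_info{os_version="1.0.0"} 1\n'
scrape_after = '# TYPE example_os_info gauge\nexample_os_info{os_version="1.1.0"} 1\n'
```

In practice a Prometheus alerting or recording rule would do this comparison server-side; the point is only that the label reflects the state the node really booted into, which a fire-and-forget webhook cannot guarantee after a failed upgrade or rollback.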

thesharp · 2020-01-23T14:03:36Z

@lucab you probably should link local_exporter somewhere in https://github.com/coreos/zincati/blob/master/docs/usage/metrics.md

lucab · 2020-03-10T09:31:06Z

I stabilized metrics and tweaked the docs in #243, which will be part of the next release (0.0.9).

The periodic strategy is still a work in progress, but I already have #34 to cover that. Closing this ticket.

lucab · 2020-03-10T09:40:25Z

I also split the file-based strategy idea out into #245. I'm not going to pursue it at this time; periodic is still higher on my list.

lucab added area/updates kind/new-feature needs/more-information labels Jan 22, 2020

lucab closed this as completed Mar 10, 2020

lucab mentioned this issue Mar 10, 2020

updates: new strategy based on local filesystem #245

Open
