Backport of docs: clarify reschedule, migrate, and replacement terminology into release/1.7.x #25143

hc-github-team-nomad-core
Contributor

Backport

This PR is auto-generated from #24929 to be assessed for backporting due to the inclusion of the label backport/1.7.x.

🚨 Warning: automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@tgross
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/nomad/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


Our vocabulary around scheduler behaviors outside of the reschedule and migrate blocks leaves room for confusion around whether the reschedule tracker should be propagated between allocations. There are effectively five different behaviors we need to cover:

  • restart: when the tasks of an allocation fail and we try to restart the tasks in place.

  • reschedule: when the restart block runs out of attempts (or the allocation fails before tasks even start), and we need to move the allocation to another node to try again.

  • migrate: when the user has asked to drain a node and we need to move the allocations. These are not failures, so we don't want to propagate the reschedule tracker.

  • replacement: when a node is lost, we don't count that against the reschedule tracker for the allocations on the node (it's not the allocation's "fault", after all). We don't want to run the migrate machinery here either, as we can't contact the down node. To the scheduler, this is effectively the same as if we bumped the group.count.

  • replacement for disconnect.replace = true: this is a replacement, but the replacement is intended to be temporary, so we propagate the reschedule tracker.
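
As a rough illustration of the list above (not Nomad's scheduler code), the five behaviors and their effect on the reschedule tracker can be written as a lookup table. The names `Behavior` and `propagates_reschedule_tracker` are hypothetical and exist only for this sketch. Two entries are assumptions rather than statements from the text: restart is marked not applicable because it happens in place with no new allocation, and reschedule is marked as propagating because the tracker exists to count reschedule attempts.

```python
from enum import Enum


class Behavior(Enum):
    """The five scheduler behaviors described above (hypothetical names)."""
    RESTART = "restart"
    RESCHEDULE = "reschedule"
    MIGRATE = "migrate"
    REPLACEMENT = "replacement"
    DISCONNECT_REPLACEMENT = "replacement for disconnect.replace = true"


# True  -> the new allocation inherits the reschedule tracker
# False -> the new allocation starts with a fresh tracker
# None  -> not applicable, no new allocation is created
PROPAGATES_TRACKER = {
    Behavior.RESTART: None,                 # assumption: tasks restart in place
    Behavior.RESCHEDULE: True,              # assumption: failures count against the tracker
    Behavior.MIGRATE: False,                # drains are not failures
    Behavior.REPLACEMENT: False,            # a lost node is not the allocation's "fault"
    Behavior.DISCONNECT_REPLACEMENT: True,  # the replacement is meant to be temporary
}


def propagates_reschedule_tracker(behavior):
    """Return True, False, or None (not applicable) for the given behavior."""
    return PROPAGATES_TRACKER[behavior]


if __name__ == "__main__":
    for behavior in Behavior:
        print(f"{behavior.value}: {propagates_reschedule_tracker(behavior)}")
```

Keeping the mapping in one table makes the asymmetry easy to see: the two kinds of replacement differ only in whether the tracker carries over.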

Add a section to the reschedule, migrate, and disconnect blocks explaining when each item applies. Update the use of the word "reschedule" in several places where "replacement" is correct, and vice-versa.

Fixes: #24918



CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes


temp seems not to be a GitHub user.
You need a GitHub account to be able to sign the CLA.
If you already have a GitHub account, please add the email address used for this commit to your account.

Have you signed the CLA already but the status is still pending? Recheck it.

Member

tgross commented Feb 18, 2025

Looking at the cherry-pick for this, the 1.7.x docs are way out of date because of the disconnect block, and these docs are soon to be deprecated anyways. Just going to skip this 1.7.x backport.

tgross closed this Feb 18, 2025