backup: fix AdminUnsplit race in online restore download cleanup #169758

Open
msbutler wants to merge 2 commits into cockroachdb:master from msbutler:butler-unsplit-err

@msbutler
Collaborator

@msbutler msbutler commented May 5, 2026

kvpb: add ErrKeyNotStartOfRange sentinel for AdminUnsplit
Introduce a typed sentinel error for the "key is not the start of a
range" condition returned by AdminUnsplit. This replaces brittle
strings.Contains checks with errors.Is, while including a string
fallback in the IsKeyNotStartOfRange helper for mixed-version
compatibility.

Migrate the GC job's unsplitRangesInSpan to use the new helper.
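The sentinel-plus-fallback pattern described above could look roughly like the following. This is a minimal sketch, not the code in this PR: it uses the standard library `errors` package instead of the error library CockroachDB uses, the real sentinel lives in `kvpb`, and the exact fallback string match is an assumption. Only the names `ErrKeyNotStartOfRange` and `IsKeyNotStartOfRange` and the message text come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrKeyNotStartOfRange is a stand-in for the sentinel named in this PR.
// The real definition lives in kvpb; this version is stdlib-only.
var ErrKeyNotStartOfRange = errors.New("key is not the start of a range")

// Illustrative errors (hypothetical, for the demo only).
var (
	// wrappedErr carries the typed sentinel in its chain.
	wrappedErr = fmt.Errorf("unsplit failed: %w", ErrKeyNotStartOfRange)
	// legacyErr imitates an older node that only sends the message text.
	legacyErr = errors.New("unsplit failed: key is not the start of a range")
	// unrelatedErr must not match.
	unrelatedErr = errors.New("context canceled")
)

// IsKeyNotStartOfRange reports whether err represents the
// "not the start of a range" condition.
func IsKeyNotStartOfRange(err error) bool {
	if err == nil {
		return false
	}
	// Preferred path: the typed sentinel is somewhere in the error chain.
	if errors.Is(err, ErrKeyNotStartOfRange) {
		return true
	}
	// Assumed fallback for mixed-version clusters: match on the message.
	return strings.Contains(err.Error(), ErrKeyNotStartOfRange.Error())
}

func main() {
	// prints: true true false
	fmt.Println(
		IsKeyNotStartOfRange(wrappedErr),
		IsKeyNotStartOfRange(legacyErr),
		IsKeyNotStartOfRange(unrelatedErr),
	)
}
```

Keeping the string match inside the helper means callers only ever call `IsKeyNotStartOfRange`, so the fallback can be dropped in one place once mixed-version compatibility is no longer needed.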

Epic: none
Fixes: #169637

Co-Authored-By: roachdev-claude [email protected]


backup: make unstickRestoreSpans best-effort
A race between the GC job and the online restore download job could
cause AdminUnsplit to fail with "key is not the start of a range" when
the GC job merges ranges that the download job later tries to unsplit.
This caused retry spiraling and eventual job failure.

Make unstickRestoreSpans swallow "not start of range" errors (the
range already merged, so the sticky bit is gone) and use
besteffort.Warning for other errors, since unsplitting is cleanup
that should not block the restore.
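A best-effort cleanup loop of the kind described above might be sketched as follows. Everything here is illustrative: `unstickSpans`, `unsplitFunc`, and `fakeUnsplit` are hypothetical names, the standard `log` package stands in for `besteffort.Warning` (whose signature is not shown in this PR), and the sentinel is a local stand-in for the one in `kvpb`.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errKeyNotStartOfRange is a local stand-in for the kvpb sentinel.
var errKeyNotStartOfRange = errors.New("key is not the start of a range")

// unsplitFunc is a hypothetical stand-in for an AdminUnsplit call on one key.
type unsplitFunc func(key string) error

// unstickSpans tries to unsplit every key and never returns an error,
// because unsplitting is cleanup that should not fail the caller.
// It returns how many keys produced a logged warning.
func unstickSpans(keys []string, unsplit unsplitFunc) (warned int) {
	for _, k := range keys {
		err := unsplit(k)
		switch {
		case err == nil:
			// Unsplit succeeded.
		case errors.Is(err, errKeyNotStartOfRange):
			// The range was already merged (for example by the GC job),
			// so the sticky bit is gone and there is nothing left to do.
		default:
			// Any other failure is logged and skipped, not propagated.
			log.Printf("warning: unsplit %q failed, continuing: %v", k, err)
			warned++
		}
	}
	return warned
}

// fakeUnsplit simulates one success, one benign race, and one real failure.
func fakeUnsplit(key string) error {
	switch key {
	case "b":
		return fmt.Errorf("unsplit %s: %w", key, errKeyNotStartOfRange)
	case "c":
		return errors.New("rpc timeout")
	}
	return nil
}

func main() {
	// Key "a" succeeds, "b" is swallowed, "c" is logged as a warning.
	fmt.Println(unstickSpans([]string{"a", "b", "c"}, fakeUnsplit)) // prints: 1
}
```

Because the function has no error return, a caller cannot accidentally feed a cleanup failure back into the job's retry loop, which is the retry spiral the description refers to.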

Epic: none
Fixes: #169637

@msbutler msbutler self-assigned this May 5, 2026
@trunk-io
Contributor

trunk-io Bot commented May 5, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@blathers-crl

blathers-crl Bot commented May 5, 2026

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@msbutler msbutler force-pushed the butler-unsplit-err branch from 78d3dc8 to 34cec46 Compare May 5, 2026 19:06
@msbutler msbutler marked this pull request as ready for review May 5, 2026 19:52
@msbutler msbutler requested review from a team as code owners May 5, 2026 19:52
@msbutler msbutler requested review from dt and removed request for a team May 5, 2026 19:52

Development

Successfully merging this pull request may close these issues.

release-26.2: roachtest: backup-restore/online-restore failed

2 participants