restore: migrate readers + gate legacy descriptor writes#171341
Closed
kev-cao wants to merge 5 commits into
Closed
restore: migrate readers + gate legacy descriptor writes#171341kev-cao wants to merge 5 commits into
kev-cao wants to merge 5 commits into
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Contributor
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
Member
Add the building blocks to persist restored descriptor `(ID, Version)`
tuples in dedicated `system.job_info` rows instead of inlining the full
descriptor payloads on `RestoreDetails`. The latter exceeds the 64 MiB
raft command limit on the `system.job_info` row write at ~1M
descriptors, blocking RESTORE during job resumption.
This commit is foundation-only — no behavior change. It adds:
- `RestoreDescRef` and `RestoreDescRefs` protos in `backuppb`.
- Five per-type info-key constants (`restoreTableDescRefsKey`, etc.).
- Read/write helpers (`getDescRefs`, `writeDescRefs`,
`tableDescRefs`, `typeDescRefs`, `schemaDescRefs`,
`databaseDescRefs`, `functionDescRefs`, `allDescRefs`) that prefer
the info-key row and fall back to the legacy
`details.{Type}Descs` slice for jobs created before the info-key
writes existed.
- The `V26_3_DescriptorIDsInRestoreDetails` cluster version that
later commits will use to gate the legacy descriptor-slice writes
off once the cluster has finalized the upgrade.
Subsequent commits dual-write the info-key rows, hoist a single
descriptor fetch in `doResume`, migrate readers off the legacy
slices, and gate the legacy write on the new CV.
Informs: #170669
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
createImportingDescriptors and publishDescriptors both persist the
restore's descriptor lists onto RestoreDetails and call SetDetails. At
~1M descriptors, that payload exceeds the 64 MiB raft command limit and
the restore stalls.
Alongside the existing legacy slice population, write a slim
(ID, Version) tuple list per descriptor type to a dedicated
system.job_info row keyed by restore{Table,Type,Schema,Database,Function}DescRefsKey,
via the helpers introduced in the previous commit. Each row is bounded
by kv.raft.command.max_size in isolation, so the per-type rows stay
well under the limit even at the scale that breaks the combined
payload.
No reader consumes the info-key rows yet; this commit is a behavior-
preserving dual-write. Subsequent commits will hoist a single
descriptor fetch in doResume, migrate readers off the legacy slices,
and finally gate the legacy writes off above V26_3_DescriptorIDsInRestoreDetails.
Informs: #170669
Release note: None
Today three readers — createRestoreFlows (rekey list, pkIDs), remapAndFilterRelevantStatistics (missing-stats warnings), and notifyStatsRefresherOfNewTables — each pull table descriptors back out of details.TableDescs. That field is the very thing we want to stop populating to escape the 64 MiB raft command limit at high descriptor counts, so the readers need a different source. Fetch the freshly-written OFFLINE descriptors from KV once in doResume, right after createImportingDescriptors, and thread the resulting nstree.Catalog into each reader via a new restoredTableDescs helper. publishDescriptors keeps its own fresh fetch because it needs txn-bound mutable copies to write back. The online-restore download path runs in a different goroutine and fetches the published catalog on demand at its call site. No semantic change: previously the readers consumed the OFFLINE descriptors via the snapshot baked into RestoreDetails during createImportingDescriptors; now they consume the same descriptors via a fresh KV read at the same point in the job lifecycle. Informs: #170669 Release note: None
Migrate restore-job readers off the legacy
RestoreDetails.{Table,Type,Schema,Database,Function}Descs slices to the
new (ID, Version) tuples persisted under dedicated system.job_info
rows. Descriptor-body readers (publishDescriptors, protectRestoreTargets,
the empty-restore fast path, ReportResults) now source their data from
either the catalog hoisted in doResume or per-type ref helpers; ID-only
readers (dropDescriptors, removeExistingTypeBackReferences,
getUndroppedTablesFromRestore, checkRestoredTableDescriptorVersions,
setDescriptorsOffline) drive directly off the refs.
The revlog entry points (BuildRekeys, SplitAndScatterRevlogSpans,
RunRevlogFinalMerge, RestoreFromRevisionLog) also read OFFLINE table
descriptors from RestoreDetails.TableDescs. Take a
[]catalog.TableDescriptor parameter on each entry point so the caller
in restore_job.go can thread the hoisted catalog (offlineDescs) through
via restoredTableDescs. The planning-side validation call in
restore_planning.go passes nil since planning has never had access to
the materialized table descriptors; it remains a sanity check on
DescriptorRewrites and the codec.
The legacy slices are still dual-written by the writers in
createImportingDescriptors and publishDescriptors; gating those writes
on the new cluster version comes in a follow-up commit. With this
commit the info-key rows are the source of truth for everything the
restore job reads.
Informs: #170669
Release note: None
createImportingDescriptors and publishDescriptors previously dual-wrote
descriptor bodies to RestoreDetails.{Table,Type,Schema,Database,Function}Descs
alongside the new (ID, Version) info-key rows. Skip the legacy slice
writes once the cluster has crossed V26_3_DescriptorIDsInRestoreDetails;
on the new code path the info-key rows are the source of truth. The
legacy fields and their dual-write are still needed in mixed-version
clusters so that older binaries resuming the job can drive it from the
RestoreDetails payload. The fields themselves will be removed in 27.1.
maybeWriteDownloadJob creates a separate download job that inherits the
link job's RestoreDetails by value. The descriptor-ref helpers
(tableDescRefs etc.) look up info-key rows in system.job_info keyed by
the calling job's ID, so the download job's OnFailOrCancel path found
no rows under its own ID and fell back to the now-nil legacy slices,
leaving cleanup with an empty descriptor set. Copy each
restore_*_desc_refs row from the link job to the download job at
creation time so the download job has the same source of truth.
Add TestRestoreDescriptorRefs to lock down behavior on both sides of
the gate: below the gate, the legacy slices must still be populated
and the helpers must succeed; at the gate, the legacy slices must be
nil and the info-key rows must be the source of truth. A second
subtest seeds a job_info table with no descriptor-ref rows but
populated legacy slices, simulating a job created on the prior binary
and resumed on the new one, to exercise the helpers' legacy-fallback
branch. Migrate TestProtectRestoreTargets to read through
databaseDescRefs / tableDescRefs so it works on both sides of the gate.
Informs: #170669
Release note: None
75a02f7 to
c1c74b6
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is the second half of #170974's execution-side counterpart — the
read side + gate flip. It depends on #171340 (which scaffolds the
proto, info-key constants, CV constant, helpers, and dual-writes the
new rows). Until #171340 lands, this PR's diff includes those bottom
two commits as well.
What lands here:
doResumedoes a singleprefetchDescriptorscall aftercreateImportingDescriptorsreturns / short-circuits, and threads the resulting
nstree.Catalogthrough
createRestoreFlows(rekey list, pkIDs),notifyStatsRefresherOfNewTables, and the missing-stats warningloop in
remapAndFilterRelevantStatistics.publishDescriptorskeeps its own fresh fetch since it needs txn-bound mutables. The
online-restore download path runs in a different goroutine and
fetches the published catalog on demand at its call site. No
semantic change — previously these readers consumed the OFFLINE
descriptors via the snapshot baked into
RestoreDetailsduringcreateImportingDescriptors; now they consume the same descriptorsvia a fresh KV read at the same point in the lifecycle.
(
publishDescriptors,protectRestoreTargets, empty-restore fastpath,
ReportResults) source from the hoisted catalog or per-typehelpers; ID-only sites (
dropDescriptors,removeExistingTypeBackReferences,getUndroppedTablesFromRestore,checkRestoredTableDescriptorVersions,setDescriptorsOffline)drive directly off the refs. The
restorerevlogentry points takea
[]catalog.TableDescriptorparameter so the restore-side callercan thread the hoisted catalog through; the planning-side caller
passes nil (planning never had the materialized descriptors anyway).
RestoreDetails.{Type}Descspopulation increateImportingDescriptorsandpublishDescriptorsonce thecluster has crossed
V26_3_DescriptorIDsInRestoreDetails. On thenew code path the info-key rows are the source of truth; below the
gate the dual-write is preserved so older binaries on mixed-version
clusters can still drive the job.
OnFailOrCancel.maybeWriteDownloadJobcopies the five
restore_*_desc_refsrows from the link job to thedownload job at creation time so the download job's cleanup path
has the same source of truth.
TestRestoreDescriptorRefslocks down behavioron both sides of the gate, and a second subtest seeds a
job_infotable with no descriptor-ref rows but populated legacy slices to
exercise the helpers' legacy-fallback branch.
Depends on: #171340
Informs: #170669
Epic: none
Release note: None