Skip to content

restore: migrate readers + gate legacy descriptor writes#171341

Closed
kev-cao wants to merge 5 commits into
cockroachdb:masterfrom
kev-cao:restore/migrate-desc-refs-readers
Closed

restore: migrate readers + gate legacy descriptor writes#171341
kev-cao wants to merge 5 commits into
cockroachdb:masterfrom
kev-cao:restore/migrate-desc-refs-readers

Conversation

@kev-cao

@kev-cao kev-cao commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

This is the second half of #170974's execution-side counterpart — the
read side + gate flip. It depends on #171340 (which scaffolds the
proto, info-key constants, CV constant, helpers, and dual-writes the
new rows). Until #171340 lands, this PR's diff includes those bottom
two commits as well.

What lands here:

  • Hoist the offline descriptor fetch. doResume does a single
    prefetchDescriptors call after createImportingDescriptors
    returns / short-circuits, and threads the resulting nstree.Catalog
    through createRestoreFlows (rekey list, pkIDs),
    notifyStatsRefresherOfNewTables, and the missing-stats warning
    loop in remapAndFilterRelevantStatistics. publishDescriptors
    keeps its own fresh fetch since it needs txn-bound mutables. The
    online-restore download path runs in a different goroutine and
    fetches the published catalog on demand at its call site. No
    semantic change — previously these readers consumed the OFFLINE
    descriptors via the snapshot baked into RestoreDetails during
    createImportingDescriptors; now they consume the same descriptors
    via a fresh KV read at the same point in the lifecycle.
  • Migrate every reader. Descriptor-body sites
    (publishDescriptors, protectRestoreTargets, empty-restore fast
    path, ReportResults) source from the hoisted catalog or per-type
    helpers; ID-only sites (dropDescriptors,
    removeExistingTypeBackReferences,
    getUndroppedTablesFromRestore,
    checkRestoredTableDescriptorVersions, setDescriptorsOffline)
    drive directly off the refs. The restorerevlog entry points take
    a []catalog.TableDescriptor parameter so the restore-side caller
    can thread the hoisted catalog through; the planning-side caller
    passes nil (planning never had the materialized descriptors anyway).
  • Gate the legacy writes. Skip the
    RestoreDetails.{Type}Descs population in
    createImportingDescriptors and publishDescriptors once the
    cluster has crossed V26_3_DescriptorIDsInRestoreDetails. On the
    new code path the info-key rows are the source of truth; below the
    gate the dual-write is preserved so older binaries on mixed-version
    clusters can still drive the job.
  • Fix the download job's OnFailOrCancel. maybeWriteDownloadJob
    copies the five restore_*_desc_refs rows from the link job to the
    download job at creation time so the download job's cleanup path
    has the same source of truth.
  • Test coverage. TestRestoreDescriptorRefs locks down behavior
    on both sides of the gate, and a second subtest seeds a job_info
    table with no descriptor-ref rows but populated legacy slices to
    exercise the helpers' legacy-fallback branch.

Depends on: #171340

Informs: #170669

Epic: none

Release note: None

@kev-cao kev-cao requested review from a team as code owners June 2, 2026 04:20
@kev-cao kev-cao requested review from dt and removed request for a team June 2, 2026 04:20
@blathers-crl

blathers-crl Bot commented Jun 2, 2026

Copy link
Copy Markdown

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@trunk-io

trunk-io Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

kev-cao and others added 5 commits June 2, 2026 00:24
Add the building blocks to persist restored descriptor `(ID, Version)`
tuples in dedicated `system.job_info` rows instead of inlining the full
descriptor payloads on `RestoreDetails`. The latter exceeds the 64 MiB
raft command limit on the `system.job_info` row write at ~1M
descriptors, blocking RESTORE during job resumption.

This commit is foundation-only — no behavior change. It adds:

- `RestoreDescRef` and `RestoreDescRefs` protos in `backuppb`.
- Five per-type info-key constants (`restoreTableDescRefsKey`, etc.).
- Read/write helpers (`getDescRefs`, `writeDescRefs`,
  `tableDescRefs`, `typeDescRefs`, `schemaDescRefs`,
  `databaseDescRefs`, `functionDescRefs`, `allDescRefs`) that prefer
  the info-key row and fall back to the legacy
  `details.{Type}Descs` slice for jobs created before the info-key
  writes existed.
- The `V26_3_DescriptorIDsInRestoreDetails` cluster version that
  later commits will use to gate the legacy descriptor-slice writes
  off once the cluster has finalized the upgrade.

Subsequent commits dual-write the info-key rows, hoist a single
descriptor fetch in `doResume`, migrate readers off the legacy
slices, and gate the legacy write on the new CV.

Informs: #170669

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
createImportingDescriptors and publishDescriptors both persist the
restore's descriptor lists onto RestoreDetails and call SetDetails. At
~1M descriptors, that payload exceeds the 64 MiB raft command limit and
the restore stalls.

Alongside the existing legacy slice population, write a slim
(ID, Version) tuple list per descriptor type to a dedicated
system.job_info row keyed by restore{Table,Type,Schema,Database,Function}DescRefsKey,
via the helpers introduced in the previous commit. Each row is bounded
by kv.raft.command.max_size in isolation, so the per-type rows stay
well under the limit even at the scale that breaks the combined
payload.

No reader consumes the info-key rows yet; this commit is a behavior-
preserving dual-write. Subsequent commits will hoist a single
descriptor fetch in doResume, migrate readers off the legacy slices,
and finally gate the legacy writes off above V26_3_DescriptorIDsInRestoreDetails.

Informs: #170669

Release note: None
Today three readers — createRestoreFlows (rekey list, pkIDs),
remapAndFilterRelevantStatistics (missing-stats warnings), and
notifyStatsRefresherOfNewTables — each pull table descriptors back out
of details.TableDescs. That field is the very thing we want to stop
populating to escape the 64 MiB raft command limit at high descriptor
counts, so the readers need a different source.

Fetch the freshly-written OFFLINE descriptors from KV once in doResume,
right after createImportingDescriptors, and thread the resulting
nstree.Catalog into each reader via a new restoredTableDescs helper.
publishDescriptors keeps its own fresh fetch because it needs txn-bound
mutable copies to write back. The online-restore download path runs in
a different goroutine and fetches the published catalog on demand at
its call site.

No semantic change: previously the readers consumed the OFFLINE
descriptors via the snapshot baked into RestoreDetails during
createImportingDescriptors; now they consume the same descriptors via
a fresh KV read at the same point in the job lifecycle.

Informs: #170669

Release note: None
Migrate restore-job readers off the legacy
RestoreDetails.{Table,Type,Schema,Database,Function}Descs slices to the
new (ID, Version) tuples persisted under dedicated system.job_info
rows. Descriptor-body readers (publishDescriptors, protectRestoreTargets,
the empty-restore fast path, ReportResults) now source their data from
either the catalog hoisted in doResume or per-type ref helpers; ID-only
readers (dropDescriptors, removeExistingTypeBackReferences,
getUndroppedTablesFromRestore, checkRestoredTableDescriptorVersions,
setDescriptorsOffline) drive directly off the refs.

The revlog entry points (BuildRekeys, SplitAndScatterRevlogSpans,
RunRevlogFinalMerge, RestoreFromRevisionLog) also read OFFLINE table
descriptors from RestoreDetails.TableDescs. Take a
[]catalog.TableDescriptor parameter on each entry point so the caller
in restore_job.go can thread the hoisted catalog (offlineDescs) through
via restoredTableDescs. The planning-side validation call in
restore_planning.go passes nil since planning has never had access to
the materialized table descriptors; it remains a sanity check on
DescriptorRewrites and the codec.

The legacy slices are still dual-written by the writers in
createImportingDescriptors and publishDescriptors; gating those writes
on the new cluster version comes in a follow-up commit. With this
commit the info-key rows are the source of truth for everything the
restore job reads.

Informs: #170669

Release note: None
createImportingDescriptors and publishDescriptors previously dual-wrote
descriptor bodies to RestoreDetails.{Table,Type,Schema,Database,Function}Descs
alongside the new (ID, Version) info-key rows. Skip the legacy slice
writes once the cluster has crossed V26_3_DescriptorIDsInRestoreDetails;
on the new code path the info-key rows are the source of truth. The
legacy fields and their dual-write are still needed in mixed-version
clusters so that older binaries resuming the job can drive it from the
RestoreDetails payload. The fields themselves will be removed in 27.1.

maybeWriteDownloadJob creates a separate download job that inherits the
link job's RestoreDetails by value. The descriptor-ref helpers
(tableDescRefs etc.) look up info-key rows in system.job_info keyed by
the calling job's ID, so the download job's OnFailOrCancel path found
no rows under its own ID and fell back to the now-nil legacy slices,
leaving cleanup with an empty descriptor set. Copy each
restore_*_desc_refs row from the link job to the download job at
creation time so the download job has the same source of truth.

Add TestRestoreDescriptorRefs to lock down behavior on both sides of
the gate: below the gate, the legacy slices must still be populated
and the helpers must succeed; at the gate, the legacy slices must be
nil and the info-key rows must be the source of truth. A second
subtest seeds a job_info table with no descriptor-ref rows but
populated legacy slices, simulating a job created on the prior binary
and resumed on the new one, to exercise the helpers' legacy-fallback
branch. Migrate TestProtectRestoreTargets to read through
databaseDescRefs / tableDescRefs so it works on both sides of the gate.

Informs: #170669

Release note: None
@kev-cao kev-cao force-pushed the restore/migrate-desc-refs-readers branch from 75a02f7 to c1c74b6 Compare June 2, 2026 04:26
@kev-cao kev-cao closed this by deleting the head repository Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants