Skip to content

Conversation

@sunshowers
Copy link
Collaborator

@sunshowers sunshowers commented Dec 29, 2025

When enabled, older (non-latest) blessed API versions are stored as
.gitref files containing a commit:path reference instead of the full
JSON content. This allows Git to detect new API versions as renames. (The format is designed to be fed into git show).

The git ref uses the first commit that introduced the file (not the current merge-base) to ensure references remain stable as history evolves.

Git ref storage is disabled by default. Enable with
ManagedApis::with_git_ref_storage(), or per-API with
ManagedApi::use_git_ref_storage().

Demo in Omicron:

For more information, see RFD 634.

TODO:

  • add note to the commit message about how this makes git treat the file as a rename
  • handle situation where multiple new versions are added in a single commit
  • allow per-version customization of gitref storage (useful when particular fixed versions are special in some way) will do this when we need it
  • verify that adding a version, then removing it, then adding it again chooses the latest version
  • automated + manual tests around merge conflicts (see [2/n] handle unparseable local files (e.g., conflict markers) #39)
  • documentation

Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
@david-crespo
Copy link

david-crespo commented Dec 29, 2025

Wow, that is awesome. Didn’t know you can do that. So the gitref effectively takes the old file out of the current list of files, allowing git to treat the change as a move?

Edit: discussed in chat. Answer is yes.

Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
@david-crespo david-crespo mentioned this pull request Dec 31, 2025
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
@sunshowers sunshowers marked this pull request as ready for review January 4, 2026 21:52
@sunshowers sunshowers changed the title add git ref storage for older API versions [1/n] add git ref storage for older API versions Jan 4, 2026
@david-crespo
Copy link

david-crespo commented Jan 6, 2026

Still reviewing, but this is a summary CC put together as part of the process. Thought it was worth saving. Put it under <details> because it's quite long.

PR overview by Claude Code w/ Opus 4.5

Overview

This PR adds "git ref storage" for versioned APIs. When enabled, older (non-latest) API versions are stored as .gitref files containing a commit:path reference instead of the full JSON content. The content is retrieved via git show at runtime.

The motivation is to make Git treat API version changes as renames rather than add/delete pairs, producing cleaner diffs that show only the actual schema changes.

Logic flow and reading order

The core decision: should this version be a gitref?

The key insight is comparing "first commits" - the commit where each version was originally introduced. If an older version's first commit differs from the latest version's first commit, it should be stored as a gitref (when git ref storage is enabled).

Reading order

  1. git.rs:159-230 - git_first_commit_for_file() and GitRef type. Uses git log --diff-filter=A to find when a file was added. Note the comment about why --follow is intentionally not used.

  2. spec_files_blessed.rs:69-115 - BlessedGitRef enum (Lazy vs Known) and to_git_ref() method. This is where the expensive git log call happens.

  3. resolved.rs:1453-1490 - should_convert_to_git_ref() decision table:

    • NotBlessed → always convert (new version being added)
    • Blessed(same_commit) → don't convert
    • Blessed(different_commit) → convert
    • BlessedError → don't convert (safe default)
  4. resolved.rs:1298-1359 - the "slow path" in resolve_api_version_blessed(). This is where the decision logic is applied.

  5. resolved.rs:841-871 - latest_first_commit computation in resolve_api().

About BlessedGitRef::Lazy

Lazy represents a blessed JSON file (from git) whose first commit hasn't been computed yet. Known represents a .gitref file where the commit was already parsed from the file content.

The intent is to defer the expensive git log call until to_git_ref() is called - which only happens in the "slow path" when git ref storage is enabled and we're checking non-latest versions.

Architecture

New types (git.rs)

  • GitCommitHash: Validated commit hash enum supporting SHA-1 (40 hex chars) or SHA-256 (64 hex chars). Uses hex::decode_to_slice for validation.

  • GitRef: Represents a commit:path reference. Implements FromStr and Display for parsing/serialization. The format is <commit>:<path> with an optional trailing newline when parsed.

  • git_first_commit_for_file(): Finds the first commit where a file was introduced using git log --diff-filter=A. Returns the most recent addition if a file was removed and re-added (important for correctness).

  • is_shallow_clone(): Detects shallow clones via git rev-parse --is-shallow-repository.

Blessed file tracking (spec_files_blessed.rs)

  • BlessedGitRef: Two variants:

    • Known { commit, path }: From parsing a .gitref file (commit already known)
    • Lazy { revision, path }: From a JSON file (commit computed on demand via git log)
  • BlessedFiles now tracks git_refs: BTreeMap<GitRefKey, BlessedGitRef> alongside the loaded files. The git_ref() method looks up refs by API ident and version.

  • BlessedPathKind: New enum for path classification: Lockstep, GitRefFile, VersionedFile. Makes the loading code cleaner.

Resolution logic (resolved.rs)

  • LatestFirstCommit: Tracks whether the latest version's first commit is known:

    • NotBlessed: Latest is new (always convert older versions)
    • Blessed(commit): Latest is blessed with known first commit
    • BlessedError: Error computing first commit
  • should_convert_to_git_ref(): Decision logic:

    • If latest is not blessed → always convert
    • If latest is blessed → only convert if first commits differ
    • If error → don't convert (conservative)
  • New Problem variants:

    • BlessedVersionShouldBeGitRef: JSON file should be converted to gitref
    • GitRefShouldBeJson: Gitref should be converted back to JSON
    • DuplicateLocalFile: Both JSON and gitref exist for same version
    • GitRefFirstCommitUnknown: Unfixable error when git log fails
  • New Fix variants:

    • ConvertToGitRef: Write .gitref file, delete JSON
    • ConvertToJson: Write JSON from git ref content, delete .gitref
  • New Note: ShallowClone warning when git ref storage is enabled in a shallow clone.

API configuration (apis.rs)

  • ManagedApi::with_git_ref_storage() and disable_git_ref_storage(): Per-API override
  • ManagedApis::with_git_ref_storage(): Global setting
  • ManagedApis::uses_git_ref_storage(api): Resolves per-API vs global setting

File name types (validation.rs)

  • New ApiSpecFileNameKind::VersionedGitRef { version, hash } variant
  • ApiSpecFileName::is_git_ref() and to_json_filename() helpers

Key design decisions

  1. First commit, not merge-base: Uses the commit that introduced the file rather than the current merge-base. This ensures two developers working from different merge bases produce identical gitref content, avoiding merge conflicts in the gitref file itself.

  2. Same-commit optimization: Versions sharing the same first commit as the latest are NOT converted. This handles the common case of adding multiple versions in one commit without causing immediate check failures.

  3. Lazy computation: BlessedGitRef::Lazy defers the expensive git log --diff-filter=A call until needed. The fast path (git refs disabled or checking latest version) avoids this entirely.

  4. Bidirectional conversion: Full support for enabling/disabling git ref storage, including converting existing gitref files back to JSON.

Test coverage (git_ref.rs)

Comprehensive tests covering:

Test Scenario
test_conversion Basic JSON → gitref conversion when adding new version
test_same_first_commit_no_conversion No conversion when all versions share same commit
test_lockstep_never_converted_to_git_ref Lockstep APIs excluded
test_mixed_first_commits_selective_conversion Only different-commit versions converted
test_git_ref_check_after_conversion Gitref tracking across multiple commits
test_no_conversion_without_git_ref_enabled Feature disabled = no conversion
test_convert_to_json_when_disabled Gitref → JSON when feature disabled
test_duplicates Both JSON and gitref exist: correct one is kept
test_remove_readd Remove/re-add version uses latest addition commit
test_latest_removed Previous-latest converts back to JSON
test_latest_removed_same_commit Same-commit versions also convert back
test_git_error_reports_problem Git failures produce GitRefFirstCommitUnknown
test_shallow_clone_creates_incorrect_git_refs Documents shallow clone limitation

Test infrastructure

  • TestEnvironment gains helpers: versioned_git_ref_exists(), read_versioned_git_ref(), read_git_ref_content(), make_shallow(), etc.

  • fake-git binary: Fails on --diff-filter=A to test error handling.

  • New fixtures: versioned_health_no_v1, versioned_health_v1_only, and various *_git_ref_* variants.

Relationship to PR #39

PR #39 handles unparseable local files (e.g., conflict markers) and merge conflict scenarios. It builds on this PR and covers:

  • Rename conflict resolution when branches add different versions
  • Blessed version conflicts with git refs enabled
  • MERGE_HEAD awareness for accurate merge-base computation
  • Tests for both git and jj workflows

if any_uses_git_ref && is_shallow_clone(&env.repo_root) {
notes.push(Note::ShallowClone);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this blow up? Shallow clones seem like they have a pretty low chance of working. Is this going to be a problem in CI?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, to be honest. It can work in some situations depending on if the requisite objects are available. I'll try doing a few more tests to see what happens.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked on test PRs in Omicron. Probably fine to leave it as a warning and then see if that's confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it depends on if the objects are actually available in the repo -- in particular, if --no-local is passed into a local git clone, it fails. Made this an error -- I'd rather be conservative and relax as necessary.

Copy link

@david-crespo david-crespo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. The added complexity is, to me, very worth it. The Omicron PRs we’ve had recently where one line API changes cause 30000 line diffs are untenable. We need to be able to review the API schema diff when a PR changes the external API. We also don't want to add 800kb to the repo on every API change.

Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants