
Verify GH action tag/SHA combinations #356

Merged
potiuk merged 10 commits into apache:main from snazy:gh-action-sha-tag-check
Jun 11, 2026

Conversation

@snazy

@snazy snazy commented Nov 7, 2025

Member

This change introduces a new function verify_actions to validate the contents against GitHub.

TL;DR
The function verifies that the SHAs specified in actions.yml exist in the GH repo. It also ensures that the SHA exists on the Git tag, if the tag attribute is specified. The rest of the function is mostly output plus collection of errors (failures) and warnings.

Although it issues quite a few GH API requests, the rate limiter should not kick in (with an authenticated GH token, GH workflows have a limit of 15k requests). I opted to rely on the HTTP/1.1 urllib.request stuff, which has no connection-reuse. The alternative would have been to add a dependency.

The algorithm roughly works like this, for each action specified in actions.yml:

  • Issue a warning and stop, if the name is like OWNER/* ("wildcard" repository). Can't verify Git SHAs in this case.
  • Issue a warning and stop, if the name is like docker:* (not implemented)
  • Issue an error and stop, if the name doesn't start with an OWNER/REPO pattern.
  • Each expired entry is just skipped
  • If there is a wildcard reference and a SHA reference, issue an error.
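
The name checks above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `classify_action_name`, its return values, and the regular expression are assumptions made up for the example.

```python
import re

# Hypothetical pattern: a name must at least start with OWNER/REPO.
_OWNER_REPO = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def classify_action_name(name: str) -> str:
    """Return 'warning', 'error' or 'ok' for an action name (illustrative only)."""
    if name.startswith("docker:"):
        return "warning"  # Docker references are not verified
    owner, _, rest = name.partition("/")
    if owner and rest == "*":
        return "warning"  # OWNER/* wildcard repository, SHAs cannot be verified
    if not _OWNER_REPO.match(name):
        return "error"  # does not start with an OWNER/REPO pattern
    return "ok"
```

Names that come back as `ok` would then proceed to the per-reference checks.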

Then, for each reference for an action:

  • If no tag is specified, let GH resolve the commit SHA. Emit a warning to add the value of the tag attribute, if the SHA can be resolved. Otherwise, emit an error.
  • If tag is specified:
    • Add the SHA to the set of requested-shas-by-tag
    • Call GH's "matching-refs" endpoint for the 'tag' value
      • Emit an error, if the object type is not a tag or commit.
      • Also resolve 'tag' object types to 'commit' object types.
      • Add each returned SHA to the set of valid-shas-by-tag.
  • For each "requested tag" verify that the sets of valid and requested shas intersect. If not, emit an error.
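
The final intersection step can be illustrated with plain sets. This is a sketch under assumptions: the function name `check_tag_shas` and the dict-of-sets shape are hypothetical and only mirror the "requested-shas-by-tag" and "valid-shas-by-tag" wording above.

```python
def check_tag_shas(
    requested: dict[str, set[str]],
    valid: dict[str, set[str]],
) -> list[str]:
    """Return one error message per tag whose requested and valid SHAs do not intersect."""
    errors = []
    for tag, shas in requested.items():
        # A tag passes if at least one requested SHA is among the SHAs GitHub reported.
        if not shas & valid.get(tag, set()):
            errors.append(
                f"tag {tag!r}: none of {sorted(shas)} found among the SHAs reported for it"
            )
    return errors
```

An empty list means every requested tag had at least one matching SHA.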

Fixes #110

@snazy

snazy commented Nov 7, 2025

Member Author

This is in a very early stage and meant to just gather feedback and opinions about the approach.

@snazy

snazy commented Nov 17, 2025

Member Author

@raboof do you think it's worth tackling this one?

@raboof

raboof commented Nov 18, 2025

Member

This looks helpful to me. Does it actually share code with gateway.py or could it be a separate file?

@snazy

snazy commented Nov 18, 2025

Member Author

This looks helpful to me. Does it actually share code with gateway.py or could it be a separate file?

It does use load_actions and a data class, but nothing that prevents moving the code to a separate .py file.

@snazy snazy force-pushed the gh-action-sha-tag-check branch 19 times, most recently from af91fe6 to 185416b Compare December 21, 2025 14:14
@snazy

snazy commented Dec 21, 2025

Member Author

Made some progress on this one.

Moved the code to a separate source file and added it to the update_actions workflow.
The output (github summary) would yield five failures.
One is tackled via #426 (my bad, included in this PR as well for now).

The four remaining ones are because the ScaCap organization has an IP allow list enabled, which prevents GH-hosted runners from performing GH API requests against their org, so the verification code cannot verify the SHAs and tags. The checks work fine for the ScaCap org from my machine.
I have added a new boolean flag ignore_gh_api_errors for action-references to let the verification code ignore GH API failures. Setting this flag to true means that GH API errors are ignored, but the checks still happen and verification errors are still emitted, just not as failures but as warnings.
Updated the actions.yml with that flag for the scacap/action-surefire-report action.
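
As a rough illustration of the downgrade described above (findings are still recorded, but as warnings instead of failures), here is a minimal sketch. The `Result` class and its `report` method are hypothetical stand-ins, not the collector used in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical collector for failures and warnings."""

    failures: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def report(self, message: str, ignore_gh_api_errors: bool = False) -> None:
        # With the flag set, the finding is kept but does not fail the run.
        target = self.warnings if ignore_gh_api_errors else self.failures
        target.append(message)
```

With this shape, an action reference that sets the flag still shows up in the summary, only in the warnings section.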

The warnings are:

  • Two references to Docker images (not verified)
  • Two wildcard repository references
    • golangci/*
    • rustsec/*
  • Two SHAs without a tag name
    • browser-actions/setup-geckodriver
    • damccorm/tag-ur-it
  • Two wildcard SHA and specific SHA references for the same action
    • sbt/setup-sbt
    • gradle/wrapper-validation-action

@snazy snazy force-pushed the gh-action-sha-tag-check branch 6 times, most recently from 6a82461 to a1796d0 Compare December 21, 2025 15:57
@snazy snazy force-pushed the gh-action-sha-tag-check branch 6 times, most recently from e8ef2e7 to 832e95b Compare June 2, 2026 08:27
@snazy

snazy commented Jun 2, 2026

Member Author

I've rebased the branch.

The changes to actions.yml now only contain the changes for the scacap/action-surefire-report action to ignore GH API errors, because that org has an "IP allowlist" that prevents us from using the GH API against that org.

I've made the dtolnay/rust-toolchain action case that references their master branch a warning, instead of a hard failure.

@potiuk

potiuk commented Jun 2, 2026

Member

Thanks for the rebase and scoping this down, @snazy — the approach is solid and the test coverage on the happy paths is nice. I think the check is worth having. Two things I'd want sorted before approving, plus a few smaller items:

🔴 Blockers

  1. Leftover debug raise in gateway/action_tags.py (the invalid-Git-SHA branch):

    else:
        result.failure(f"... references an invalid Git SHA '{ref}'", "  ..")
        raise Exception("foo")

    On any invalid SHA this records the failure and then crashes the run with Exception("foo") instead of failing gracefully. The branch isn't covered by a test, so green CI doesn't catch it. The result.failure(...) above already does the right thing — the raise should go (a small regression test for this branch would be great too).

  2. The workflow won't trigger on the PRs it's meant to guard (check_action_tags.yml). The path filters reference files that don't exist:

    push:         paths: [".github/workflows/dummy.yml"]
    pull_request: paths: [".github/workflows/update_actions.yml", ".github/workflows/dummy.yml", "gateway/*"]

    There's no dummy.yml or update_actions.yml — the sync workflow is update.yml, and the inputs that should be verified are actions.yml and .github/actions/for-dependabot-triggered-reviews/action.yml. As written, a dependabot bump or an actions.yml edit won't run this check; only gateway/* changes and manual dispatch do. Adding actions.yml + the composite to the triggers would make it fire when it matters.

🟡 Non-blocking, worth a look

  • os.environ['GH_TOKEN'] (lines 88, 157) raises KeyError on local runs without the token — .get() would be friendlier.
  • today: date = date.today() default arg is evaluated once at import, not per call — today: date | None = None then default inside is the usual fix.
  • run_action_tags.py calls update_actions/update_patterns, so a "verify" run also rewrites actions.yml/approved_patterns.yml — slightly surprising side effect; might be worth a comment explaining it's intentional. (Also a small typo: "GH_TOKEN environment variable should be must.")
  • The GHA step-summary writer emits a stray ``` fence when there are failures but no warnings.
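
For the first two non-blocking items, the usual patterns look roughly like this. It is a sketch only: `is_expired` and `gh_token` are hypothetical names chosen for the example and do not appear in the PR.

```python
import os
from datetime import date
from typing import Optional


def is_expired(expires_at: date, today: Optional[date] = None) -> bool:
    """Compare against 'today', resolved per call rather than once at import time."""
    if today is None:
        today = date.today()
    return expires_at < today


def gh_token() -> Optional[str]:
    """Return the token, or None on local runs where GH_TOKEN is not set."""
    return os.environ.get("GH_TOKEN")
```

Passing `today` explicitly also makes the expiry logic easy to test with fixed dates.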

Nothing here is structural — happy to approve once the two blockers are addressed.

@snazy

snazy commented Jun 2, 2026

Member Author

Thanks for the review! Pushed another commit to address the comments.

@potiuk potiuk left a comment

Member


Approving — this closes a real gap: nothing previously checked that the SHA↔tag pairs in actions.yml are genuine, and a forged or typo'd pair would have gone unnoticed. The design fits the gateway conventions well (reuses ActionsYAML/update_actions/update_patterns, extends RefDetails), the permissions are correctly read-only + token-gated, and test coverage is solid. Verified locally that it merges cleanly against main (no conflicts — just needs a rebase to clear the mergeable: UNKNOWN).

A few non-blocking nits I'd like addressed before merge:

  1. pip install ruyaml → uv (check_action_tags.yml): the rest of the repo standardizes on uv run / uvx (see pytest.yml, update.yml); bare pip skips the # /// script dependency pinning. uvx --with ruyaml (as run_action_tags.py's own docstring suggests) would match convention. There's also a trailing space on that line that pre-commit may flag.
  2. Rate-limit / transient-error resilience: the check issues several sequential api.github.com calls per ref, and a single 403/429 currently hard-fails the whole run. With GITHUB_TOKEN's 5000/hr it should fit, but a transient secondary-limit would red the check with no retry/backoff. Not blocking — just flagging that ignore_gh_api_errors is the only mitigation today.
  3. Minor: the live-API tests in test_action_tags.py assert on specific upstream SHAs (e.g. setup-uv v7.1.2), which are brittle if upstream re-tags — worth a comment noting they may need updating.
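
For item 2, one possible shape for a retry with exponential backoff is sketched below. It is not part of this PR: `with_retry`, its parameters, and the choice to retry only on 403/429 are assumptions for illustration, and a 403 can also be a genuine permission error that no retry will fix.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on HTTP 403/429 with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError as e:
            # Re-raise anything that is not rate-limit related, or the last attempt.
            if e.code not in (403, 429) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without waiting.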

Thanks for tackling this — it's a genuinely useful guardrail.

@snazy snazy force-pushed the gh-action-sha-tag-check branch from e70a4b2 to 0a1365a Compare June 8, 2026 07:33
@snazy snazy requested a review from potiuk June 8, 2026 07:53
@snazy

snazy commented Jun 8, 2026

Member Author

The remaining Zizmor CI failure is unrelated to this change, created #918 to address those separately.

@potiuk

potiuk commented Jun 8, 2026

Member

@snazy Could you rebase this onto latest main when you get a chance? It's approved and green, just want it current before merge. Thanks!

snazy added 10 commits June 8, 2026 18:09
This change introduces a new function `verify_actions` to validate the contents against GitHub.

TL;DR
The function verifies that the SHAs specified in `actions.yml` exist in the GH repo.
Also ensures that the SHA exists on the Git tag, if the `tag` attribute is specified.
The rest of the (currently spaghetti code) function is a lot of output and error (failure) and warning collection.

Although it issues quite a few GH API requests, the rate limiter should not kick in (with an authenticated GH token).
I opted to rely on the HTTP/1.1 `urllib.request` stuff, which has no connection-reuse. The alternative would have been to add a dependency.

The algorithm roughly works like this, for each action specified in `actions.yml`:
* Issue a warning and stop, if the name is like `OWNER/*` ("wildcard" repository).
  Can't verify Git SHAs in this case.
* Issue a warning and stop, if the name is like `docker:*` (not implemented)
* Issue an error and stop, if the name doesn't start with an `OWNER/REPO` pattern.
* Each expired entry is just skipped
* If there is a wildcard reference and a SHA reference, issue an error.

Then, for each reference for an action:
* If no `tag` is specified, let GH resolve the commit SHA.
  Emit a warning to add the value of the `tag` attribute, if the SHA can be resolved.
  Otherwise, emit an error.
* If `tag` is specified:
  * Add the SHA to the set of requested-shas-by-tag
  * Call GH's "matching-refs" endpoint for the 'tag' value
    * Emit an error, if the object type is not a tag or commit.
    * Also resolve 'tag' object types to 'commit' object types.
    * Add each returned SHA to the set of valid-shas-by-tag.
* For each "requested tag" verify that the sets of valid and requested shas intersect. If not, emit an error.
1. `matlab-actions/run-tests`: the tag `v3.1.0` has been moved on Apr 14, 2026 (initially added on Apr 12 via apache#695)
2. `dtolnay/rust-toolchain`: `stable` is a branch and cannot be validated as a tag
@snazy snazy force-pushed the gh-action-sha-tag-check branch from be89f09 to 9980d7d Compare June 8, 2026 16:09
@snazy

snazy commented Jun 9, 2026

Member Author

@potiuk done

@potiuk

potiuk commented Jun 11, 2026

Member

Re-reviewed after the rebase — still LGTM, merging. A few non-blocking follow-up nits for whenever (none worth holding this up):

  1. action_tags.py parses API responses with ruyaml.YAML().load() — works since JSON ⊂ YAML, but json.loads (stdlib) is a touch more robust/faster for response bodies.
  2. git/matching-refs/tags/{tag} is a prefix match, so a coarse tag like v3 also pulls in SHAs from v3.x siblings — effectively exact for full semver tags, but it slightly loosens the SHA↔tag check for short tags.
  3. No pagination on matching-refs — a tag prefix with >30 matching refs would be truncated (edge case).
  4. A wildcard ref followed by an untagged SHA, in that order, can skip the "wildcard but also specific SHAs" warning (warning-only).
  5. Tiny typo: tab_object_type → tag_object_type.
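
Items 1 and 2 could be combined along these lines: parse the response body with `json.loads` and keep only refs whose name equals `refs/tags/{tag}` exactly. This is a sketch, not the PR's code. `exact_tag_shas` is a hypothetical name, the input is assumed to be the JSON array returned by the matching-refs endpoint, and it leaves out both the resolution of annotated 'tag' objects to commits and pagination.

```python
import json


def exact_tag_shas(body: str, tag: str) -> set[str]:
    """Return the object SHAs of refs named exactly refs/tags/<tag>."""
    refs = json.loads(body)
    wanted = f"refs/tags/{tag}"
    # matching-refs is a prefix match, so v3 also returns v3.1, v3.2, ...
    return {ref["object"]["sha"] for ref in refs if ref["ref"] == wanted}
```

For a short tag like `v3`, this drops the `v3.x` siblings that the prefix match would otherwise pull in.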

Thanks for pushing this through — the branch-vs-tag fallback is a nice touch.

@potiuk potiuk merged commit d4fc0a2 into apache:main Jun 11, 2026
11 checks passed
@snazy snazy deleted the gh-action-sha-tag-check branch June 11, 2026 15:03


Development

Successfully merging this pull request may close these issues.

Verify SHA belongs to released version

5 participants