Archive download fails with HTTP 404 on github.com private repos: hardcoded web URL doesn't honor Bearer auth #525

Description

@dafnrs

Summary

kubechecks v3.x (verified on v3.1.1, also present on main) cannot
download PR archives for private repos on github.com. Every
check fails with HTTP 404 Not Found from the archive downloader,
regardless of whether the kubechecks instance is authenticated via
GitHub App or PAT.

Root cause

pkg/vcs/github_client/archive.go constructs the archive URL using
the public web archive endpoint:

// pkg/vcs/github_client/archive.go (v3.1.1, lines 135-138)
} else {
    // GitHub.com
    archiveURL = fmt.Sprintf("https://github.com/%s/%s/archive/%s.zip",
        pr.Owner, pr.Name, mergeCommitSHA)
}

pkg/vcs/github_client/client.go attaches the auth header:

// pkg/vcs/github_client/client.go (v3.1.1, GetAuthHeaders)
return map[string]string{
    "Authorization": fmt.Sprintf("Bearer %s", c.cfg.VcsToken),
}

The web URL https://github.com/{owner}/{repo}/archive/{sha}.zip
does not honor Authorization: Bearer on private repos --
github.com only accepts browser cookies for that path. The
endpoint that does honor PAT/App auth is
https://api.github.com/repos/{owner}/{repo}/zipball/{sha}, which
responds with a 302 redirect to a signed codeload URL.

Reproduction

Any v3.x kubechecks instance pointed at a private github.com repo
will reproduce this. Direct verification against a known-good SHA
with a valid fine-grained PAT (contents:read on the repo):

TOKEN=<fine-grained PAT>
SHA=<a real merge commit on the private repo>

# What kubechecks does -- 404 regardless of auth header form
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' -L \
  -H "Authorization: Bearer ${TOKEN}" \
  "https://github.com/<owner>/<repo>/archive/${SHA}.zip"
# -> HTTP 404

# Same token, REST API endpoint -- works as expected
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' -L \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/<owner>/<repo>/zipball/${SHA}"
# -> HTTP 200 (after 302 to codeload)

kubechecks logs from the failing instance:

INF using archive mode for PR processing
INF downloading archive to cache  archive_url=https://github.com/<owner>/<repo>/archive/<sha>.zip
ERR failed to process the request
    error="failed to download archive: HTTP 404 Not Found
           - URL: https://github.com/<owner>/<repo>/archive/<sha>.zip"

Suggested fix

Switch DownloadArchive for the github.com branch to the REST
API endpoint:

archiveURL = fmt.Sprintf("https://api.github.com/repos/%s/%s/zipball/%s",
    pr.Owner, pr.Name, mergeCommitSHA)
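
For illustration, here is a minimal sketch of what a request against the REST endpoint could look like. The names zipballURL and newZipballRequest are hypothetical and do not exist in kubechecks; the headers mirror the curl reproduction above. This is a sketch under those assumptions, not the project's actual DownloadArchive code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// zipballURL builds the REST API archive URL for github.com.
// Hypothetical helper; path segments are escaped defensively.
func zipballURL(owner, repo, sha string) string {
	return fmt.Sprintf("https://api.github.com/repos/%s/%s/zipball/%s",
		url.PathEscape(owner), url.PathEscape(repo), url.PathEscape(sha))
}

// newZipballRequest prepares a GET request carrying the same headers
// as the working curl call: a Bearer token and the GitHub JSON media type.
func newZipballRequest(owner, repo, sha, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, zipballURL(owner, repo, sha), nil)
	if err != nil {
		return nil, fmt.Errorf("build zipball request: %w", err)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")
	return req, nil
}
```

Passing the request to http.DefaultClient.Do(req) follows the 302 automatically. Go's net/http client drops the Authorization header when a redirect leaves the original domain, which should be fine here if, as described above, the codeload URL is already signed.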

extractSHAFromArchiveURL in pkg/archive/manager.go would need a
matching update (current regex looks for /archive/); switching to
parsing the last path segment minus extension would cover both
formats.
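
The last-path-segment approach could be sketched roughly as follows. extractSHA is a hypothetical stand-in for extractSHAFromArchiveURL; its real signature and error handling in pkg/archive/manager.go may differ, and only the .zip extension is assumed.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// extractSHA returns the last path segment of an archive URL with a
// trailing ".zip" removed, so it accepts both
//   https://github.com/{owner}/{repo}/archive/{sha}.zip
//   https://api.github.com/repos/{owner}/{repo}/zipball/{sha}
// Hypothetical helper, not the existing kubechecks function.
func extractSHA(archiveURL string) (string, error) {
	u, err := url.Parse(archiveURL)
	if err != nil {
		return "", fmt.Errorf("parse archive URL: %w", err)
	}
	seg := strings.TrimSuffix(path.Base(u.Path), ".zip")
	// path.Base yields "." for an empty path and "/" for a root path.
	if seg == "" || seg == "." || seg == "/" {
		return "", fmt.Errorf("no ref found in archive URL %q", archiveURL)
	}
	return seg, nil
}
```

Parsing the URL instead of matching on /archive/ keeps the function independent of which endpoint produced the URL, at the cost of no longer rejecting URLs that do not look like archive links.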

The github-enterprise branch
(https://{base_url}/{owner}/{repo}/archive/{sha}.zip) presumably
also has this issue against private GHES repos, but I have no GHES
instance to test against.

Workaround for current users

None for private repos on github.com. v2.x had a git-clone fallback,
but it was removed in PR #473 for the 3.0 distroless cutover, so
there's no UseGitClone knob to flip in v3.x. The
KUBECHECKS_VCS_TOKEN env var is read correctly, but because the URL
is wrong, the auth header is sent to an endpoint that ignores it.

Affected users have to either:

  1. Make the repo public (defeats the point),
  2. Fork kubechecks and patch the URL line, or
  3. Roll back kubechecks until this is fixed upstream.

Happy to put up a PR with the fix if there's interest.
