Add lychee prek hook (offline mode) and fix internal markdown links #66356
shahar1 merged 2 commits into apache:main
* Add the lychee link checker as a prek hook in offline mode
(`--offline --no-progress --root-dir .`) so we catch broken
intra-repo links in Markdown files before they land. The hook
excludes auto-generated content (the Python OpenAPI client docs
under `clients/python/`, JS `node_modules`, build output) and runs
on every Markdown file otherwise. The lychee binary version is
pinned via `LYCHEE_VERSION=0.24.2` as the first arg, alongside the
SHA-pinned `rev`, so we keep the airflow SHA-pinning convention
while satisfying lychee's pre-commit script which expects a
version tag.
* Fix the 13 broken links the hook surfaced on `main`:
- Add minimum-stub locale guideline files for active locales that
were referenced from the airflow-translations skill table but had
no file yet (`ar`, `it`, `tr`). The stub points the reader at the
parent SKILL.md so the global rules apply until a real guide is
written.
- `airflow-core/src/airflow/_shared/{AGENTS,README}.md`: correct the
`../../shared` path (which resolved inside `airflow-core/`) to
`../../../../shared` so it lands on the repo-root `shared/` dir.
- `airflow-core/src/airflow/api_fastapi/execution_api/AGENTS.md`:
add the missing `../` so the `contributing-docs/...` link resolves
to repo root instead of `airflow-core/contributing-docs/`.
- `airflow-core/src/airflow/ui/tests/e2e/README.md`: replace the
reference to a no-longer-existing `dag-trigger.spec.ts` example
with the existing `dag-runs.spec.ts`.
- `dev/breeze/doc/ci/02_images.md`: docs were moved from
`docs/docker-stack/` to `docker-stack-docs/`; update the link.
- `dev/README_RELEASE_HELM_CHART.md`: helm-chart docs moved from
`docs/helm-chart/` to `chart/docs/`; update the link.
- `dev/system_tests/README.md`: drop the dangling reference to a
`dev/requirements.txt` that no longer exists; replace with a
`uv run --with PyGithub --with rich-click --with rich` invocation
that picks up the script's actual dependencies.
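To make the `_shared` path fix above concrete, here is a minimal sketch of how a relative Markdown link resolves against the directory of the file that contains it. It uses only `posixpath` from the standard library; the helper name `resolve_link` is made up for illustration and is not part of lychee or Airflow.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory of the file containing it."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


source = "airflow-core/src/airflow/_shared/AGENTS.md"

# Two levels up only reaches airflow-core/src, so the old link stayed inside airflow-core/.
old_target = resolve_link(source, "../../shared")

# Four levels up reaches the repo root, so the new link lands on the top-level shared/ dir.
new_target = resolve_link(source, "../../../../shared")

print(old_target)  # airflow-core/src/shared
print(new_target)  # shared
```

Counting the directory components of the source file (four here) gives the number of `../` segments needed to reach the repo root.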
Should help us to keep our markdown cross-references in order :)
parkhojeong left a comment
This is a really useful feature :)
I tested this locally and found that broken fragments (anchors) are not reported with the current configuration.
Two follow-ups for apache#66356:
* CI failure: the `lychee` script-based hook auto-installs the
pre-built `lychee` binary via cargo-binstall, but lychee 0.24.x
binaries are linked against `GLIBC_2.38` / `GLIBC_2.39`, which the
ubuntu-22.04 CI runners do not ship (they have glibc 2.35). Switch
to the upstream `lychee-docker` variant: it runs the official
`lycheeverse/lychee` Docker image and bundles its own libc, so it is
portable across runners.
* Reviewer feedback (@parkhojeong): pass `--include-fragments` so
lychee also reports broken Markdown anchor fragments. That surfaced
three real broken fragments which are also fixed here:
- `.github/skills/pr-triage/actions.md`: add explicit
`<a id="mark-ready"></a>` and `<a id="mark-ready-with-ping"></a>`
anchors before the `## ` headings, since GitHub's auto-generated
anchors include the full descriptive heading (e.g.
`mark-ready--add-ready-for-maintainer-review-label`) rather than the
short action name that the existing cross-references expect.
- `dev/README_RELEASE_AIRFLOW.md`: the link was pointing at
`#removing-or-replacing-ownership` but the section is called
"Relinquishing translation/code ownership"; fix to
`#relinquishing-translationcode-ownership`.
The `LYCHEE_VERSION=0.24.2` arg used by the script-based hook is no
longer needed for the docker variant; the docker entry pins the
version itself.
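The fragment fixes above follow from how heading anchors are derived. The sketch below is only an approximation of GitHub's anchor generation (lowercase, drop punctuation, turn each space into a hyphen), not its actual implementation; the function name and the second heading text are hypothetical, chosen so the result matches the anchor quoted above.

```python
import re


def approximate_github_anchor(heading: str) -> str:
    """Approximate the anchor GitHub derives from a Markdown heading."""
    slug = heading.strip().lower()
    # Drop everything except word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each remaining space becomes a hyphen; runs of spaces are not collapsed.
    return slug.replace(" ", "-")


release_anchor = approximate_github_anchor("Relinquishing translation/code ownership")
print(release_anchor)  # relinquishing-translationcode-ownership

# Hypothetical heading: a short action name, a dash (U+2014), then a description.
hypothetical_heading = "mark-ready \u2014 add ready for maintainer review label"
triage_anchor = approximate_github_anchor(hypothetical_heading)
print(triage_anchor)  # mark-ready--add-ready-for-maintainer-review-label
```

Because the slash is dropped rather than replaced, `translation/code` collapses to `translationcode`, and a removed dash between two spaces leaves a double hyphen. Neither generated anchor equals the short name a cross-reference might guess, which is why explicit `<a id="...">` anchors are a stable alternative.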
Good catch — added
I also switched the hook to
Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting
Thanks for the quick follow-up. I reviewed the latest commit (fe61ea2), and it looks good to me.
@parkhojeong thanks for helping review it!
Backport failed to create: v3-2-test. View the failure log
Note: As of Merging PRs targeted for Airflow 3.X
In matter of doubt please ask in #release-management Slack channel.
You can attempt to backport this manually by running:
    cherry_picker 5936cf8 v3-2-test
This should apply the commit to the v3-2-test branch and leave the commit in conflict state marking
After you have resolved the conflicts, you can continue the backport process by running:
    cherry_picker --continue
If you don't have cherry-picker installed, see the installation guide.
Summary
Add the lychee link checker as a prek hook in offline mode so we catch
broken intra-repo Markdown links before they land, and fix the 13
existing broken internal links the hook surfaced on `main`.
The hook
Configured in `.pre-commit-config.yaml`:
Notes:
* `--offline` means lychee skips every HTTP/HTTPS link and only validates
on-disk references (relative paths, anchors, `file://` links).
* `--root-dir .` makes root-relative GitHub-style links like
`[X](/contributing-docs/X.rst)` resolve from the repo root.
* `LYCHEE_VERSION=0.24.2` is passed as the first arg because lychee's
pre-commit script otherwise requires the `rev` field to be a version
tag, but airflow's convention is to pin SHAs. The arg keeps both
happy. When bumping lychee, update both the `rev` SHA and the version
string together.
* The hook excludes auto-generated content (the Python OpenAPI client
docs under `clients/python/`, JS `node_modules/`, `openapi-gen` output,
`.build/`, the `generated/` dir).
The 13 broken links fixed
* Added minimum-stub locale guideline files for `ar`, `it`, `tr` under
`.github/skills/airflow-translations/locales/`. They are listed in the
`airflow-translations` SKILL.md table because those locales are active
in the UI i18n directory; until a full style guide is authored, each
stub points the reader to the parent SKILL.md global rules.
* `airflow-core/src/airflow/_shared/{AGENTS,README}.md`: the
`[shared folder](../../shared)` link resolved inside `airflow-core/`
rather than at the repo root. Fixed to `../../../../shared`.
* `airflow-core/src/airflow/api_fastapi/execution_api/AGENTS.md`:
one missing `../` segment; the `contributing-docs/...` link was
resolving to `airflow-core/contributing-docs/` instead of the repo
root. Fixed.
* `airflow-core/src/airflow/ui/tests/e2e/README.md`: the
`dag-trigger.spec.ts` example referenced no longer exists; pointed
at the existing `dag-runs.spec.ts` instead.
* `dev/breeze/doc/ci/02_images.md`: docs moved from
`docs/docker-stack/` to `docker-stack-docs/`; link updated.
* `dev/README_RELEASE_HELM_CHART.md`: helm-chart docs moved from
`docs/helm-chart/` to `chart/docs/`; link updated.
* `dev/system_tests/README.md`: dropped the dangling reference to a
`dev/requirements.txt` that no longer exists; replaced with a
`uv run --with PyGithub --with rich-click --with rich` invocation
that picks up the script's actual dependencies.
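As a rough illustration of the offline-mode and `--root-dir` behaviour described in the hook notes, the sketch below decides which on-disk path a link check would need to inspect. It is a simplified model written for this description, not lychee's implementation; the function name `offline_target` and the example file `dev/README.md` are assumptions.

```python
import posixpath
from urllib.parse import urlsplit


def offline_target(source_file: str, link: str, root_dir: str = "."):
    """Return the on-disk path an offline check would inspect, or None if skipped."""
    parts = urlsplit(link)
    if parts.scheme in ("http", "https"):
        # Remote links are not validated in offline mode.
        return None
    if parts.scheme == "file":
        # file:// links already carry an absolute on-disk path.
        return posixpath.normpath(parts.path)
    if not parts.path:
        # A bare "#fragment" refers to the source file itself.
        return posixpath.normpath(source_file)
    if parts.path.startswith("/"):
        # Root-relative links resolve from the configured root directory.
        return posixpath.normpath(posixpath.join(root_dir, parts.path.lstrip("/")))
    # Everything else resolves relative to the directory of the source file.
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_file), parts.path))


print(offline_target("dev/README.md", "https://example.com/docs"))  # None
print(offline_target("dev/README.md", "/contributing-docs/X.rst"))  # contributing-docs/X.rst
print(offline_target("dev/README.md", "#section"))  # dev/README.md
```

A real checker would then test whether the returned path exists and, with fragment checking enabled, whether the anchor is present in that file.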
Test plan
* `prek run lychee --all-files` passes (verified locally: 13 errors on
`main` → 0 errors after this PR).
* `prek run insert-license lint-markdown --all-files` still passes
across the touched files.
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.7) following the guidelines