Commit fdb836f

Merge branch 'main' into fix/stackdriver-guard-transport-send
2 parents 8c7b68c + 2746dcb commit fdb836f

18 files changed

Lines changed: 723 additions & 212 deletions

File tree

.agents/skills/prepare-providers-documentation/SKILL.md

Lines changed: 69 additions & 42 deletions
Original file line number | Diff line number | Diff line change
@@ -108,49 +108,73 @@ date — running breeze the first time below will recreate and fetch it.
108108
The skill runs in five phases. Mark tasks with `TaskCreate` for each phase
109109
and tick them off as you go — the release manager wants to see progress.
110110

111-
### Phase 1 — Discover providers with pending changes
111+
### Phase 1 — Discover and pre-classify pending changes (deterministic)
112112

113-
For each provider, the source of truth for "what changed since last release"
114-
is the same git query breeze uses internally: commits between the latest
115-
release tag for that provider (`providers-<id>/<version>`) and
116-
`apache-https-for-providers/<base-branch>`, restricted to the provider's own
117-
folders.
113+
The source of truth for "what changed since last release" is the same git
114+
query breeze uses internally: commits between the latest release tag for that
115+
provider (`providers-<id>/<version>`) and `apache-https-for-providers/<base-branch>`,
116+
restricted to the provider's own folders.
118117

119-
Discover in batch by running:
118+
Run the **deterministic classifier** — it discovers every provider with pending
119+
changes **and** pre-classifies each commit with hard-coded, high-confidence
120+
rules, flagging only the genuinely ambiguous ones as `needs_llm`. No random
121+
answers, nothing to discard:
120122

121123
```bash
122-
breeze release-management prepare-provider-documentation \
123-
--non-interactive \
124-
--skip-changelog \
125-
--skip-readme \
126-
--release-date "$RELEASE_DATE"
124+
breeze release-management classify-provider-changes \
125+
--base-branch main \
126+
--output-file /tmp/provider-changes.json
127+
# scope to a subset by appending provider ids, e.g. ... amazon cncf.kubernetes
127128
```
128129

129-
> [!WARNING]
130-
> Do **not** commit the result of that command. `--non-interactive` answers
131-
> the classification prompts with random values — Claude will overwrite the
132-
> changelog and version bumps in Phase 4 with real classifications. The only
133-
> reason to run breeze first is to refresh the apache remote, regenerate
134-
> build files, and confirm which providers have pending changes (read the
135-
> "Summary of prepared documentation" block at the end).
130+
The JSON it writes:
131+
132+
```json
133+
{
134+
"base_branch": "main",
135+
"providers": {
136+
"amazon": {
137+
"current_version": "9.29.0",
138+
"commits": [
139+
{"hash": "c2dbd7a75a", "pr": "67987", "subject": "Fix IDC domain S3 path resolution",
140+
"classification": "needs_llm", "reason": "no high-confidence deterministic rule matched"},
141+
{"hash": "abc123", "pr": "68087", "subject": "Bump the edge-ui-package-updates group ...",
142+
"classification": "misc", "reason": "dependency bump (subject starts with 'Bump')"}
143+
]
144+
}
145+
}
146+
}
147+
```
136148

137-
Record from the summary:
149+
How to read it:
138150

139-
- **Success** — providers that had real changes (these need classification).
140-
- **Docs only** — providers with only documentation changes (already handled
141-
by breeze; skip in Phase 2).
142-
- **Skipped on no changes** — nothing to do.
151+
- Providers under `providers` have pending changes (these need attention).
152+
- `classification ∈ {documentation, skip, misc}` are **decided by rules — take
153+
them as-is**, no sub-agent needed (doc-only → `documentation`, test/example
154+
only → `skip`, `Bump …` dependency bump → `misc`).
155+
- `classification == needs_llm` → **Phase 3 decides** with a sub-agent. These are
156+
the only commits that need LLM analysis.
157+
- A provider with a `note`/`error` (e.g. a brand-new provider with no prior
158+
release tag) → treat as an **initial release** and classify by hand.
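
The reading rules above can be sketched in Python. This is a minimal sketch, not part of breeze: `load_report` and `split_report` are hypothetical helper names, the `example-new` provider is invented sample data, and the sketch assumes `note`/`error` appear as keys on the provider object, which the sample JSON does not confirm.

```python
import json
from collections import defaultdict
from pathlib import Path

# Classifications the skill says are decided by rules and taken as-is.
DECIDED_BY_RULES = {"documentation", "skip", "misc"}


def load_report(path):
    """Load the classifier output, e.g. /tmp/provider-changes.json."""
    return json.loads(Path(path).read_text())


def split_report(report):
    """Return (decided, needs_llm, initial_release).

    decided / needs_llm map provider id -> list of commit dicts.
    initial_release lists providers carrying a note/error (assumed key placement).
    """
    decided = defaultdict(list)
    needs_llm = defaultdict(list)
    initial_release = []
    for provider_id, info in report.get("providers", {}).items():
        if "note" in info or "error" in info:
            initial_release.append(provider_id)
            continue
        for commit in info.get("commits", []):
            if commit.get("classification") in DECIDED_BY_RULES:
                decided[provider_id].append(commit)
            else:
                # Anything not decided by a rule is left for Phase 3.
                needs_llm[provider_id].append(commit)
    return dict(decided), dict(needs_llm), initial_release


# Sample shaped like the JSON above; "example-new" is a made-up provider.
sample = {
    "base_branch": "main",
    "providers": {
        "amazon": {
            "current_version": "9.29.0",
            "commits": [
                {"hash": "c2dbd7a75a", "pr": "67987", "classification": "needs_llm"},
                {"hash": "abc123", "pr": "68087", "classification": "misc"},
            ],
        },
        "example-new": {"note": "no prior release tag"},
    },
}

decided, needs_llm, initial_release = split_report(sample)
print(sorted(needs_llm), initial_release)
```

Treating every unrecognised classification as `needs_llm` keeps the sketch conservative: a value the rules did not produce is never accepted silently.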
143159

144-
Reset the per-provider files that breeze touched but you'll be rewriting
145-
yourself before continuing:
160+
> [!NOTE]
161+
> The classifier is deliberately conservative: `Fix …`/`Add …` subjects are
162+
> **not** auto-classified (an "Add …" can be a breaking change), so they come
163+
> back as `needs_llm`. The rules live in `classify_change_deterministically`
164+
> (`dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py`).
165+
166+
Then regenerate the auto-generated build files (this does **no** classification,
167+
so nothing random is produced):
146168

147169
```bash
170+
breeze release-management prepare-provider-documentation \
171+
--reapply-templates-only --release-date "$RELEASE_DATE"
148172
git checkout -- $(git diff --name-only -- '**/provider.yaml' '**/changelog.rst')
149173
```
150174

151175
This leaves the regenerated build files (`__init__.py`, `README.rst`,
152-
`pyproject.toml`, `conf.py`, `get_provider_info.py`, `index.rst`) in place
153-
and discards only the stuff Claude is about to rewrite.
176+
`pyproject.toml`, `conf.py`, `get_provider_info.py`, `index.rst`) in place and
177+
discards only the changelog/version files Claude is about to rewrite itself.
154178

155179
### Phase 2 — Per-provider commit list
156180

@@ -205,32 +229,35 @@ For each commit, classify it into one of:
205229
| `s` | Skip (test/CI/example only — no user impact) | none |
206230
| `v` | Min Airflow version bump | minor (treated as misc + bump) |
207231

208-
#### Auto-classify cheap cases first
232+
#### Take the deterministic classifications from Phase 1
209233

210-
Before spawning a sub-agent, apply the same fast heuristics breeze uses
211-
(see `classify_provider_pr_files` in
212-
`dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py`):
234+
`classify-provider-changes` (Phase 1) already classified every commit it could
235+
with hard-coded rules. Read `/tmp/provider-changes.json` and:
213236

214-
- All changed files match `providers/<id>/docs/**/*.rst` → **`d`** (docs).
215-
- All changed files match `providers/<id>/tests/**` or
216-
`providers/<id>/src/airflow/providers/<id>/example_dags/**` → **`s`** (skip).
217-
- Subject contains `Bump minimum Airflow version` and only `__init__.py` /
218-
`provider.yaml` changed → **`v`**.
237+
- Use any commit whose `classification` is `documentation`, `skip`, or `misc`
238+
**as-is** — these map to `d`, `s`, `m` respectively; no sub-agent needed.
239+
- Only commits with `classification: needs_llm` go to a sub-agent (below).
219240

220-
Note these classifications and move on — no sub-agent needed.
241+
The deterministic rules (doc-only → `d`, test/example-only → `s`, `Bump …`
242+
dependency bump → `m`) are exactly the cheap cases — now computed once by
243+
breeze (`classify_change_deterministically`) instead of re-derived here. If you
244+
ever need the min-Airflow-bump case (`v`), that one is still a `needs_llm`
245+
judgement: a sub-agent should flag it when a PR bumps the provider's minimum
246+
Airflow version.
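
The mapping from classifier output to the single-letter codes can be written down as a lookup. This is an illustrative sketch only: `LETTER_FOR` and `to_letter` are hypothetical names, not breeze APIs, and the table covers just the three rule-decided values named above.

```python
from typing import Optional

# Rule-decided classifications and their letter codes, as listed above.
LETTER_FOR = {"documentation": "d", "skip": "s", "misc": "m"}


def to_letter(classification: str) -> Optional[str]:
    """Return the letter code, or None when a sub-agent still has to decide."""
    return LETTER_FOR.get(classification)


for value in ("documentation", "skip", "misc", "needs_llm"):
    print(value, "->", to_letter(value))
```

Returning `None` for `needs_llm` (and for any unexpected value) keeps undecided commits visibly separate from the ones taken as-is.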
221247

222-
#### Classify the rest — batched per provider, not one agent per PR
248+
#### Classify the `needs_llm` commits — batched per provider, not one agent per PR
223249

250+
Only the commits the classifier returned as `needs_llm` still need a sub-agent.
224251
Classification is the token-heavy part of this skill, so spend sub-agents
225252
sparingly. Do **not** spawn one sub-agent per PR — that is one agent per
226253
commit and balloons to hundreds of agents on a normal release wave. Pick the
227254
smallest fan-out that fits the volume:
228255

229-
- **Few commits remain (≲ 15 across all providers) → classify inline.** Read
230-
each PR and its provider-scoped diff yourself, in this context. Spawn no
256+
- **Few `needs_llm` commits remain (≲ 15 across all providers) → classify inline.**
257+
Read each PR and its provider-scoped diff yourself, in this context. Spawn no
231258
sub-agents at all.
232259
- **More than that → one sub-agent per provider.** Each agent classifies that
233-
provider's *entire* remaining commit list in a single pass. This is the
260+
provider's *entire* remaining `needs_llm` list in a single pass. This is the
234261
natural unit: multi-provider PRs are classified independently per provider
235262
anyway (see Cross-Cutting Rules), and one provider-scoped agent amortizes
236263
the breaking-change checklist across all of that provider's commits instead
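
The fan-out choice described in this list can be sketched as a small function. It is a rough sketch under stated assumptions: `plan_fanout` and `INLINE_LIMIT` are hypothetical names, and the hard cutoff of 15 stands in for the approximate "≲ 15" threshold in the text.

```python
# Approximate threshold from the text ("≲ 15 across all providers").
INLINE_LIMIT = 15


def plan_fanout(needs_llm):
    """Pick the smallest fan-out for the remaining needs_llm commits.

    needs_llm maps provider id -> list of commits still to classify.
    """
    total = sum(len(commits) for commits in needs_llm.values())
    if total <= INLINE_LIMIT:
        # Few commits: classify inline and spawn no sub-agents.
        return {"mode": "inline", "agents": []}
    # Otherwise one sub-agent per provider that still has commits.
    providers = sorted(p for p, commits in needs_llm.items() if commits)
    return {"mode": "per_provider", "agents": providers}


print(plan_fanout({"amazon": ["c1", "c2"], "google": ["c3"]}))
```

The count is taken across all providers, so a release wave with many small providers can still cross the limit and switch to one agent per provider.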

dev/breeze/doc/09_release_management_tasks.rst

Lines changed: 19 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -447,6 +447,25 @@ You can also add ``--answer yes`` to perform non-interactive build.
447447
:width: 100%
448448
:alt: Breeze prepare-provider-documentation
449449

450+
Classifying provider changes
451+
""""""""""""""""""""""""""""
452+
453+
You can use Breeze to classify each provider's unreleased changes using hard-coded,
454+
high-confidence rules, flagging ambiguous commits as ``needs_llm`` for an agent or skill
455+
to assess. The result is emitted as JSON, providing a deterministic alternative to the
456+
``--non-interactive`` documentation run used purely for change discovery.
457+
458+
The example below classifies the pending changes for all providers.
459+
460+
.. code-block:: bash
461+
462+
breeze release-management classify-provider-changes
463+
464+
.. image:: ./images/output_release-management_classify-provider-changes.svg
465+
:target: https://raw.githubusercontent.com/apache/airflow/main/dev/breeze/doc/images/output_release-management_classify-provider-changes.svg
466+
:width: 100%
467+
:alt: Breeze classify-provider-changes
468+
450469
Updating provider next version
451470
""""""""""""""""""""""""""""""
452471
