@@ -108,49 +108,73 @@ date — running breeze the first time below will recreate and fetch it.
  The skill runs in five phases. Mark tasks with `TaskCreate` for each phase
  and tick them off as you go — the release manager wants to see progress.

- ### Phase 1 — Discover providers with pending changes
+ ### Phase 1 — Discover and pre-classify pending changes (deterministic)

- For each provider, the source of truth for "what changed since last release"
- is the same git query breeze uses internally: commits between the latest
- release tag for that provider (`providers-<id>/<version>`) and
- `apache-https-for-providers/<base-branch>`, restricted to the provider's own
- folders.
+ The source of truth for "what changed since last release" is the same git
+ query breeze uses internally: commits between the latest release tag for that
+ provider (`providers-<id>/<version>`) and `apache-https-for-providers/<base-branch>`,
+ restricted to the provider's own folders.

- Discover in batch by running:
+ Run the **deterministic classifier** — it discovers every provider with pending
+ changes **and** pre-classifies each commit with hard-coded, high-confidence
+ rules, flagging only the genuinely ambiguous ones as `needs_llm`. No random
+ answers, nothing to discard:

  ```bash
- breeze release-management prepare-provider-documentation \
-   --non-interactive \
-   --skip-changelog \
-   --skip-readme \
-   --release-date "$RELEASE_DATE"
+ breeze release-management classify-provider-changes \
+   --base-branch main \
+   --output-file /tmp/provider-changes.json
+ # scope to a subset by appending provider ids, e.g. ... amazon cncf.kubernetes
  ```

- > [!WARNING]
- > Do **not** commit the result of that command. `--non-interactive` answers
- > the classification prompts with random values — Claude will overwrite the
- > changelog and version bumps in Phase 4 with real classifications. The only
- > reason to run breeze first is to refresh the apache remote, regenerate
- > build files, and confirm which providers have pending changes (read the
- > "Summary of prepared documentation" block at the end).
+ The JSON it writes:
+
+ ```json
+ {
+   "base_branch": "main",
+   "providers": {
+     "amazon": {
+       "current_version": "9.29.0",
+       "commits": [
+         {"hash": "c2dbd7a75a", "pr": "67987", "subject": "Fix IDC domain S3 path resolution",
+          "classification": "needs_llm", "reason": "no high-confidence deterministic rule matched"},
+         {"hash": "abc123", "pr": "68087", "subject": "Bump the edge-ui-package-updates group ...",
+          "classification": "misc", "reason": "dependency bump (subject starts with 'Bump')"}
+       ]
+     }
+   }
+ }
+ ```

- Record from the summary:
+ How to read it:

- - **Success** — providers that had real changes (these need classification).
- - **Docs only** — providers with only documentation changes (already handled
-   by breeze; skip in Phase 2).
- - **Skipped on no changes** — nothing to do.
+ - Providers under `providers` have pending changes (these need attention).
+ - `classification ∈ {documentation, skip, misc}` are **decided by rules — take
+   them as-is**, no sub-agent needed (doc-only → `documentation`, test/example
+   only → `skip`, `Bump …` dependency bump → `misc`).
+ - `classification == needs_llm` → **Phase 3 decides** with a sub-agent. These are
+   the only commits that need LLM analysis.
+ - A provider with a `note`/`error` (e.g. a brand-new provider with no prior
+   release tag) → treat as an **initial release** and classify by hand.

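As a rough illustration of the reading rules in the added "How to read it" list, the Python sketch below splits each provider's commits into rule-decided and `needs_llm` groups. It assumes the JSON shape from the sample in this diff. The helper name `split_commits` is hypothetical, and treating `note`/`error` as keys on the provider entry is a guess from the prose, not something the diff confirms.

```python
import json

# Classifications that the rules have already decided; anything else needs review.
RULE_DECIDED = {"documentation", "skip", "misc"}


def split_commits(report: dict) -> dict:
    """Group each provider's commits by how they still need to be handled."""
    result = {}
    for provider_id, info in report.get("providers", {}).items():
        # Assumption: a provider-level "note" or "error" key marks a provider
        # without a prior release tag, which is classified by hand.
        if "note" in info or "error" in info:
            result[provider_id] = {"initial_release": True, "decided": [], "needs_llm": []}
            continue
        commits = info.get("commits", [])
        result[provider_id] = {
            "initial_release": False,
            "decided": [c for c in commits if c["classification"] in RULE_DECIDED],
            "needs_llm": [c for c in commits if c["classification"] == "needs_llm"],
        }
    return result


# Inline sample mirroring the JSON shape from the diff, so the sketch runs standalone.
sample = json.loads("""
{
  "base_branch": "main",
  "providers": {
    "amazon": {
      "current_version": "9.29.0",
      "commits": [
        {"hash": "c2dbd7a75a", "pr": "67987", "subject": "Fix IDC domain S3 path resolution",
         "classification": "needs_llm", "reason": "no high-confidence deterministic rule matched"},
        {"hash": "abc123", "pr": "68087", "subject": "Bump the edge-ui-package-updates group ...",
         "classification": "misc", "reason": "dependency bump (subject starts with 'Bump')"}
      ]
    }
  }
}
""")

groups = split_commits(sample)
for provider_id, group in groups.items():
    print(provider_id, len(group["decided"]), "decided,", len(group["needs_llm"]), "for Phase 3")
```

In a real run the report would be loaded from `/tmp/provider-changes.json` with `json.load` instead of the inline sample.
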
- Reset the per-provider files that breeze touched but you'll be rewriting
- yourself before continuing:
+ > [!NOTE]
+ > The classifier is deliberately conservative: `Fix …`/`Add …` subjects are
+ > **not** auto-classified (an "Add …" can be a breaking change), so they come
+ > back as `needs_llm`. The rules live in `classify_change_deterministically`
+ > (`dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py`).
+
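To make the conservative behaviour concrete, here is a minimal sketch of rules of this kind. It is not the body of `classify_change_deterministically`: the function name `classify_commit`, its signature, the rule order, and the mapping of dotted provider ids to folder paths are all assumptions made for illustration. The path patterns are borrowed from the heuristics quoted in the removed lines of this diff.

```python
from fnmatch import fnmatchcase


def classify_commit(provider_id: str, subject: str, changed_files: list[str]) -> str:
    """Hypothetical rule set: return a classification, or 'needs_llm' if unsure."""
    # Assumption: a dotted id such as "cncf.kubernetes" lives under providers/cncf/kubernetes.
    path = provider_id.replace(".", "/")
    root = f"providers/{path}"
    # fnmatchcase lets "*" span "/" as well, so one "*" covers nested folders.
    doc_patterns = [f"{root}/docs/*.rst"]
    skip_patterns = [
        f"{root}/tests/*",
        f"{root}/src/airflow/providers/{path}/example_dags/*",
    ]

    def all_match(patterns: list[str]) -> bool:
        # An empty file list never counts as a match.
        return bool(changed_files) and all(
            any(fnmatchcase(name, pattern) for pattern in patterns)
            for name in changed_files
        )

    if all_match(doc_patterns):
        return "documentation"
    if all_match(skip_patterns):
        return "skip"
    if subject.startswith("Bump "):
        return "misc"
    # "Fix ..." and "Add ..." subjects deliberately fall through to here.
    return "needs_llm"


print(classify_commit("amazon", "Fix IDC domain S3 path resolution",
                      ["providers/amazon/src/airflow/providers/amazon/aws/hooks/s3.py"]))
print(classify_commit("amazon", "Bump the edge-ui-package-updates group ...",
                      ["providers/amazon/pyproject.toml"]))
```

The point of the design is that every rule must be safe to apply without reading the diff; anything that could hide a behaviour change is left for a later judgement.
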
+ Then regenerate the auto-generated build files (this does **no** classification,
+ so nothing random is produced):

  ```bash
+ breeze release-management prepare-provider-documentation \
+   --reapply-templates-only --release-date "$RELEASE_DATE"
  git checkout -- $(git diff --name-only -- '**/provider.yaml' '**/changelog.rst')
  ```

  This leaves the regenerated build files (`__init__.py`, `README.rst`,
- `pyproject.toml`, `conf.py`, `get_provider_info.py`, `index.rst`) in place
- and discards only the stuff Claude is about to rewrite.
+ `pyproject.toml`, `conf.py`, `get_provider_info.py`, `index.rst`) in place and
+ discards only the changelog/version files Claude is about to rewrite itself.

  ### Phase 2 — Per-provider commit list

@@ -205,32 +229,35 @@ For each commit, classify it into one of:
  | `s` | Skip (test/CI/example only — no user impact) | none |
  | `v` | Min Airflow version bump | minor (treated as misc + bump) |

- #### Auto-classify cheap cases first
+ #### Take the deterministic classifications from Phase 1

- Before spawning a sub-agent, apply the same fast heuristics breeze uses
- (see `classify_provider_pr_files` in
- `dev/breeze/src/airflow_breeze/prepare_providers/provider_documentation.py`):
+ `classify-provider-changes` (Phase 1) already classified every commit it could
+ with hard-coded rules. Read `/tmp/provider-changes.json` and:

- - All changed files match `providers/<id>/docs/**/*.rst` → **`d`** (docs).
- - All changed files match `providers/<id>/tests/**` or
-   `providers/<id>/src/airflow/providers/<id>/example_dags/**` → **`s`** (skip).
- - Subject contains `Bump minimum Airflow version` and only `__init__.py`/
-   `provider.yaml` changed → **`v`**.
+ - Use any commit whose `classification` is `documentation`, `skip`, or `misc`
+   **as-is** — these map to `d`, `s`, `m` respectively; no sub-agent needed.
+ - Only commits with `classification: needs_llm` go to a sub-agent (below).

- Note these classifications and move on — no sub-agent needed.
+ The deterministic rules (doc-only → `d`, test/example-only → `s`, `Bump …`
+ dependency bump → `m`) are exactly the cheap cases — now computed once by
+ breeze (`classify_change_deterministically`) instead of re-derived here. If you
+ ever need the min-Airflow-bump case (`v`), that one is still a `needs_llm`
+ judgement: a sub-agent should flag it when a PR bumps the provider's minimum
+ Airflow version.

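The mapping from classifier output to the one-letter codes can be sketched as a small lookup. This is an illustrative helper only: the name `letter_code` is hypothetical, and failing loudly on an unknown classification is a design choice of the sketch, not documented behaviour.

```python
from typing import Optional

# documentation -> d, skip -> s, misc -> m, as stated in the added text of this diff.
CODES = {"documentation": "d", "skip": "s", "misc": "m"}


def letter_code(commit: dict) -> Optional[str]:
    """Return the code for a rule-decided commit, or None if a sub-agent must decide."""
    classification = commit["classification"]
    if classification == "needs_llm":
        return None
    # Raises KeyError on anything unexpected, so surprises are not silently skipped.
    return CODES[classification]


commits = [
    {"pr": "67987", "classification": "needs_llm"},
    {"pr": "68087", "classification": "misc"},
]
codes = {c["pr"]: letter_code(c) for c in commits}
print(codes)
```

Commits that come back as `None` are the ones left for the sub-agent step.
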
- #### Classify the rest — batched per provider, not one agent per PR
+ #### Classify the `needs_llm` commits — batched per provider, not one agent per PR

+ Only the commits the classifier returned as `needs_llm` still need a sub-agent.
  Classification is the token-heavy part of this skill, so spend sub-agents
  sparingly. Do **not** spawn one sub-agent per PR — that is one agent per
  commit and balloons to hundreds of agents on a normal release wave. Pick the
  smallest fan-out that fits the volume:

- - **Few commits remain (≲ 15 across all providers) → classify inline.** Read
-   each PR and its provider-scoped diff yourself, in this context. Spawn no
+ - **Few `needs_llm` commits remain (≲ 15 across all providers) → classify inline.**
+   Read each PR and its provider-scoped diff yourself, in this context. Spawn no
    sub-agents at all.
  - **More than that → one sub-agent per provider.** Each agent classifies that
-   provider's *entire* remaining commit list in a single pass. This is the
+   provider's *entire* remaining `needs_llm` list in a single pass. This is the
    natural unit: multi-provider PRs are classified independently per provider
    anyway (see Cross-Cutting Rules), and one provider-scoped agent amortizes
    the breaking-change checklist across all of that provider's commits instead