diff --git a/.github/skills/airflow-translations/locales/th.md b/.github/skills/airflow-translations/locales/th.md
index b67e4d01d615f..9cd5658648ee9 100644
--- a/.github/skills/airflow-translations/locales/th.md
+++ b/.github/skills/airflow-translations/locales/th.md
@@ -13,8 +13,8 @@ The following technical terms should **remain in English** in Thai translations:
 
 ### Core Technical Terms (คำศัพท์ทางเทคนิค)
 
-- **DAG** (Directed Acyclic Graph) - Keep as "DAG"
-- **DAG Run** - Keep as "DAG Run"
+- **Dag** - Keep as "Dag" (Airflow convention; never write "DAG")
+- **Dag Run** - Keep as "Dag Run"
 - **Task Instance** - Keep as "Task Instance"
 - **XCom** - Keep as "XCom"
 - **Asset** - Keep as "Asset"
@@ -24,7 +24,7 @@ The following technical terms should **remain in English** in Thai translations:
 - **Sensor** - Keep as "Sensor"
 - **Hook** - Keep as "Hook"
 - **Operator** - Keep as "Operator" (โอเปอเรเตอร์) or in English
-- **DAGBag** - Keep as "DAGBag"
+- **DagBag** - Keep as "DagBag"
 
 ### UI Components (ส่วนประกอบของอินเทอร์เฟซ)
 
@@ -123,14 +123,14 @@ In Airflow UI and messages, numerals are typically formatted as:
 Example:
 
 ```text
-งาน DAG รันสำเร็จ (DAG run successful)
+งาน Dag รันสำเร็จ (Dag run successful)
 ```
 
 ## Translation Style Guidelines
 
 ### 1. Technical Terminology
 
-Keep technical terms like DAG, XCom, Operator in English when:
+Keep technical terms like Dag, XCom, Operator in English when:
 
 - They appear in code or configuration examples
 - No clear Thai equivalent exists
@@ -181,8 +181,8 @@ Prefer translation for:
 
 ### 1. "Run" Context
 
-- "Run DAG" → "รัน DAG" or "ดำเนินการ DAG"
-- "DAG run" (noun) → "การรัน DAG" or "DAG Run"
+- "Run Dag" → "รัน Dag" or "ดำเนินการ Dag"
+- "Dag run" (noun) → "การรัน Dag" or "Dag Run"
 - "Run ID" → "รันไอดี" or "Run ID"
 
 ### 2. "Task" Context
@@ -191,11 +191,11 @@ Prefer translation for:
 - "Task instance" → "Task Instance" or "อินสแตนซ์งาน"
 - "Task ID" → "Task ID" or "ไอดีงาน"
 
-### 3. "DAG" Context
+### 3. "Dag" Context
 
-- "DAG run" → "การรัน DAG" or "DAG Run"
-- "DAG ID" → "DAG ID" or "ไอดี DAG"
-- "Sub DAG" → "DAG ย่อย" or "Sub DAG"
+- "Dag run" → "การรัน Dag" or "Dag Run"
+- "Dag ID" → "Dag ID" or "ไอดี Dag"
+- "Sub Dag" → "Dag ย่อย" or "Sub Dag"
 
 ### 4. Configuration
 
@@ -252,14 +252,14 @@ However, these are typically omitted in technical documentation to maintain conc
 "Tree View" → "มุมมองต้นไม้" or "Tree View"
 "Graph View" → "มุมมองกราฟ" or "Graph View"
 "Task Instances" → "Task Instances" or "อินสแตนซ์งาน"
-"DAG Runs" → "DAG Runs" or "การรัน DAG"
+"Dag Runs" → "Dag Runs" or "การรัน Dag"
 ```
 
 ### Message Examples
 
 ```text
 "Task failed" → "งานล้มเหลว"
-"DAG run successful" → "การรัน DAG สำเร็จ"
+"Dag run successful" → "การรัน Dag สำเร็จ"
 "XCom pushed" → "ดัน XCom แล้ว" or "XCom pushed"
 "Connection test failed" → "การทดสอบการเชื่อมต่อล้มเหลว"
 ```
diff --git a/AGENTS.md b/AGENTS.md
index 13add2ead486d..594995304172e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -3,6 +3,21 @@
 
 # AGENTS instructions
 
+## Naming
+
+Write **Dag** (title case) in all prose. Keep the all-caps or lowercase
+spelling only when reproducing a literal code token — never rewrite these,
+even inside fenced code blocks:
+
+- Python: the SDK class `DAG` (`from airflow.sdk import DAG`,
+  `dag = DAG("my_dag", ...)`); identifiers like `dag_id`, `dag`, `my_dag`.
+- CLI: `airflow dags list`, `airflow dags test`, etc.
+- Paths and config keys: `dag_processing/`, `dagprocessor`, `get_dag`, etc.
+- Anti-pattern quotes that show the wrong form to teach the rule itself
+  (e.g., `Use "DAG" — always write "Dag"`).
+
+Don't spell out **Directed Acyclic Graph** except for historical context.
+
 ## Environment Setup
 
 - Install prek: `uv tool install prek`
diff --git a/providers/AGENTS.md b/providers/AGENTS.md
index 71278fdde78e4..2e0e3d90f95f3 100644
--- a/providers/AGENTS.md
+++ b/providers/AGENTS.md
@@ -17,3 +17,91 @@ Each provider is an independent package with its own `pyproject.toml`, tests, an
 - Don't upper-bound dependencies by default; add limits only with justification.
 - Tests live alongside the provider — mirror source paths in test directories.
 - Full guide: [`contributing-docs/12_provider_distributions.rst`](../contributing-docs/12_provider_distributions.rst)
+
+## Changelog — never use newsfragments
+
+**Never create newsfragments for providers.** Providers are released from `main` in waves, so
+per-PR newsfragments are not consumed by the release process — the release manager regenerates the
+changelog from `git log`. The towncrier-managed `newsfragments/` workflow is used only by
+`airflow-core/`, `chart/`, and `dev/mypy/`. (`airflow-ctl/` follows the same "no newsfragments,
+direct edit" pattern as providers — see `dev/README_RELEASE_AIRFLOWCTL.md`.)
+
+When a provider change needs a user-visible note (typically a breaking change or important behavior
+change that warrants explanation), update the provider's `docs/changelog.rst` directly in the same
+PR, just below the `Changelog` header — exactly as the in-file `NOTE TO CONTRIBUTORS` block
+describes. Routine entries (features, bug fixes, misc) are collected automatically by the release
+manager from commit messages, so most PRs do not need to touch the changelog at all.
+
+## Security: Connection extras must not be forwarded blindly to hooks/operators
+
+**Never pass the whole `Connection.extra` dict (or `**conn.extra_dejson`) as
+keyword arguments to a hook, operator, client constructor, or any underlying
+library call.** Forward only the specific extra keys you have explicitly
+reviewed and know are safe to expose.
+
+### Why this matters — different security boundaries
+
+Airflow has two distinct user roles with different trust levels:
+
+- **Connection editors** — UI/API users with permission to create or edit
+  Connections. They control the host, login, password, and the `extra` JSON
+  blob.
+- **Dag authors** — users who write the Python code that constructs hooks
+  and operators. They control which arguments are passed at call sites.
+
+These are *not* the same population, and the security model treats them
+differently. A Connection editor is trusted to supply credentials for a target
+system; they are **not** trusted to alter how the worker process behaves, load
+arbitrary Python code, change file paths the worker reads, or pass options
+into client libraries that the Dag author did not opt into.
+
+### What goes wrong when extras are forwarded blindly
+
+Many client libraries accept constructor or method kwargs that are dangerous
+when attacker-controlled. Concrete examples seen in the wild:
+
+- A kwarg that names a Python callable, plugin path, or import string —
+  attacker sets it to a module they control → **remote code execution** on
+  the worker.
+- A kwarg that points at a local file (cert path, key file, log path,
+  config file) — attacker redirects it to read or overwrite worker-side
+  files.
+- A kwarg that sets a proxy, endpoint URL, or hostname — attacker
+  redirects traffic to an MITM endpoint and harvests credentials or
+  tokens for *other* systems.
+- A kwarg that toggles TLS verification, signing, or auth — attacker
+  silently downgrades security.
+- A kwarg that controls subprocess execution, shell invocation, or
+  template rendering — attacker reaches command/template injection.
+
+In all of these, the Connection editor effectively gains capabilities
+(RCE, file read/write, traffic redirection, auth bypass) that the security
+model does not grant them.
+
+### The rule
+
+- **Allowlist, never passthrough.** In the hook, read each extra key by
+  name (`conn.extra_dejson.get("region_name")`,
+  `conn.extra_dejson.get("verify")`, …) and pass only those named values
+  forward. Reject or ignore unknown keys.
+- **Do not** write `SomeClient(**conn.extra_dejson)`,
+  `hook = MyHook(**conn.extra_dejson)`, or
+  `kwargs.update(conn.extra_dejson)` followed by a downstream call.
+- When adding support for a new extra key, treat it like any other public
+  argument: review what the underlying library does with it, and document
+  it in the provider's connection docs.
+- If a Dag author genuinely needs to pass a non-allowlisted option, that
+  option should be a **Dag-author-supplied argument** on the operator or
+  hook (with its own review), not something a Connection editor can set.
+
+### When reviewing provider PRs
+
+Flag any of these patterns:
+
+- `**conn.extra_dejson` or `**self.extra_dejson` spread into a constructor
+  or call.
+- Looping over `conn.extra_dejson.items()` and forwarding every key.
+- New code paths where extras are merged into `kwargs` and then passed on
+  unfiltered.
+- "Convenience" features that let users put arbitrary client kwargs into
+  `extra` — they widen the Connection-editor blast radius.
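
Reviewer note: the "allowlist, never passthrough" rule in `providers/AGENTS.md` could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the PR or from any provider. A plain dict stands in for `conn.extra_dejson` so the snippet runs without Airflow, and `ALLOWED_EXTRA_KEYS`, `select_allowed_extras`, and `rejected_extras` are hypothetical names. The two allowlisted keys are simply the ones the rule text mentions.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical allowlist for illustration only: the two keys named in the rule text.
ALLOWED_EXTRA_KEYS = frozenset({"region_name", "verify"})


def select_allowed_extras(extra: Mapping[str, Any]) -> dict[str, Any]:
    """Return only the reviewed, allowlisted keys from a Connection's extras.

    ``extra`` stands in for ``conn.extra_dejson`` (assumed to be a dict-like object).
    """
    return {key: extra[key] for key in ALLOWED_EXTRA_KEYS if key in extra}


def rejected_extras(extra: Mapping[str, Any]) -> list[str]:
    """Return the sorted names of extras that are NOT allowlisted, e.g. for logging."""
    return sorted(set(extra) - ALLOWED_EXTRA_KEYS)


# Extras as a Connection editor might set them; "endpoint_url" was never reviewed.
extra_dejson = {
    "region_name": "eu-west-1",
    "verify": True,
    "endpoint_url": "https://attacker.example",
}

client_kwargs = select_allowed_extras(extra_dejson)  # only the named keys are forwarded
ignored = rejected_extras(extra_dejson)              # unknown keys are dropped, not passed on
```

A hook would then pass `client_kwargs` to its client constructor in place of `**conn.extra_dejson`. Whether unknown keys should be silently ignored, logged, or rejected with an error is a per-provider decision that this sketch leaves open.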