Exclude non-successful runs from AVERAGE_RUNTIME deadline calculation #68647

Merged
o-nikolas merged 6 commits into apache:main from aws-mwaa:bugfix/average-runtime-deadline-success-only
Jun 24, 2026

Conversation

@seanghaeli commented Jun 17, 2026

Contributor

For AVERAGE_RUNTIME deadlines, exclude non-successful Dag runs from the average calculation.

@potiuk added the "ready for maintainer review" label Jun 17, 2026
@vincbeck added the "backport-to-v3-3-test" label Jun 17, 2026
Comment thread airflow-core/src/airflow/serialization/definitions/deadline.py

@ramitkataria left a comment

Contributor

Minor nits but otherwise looks good to me pending CI

Comment thread airflow-core/docs/howto/deadline-alerts.rst Outdated
Comment thread airflow-core/docs/howto/deadline-alerts.rst Outdated

@ferruzzi left a comment

Contributor

Looks right. Approved pending green CI and resolving Ramit's comments.

Sean Ghaeli added 4 commits June 23, 2026 01:37
DeadlineReference.AVERAGE_RUNTIME computes a deadline from the average duration of
past DAG runs, but the query filtered only on dag_id and the presence of start and end dates, with
no DagRun.state filter. Failed runs (which may have died fast or hung before failing)
were folded into the average, skewing the computed deadline: a fast-failing history
makes it too short (spurious misses), a slow-then-failed history makes it too long
(real slowness never trips it).
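
The skew described above can be checked with a little arithmetic. This is a minimal sketch with made-up durations in seconds; the three lists and their values are hypothetical and are not taken from Airflow or from this PR's tests.

```python
from statistics import mean

# Hypothetical run durations in seconds (illustrative values only).
successes = [600.0, 620.0, 580.0]   # healthy runs of about ten minutes
fast_failures = [5.0, 10.0]         # runs that died almost immediately
slow_failures = [3600.0, 3600.0]    # runs that hung for an hour, then failed

# Average over successful runs only: the behaviour after the fix.
success_only_avg = mean(successes)

# Averages that also fold in failed runs: the behaviour before the fix.
fast_fail_avg = mean(successes + fast_failures)
slow_fail_avg = mean(successes + slow_failures)

print(success_only_avg)  # 600.0
print(fast_fail_avg)     # 363.0  -> deadline too short, spurious misses
print(slow_fail_avg)     # 1800.0 -> deadline too long, real slowness is hidden
```

With these numbers, two fast failures pull the average from 600 to 363 seconds, while two hung runs push it to 1800 seconds.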

Filter the duration query to successful runs only. Add tests asserting failed runs
are excluded from the average and that the deadline is skipped when too few
successful runs exist.
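
The behaviour described above could look roughly like the following. This is a plain-Python sketch, not the actual Airflow query: the `Run` dataclass, the `average_successful_runtime` helper, and the `"success"` state string are assumptions made for illustration, and the real change filters a database query on `DagRun.state` instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Run:
    """Hypothetical stand-in for a Dag run row; not Airflow's DagRun model."""
    state: str
    start_date: Optional[datetime]
    end_date: Optional[datetime]


def average_successful_runtime(runs: list[Run], min_runs: int) -> Optional[timedelta]:
    """Average the duration of successful runs, or return None if too few exist."""
    durations = [
        (r.end_date - r.start_date).total_seconds()
        for r in runs
        # Keep only successful runs that have both timestamps recorded.
        if r.state == "success" and r.start_date is not None and r.end_date is not None
    ]
    # The threshold counts successful runs only, so failed runs cannot satisfy it.
    if len(durations) < min_runs:
        return None
    return timedelta(seconds=sum(durations) / len(durations))


t0 = datetime(2026, 1, 1)
runs = [
    Run("success", t0, t0 + timedelta(minutes=10)),
    Run("success", t0, t0 + timedelta(minutes=20)),
    Run("failed", t0, t0 + timedelta(seconds=5)),
]
print(average_successful_runtime(runs, min_runs=2))  # 0:15:00
print(average_successful_runtime(runs, min_runs=3))  # None
```

In the second call there are three runs in total but only two successful ones, so the sketch returns `None`, mirroring the case where the deadline is skipped.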

The average and the min_runs threshold count successful runs only, so
"previous Dag runs" / "completed runs" were imprecise.

Generated-by: Claude Code (Opus 4.8) following the guidelines
@seanghaeli force-pushed the bugfix/average-runtime-deadline-success-only branch from 1fb1621 to 64355c4 June 23, 2026 01:37
seanghaeli and others added 2 commits June 22, 2026 20:29
Co-authored-by: Ramit Kataria <ramitkat@amazon.com>
Co-authored-by: Ramit Kataria <ramitkat@amazon.com>
@o-nikolas merged commit 88e6110 into apache:main Jun 24, 2026
77 checks passed
@o-nikolas deleted the bugfix/average-runtime-deadline-success-only branch June 24, 2026 16:57
@github-actions bot added this to the Airflow 3.3.1 milestone Jun 24, 2026
@github-actions

Contributor

Hi maintainer, this PR was merged without a milestone set.
We've automatically set the milestone to Airflow 3.3.1 based on: backport label targeting v3-3-test
If this milestone is not correct, please update it to the appropriate milestone.

This comment was generated by Milestone Tag Assistant.

@github-actions

Contributor

Backport successfully created: v3-3-test

Note: As of "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting bug-fix PRs (generally speaking) to the maintenance branches.

In case of doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-3-test PR Link

potiuk pushed a commit that referenced this pull request Jun 26, 2026
… calculation (#68647)

DeadlineReference.AVERAGE_RUNTIME computes a deadline from the average duration of
past DAG runs, but the query filtered only on dag_id and the presence of start and end dates, with
no DagRun.state filter. Failed runs (which may have died fast or hung before failing)
were folded into the average, skewing the computed deadline: a fast-failing history
makes it too short (spurious misses), a slow-then-failed history makes it too long
(real slowness never trips it).

Filter the duration query to successful runs only. Add tests asserting failed runs
are excluded from the average and that the deadline is skipped when too few
successful runs exist.

---------
(cherry picked from commit 88e6110)

Co-authored-by: Sean Ghaeli <58916776+seanghaeli@users.noreply.github.com>
Co-authored-by: Sean Ghaeli <ghaeli@amazon.com>
Co-authored-by: Ramit Kataria <ramitkat@amazon.com>
potiuk pushed a commit that referenced this pull request Jun 26, 2026
… calculation (#68647) (#68949)

DeadlineReference.AVERAGE_RUNTIME computes a deadline from the average duration of
past DAG runs, but the query filtered only on dag_id and the presence of start and end dates, with
no DagRun.state filter. Failed runs (which may have died fast or hung before failing)
were folded into the average, skewing the computed deadline: a fast-failing history
makes it too short (spurious misses), a slow-then-failed history makes it too long
(real slowness never trips it).

Filter the duration query to successful runs only. Add tests asserting failed runs
are excluded from the average and that the deadline is skipped when too few
successful runs exist.

---------
(cherry picked from commit 88e6110)

Co-authored-by: Sean Ghaeli <58916776+seanghaeli@users.noreply.github.com>
Co-authored-by: Sean Ghaeli <ghaeli@amazon.com>
Co-authored-by: Ramit Kataria <ramitkat@amazon.com>

Labels

area:DAG-processing
area:deadline-alerts AIP-86 (former AIP-57)
backport-to-v3-3-test Backport to v3-3-test
ready for maintainer review Set after triaging when all criteria pass.

6 participants