Skip to content

Do not deserialize trigger_kwargs when loading serialized DAGs#66002

Merged
amoghrajesh merged 10 commits intoapache:mainfrom
astronomer:dont-deser-triggers-on-scheduler-and-api-server
May 7, 2026
Merged

Do not deserialize trigger_kwargs when loading serialized DAGs#66002
amoghrajesh merged 10 commits intoapache:mainfrom
astronomer:dont-deser-triggers-on-scheduler-and-api-server

Conversation

@amoghrajesh
Copy link
Copy Markdown
Contributor

@amoghrajesh amoghrajesh commented Apr 28, 2026


Was generative AI tooling used to co-author this PR?
  • Yes : claude sonnet 4.5

After #55068 was merged, inconsequentially, _decode_start_trigger_args was deserializing trigger_kwargs and next_kwargs when loading a serialized DAG. These fields hold the serialized trigger state that only the Triggerer needs — when it picks up a deferred task, inflates the kwargs, and instantiates the trigger class.

The Scheduler and API Server load serialized DAGs but never touch these values; deserializing them there is wasted work.

This PR removes the deserialization and keeps trigger_kwargs and next_kwargs as raw JSON on StartTriggerArgs. The Triggerer reads trigger kwargs directly from the Trigger DB row (not through DAG deserialization), so its path is unaffected: airflow-core/src/airflow/models/trigger.py#L170-L179

What changes?

Why the changes to enum.py

#66002 (comment)

Scheduler

Before:
DagSerialization.from_dict()
  → _decode_start_trigger_args()
  → BaseSerialization.deserialize(trigger_kwargs)   # inflate to Python objects
  → StartTriggerArgs.trigger_kwargs = {delta: timedelta(...)}
TI.schedule_tis()
  → Trigger(kwargs=trigger_kwargs)
  → encrypt_kwargs() → serde.serialize() → json → fernet → encrypted_kwargs (DB)

After:

DagSerialization.from_dict()
  → _decode_start_trigger_args()
  → StartTriggerArgs.trigger_kwargs = {"__type": "dict", "__var": {...}}  # raw JSON, no work done
TI.schedule_tis()
  → Trigger(kwargs=trigger_kwargs)
  → encrypt_kwargs() → serde.serialize() → json → fernet → encrypted_kwargs (DB)

The Trigger.__init__ calls serde.serialize() on whatever it receives, so deserializing then re-serializing was pure waste.

API Server

Before: loads serialized DAG → _decode_start_trigger_args() which deserializes trigger_kwargs into Python objects → never used, thrown away.

After: same but stays raw JSON. No functional difference, just skips the work.

Triggerer

Before and after:

The triggerer also loads the serialized DAG at https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L717-L735 to check start_from_trigger. So it does go through _decode_start_trigger_args() too, but it does not use trigger_kwargs from that — it reads trigger.encrypted_kwargs from the DB row directly below. So the fix is safe for the triggerer as well.


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@potiuk
Copy link
Copy Markdown
Member

potiuk commented Apr 28, 2026

Just one test left!

Copy link
Copy Markdown
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple, straightforward.

@amoghrajesh
Copy link
Copy Markdown
Contributor Author

The changes to the Encoding enum are due to this: https://github.com/apache/airflow/actions/runs/25040217658/job/73343984827?pr=66002

_decode_start_trigger_args previously called BaseSerialization.deserialize() on trigger_kwargs when loading a serialized DAG, which inflated the BaseSerialization-encoded dict back to Python objects before storing on StartTriggerArgs. This PR stops that deserialization and keeps trigger_kwargs as raw JSON.

The raw JSON has Encoding enum instances as dict keys — that's how BaseSerialization.serialize() encodes dicts. When defer_task() passes trigger_kwargs to Trigger(kwargs=...), encrypt_kwargs calls serde.serialize(), which converts dict keys via str(k).

Python 3.10+

str(Encoding.TYPE)
Out[3]: '__type'

Python 3.10:

Python 3.10.19 (main, Feb 12 2026, 00:36:33) [Clang 21.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from enum import Enum, unique
>>> @unique
... class Encoding(str, Enum):
...     TYPE = "__type"
...     VAR = "__var"
...
>>> str(Encoding.TYPE)
'Encoding.TYPE'

On Python ≤3.10, str(Encoding.TYPE) returns "Encoding.TYPE" (the enum repr) instead of "__type" (its value), mangling the keys so the Triggerer cannot read them back.

Adding __str__ to Encoding makes str(Encoding.TYPE) return "__type" consistently across all Python versions, so serde's key conversion produces the right output. This is a pre-existing inconsistency that only became observable once we stopped deserializing trigger_kwargs early.

@amoghrajesh amoghrajesh self-assigned this Apr 28, 2026
@amoghrajesh amoghrajesh requested a review from jason810496 April 30, 2026 07:26
@amoghrajesh
Copy link
Copy Markdown
Contributor Author

The changes to the Encoding enum are due to this: https://github.com/apache/airflow/actions/runs/25040217658/job/73343984827?pr=66002

_decode_start_trigger_args previously called BaseSerialization.deserialize() on trigger_kwargs when loading a serialized DAG, which inflated the BaseSerialization-encoded dict back to Python objects before storing on StartTriggerArgs. This PR stops that deserialization and keeps trigger_kwargs as raw JSON.

The raw JSON has Encoding enum instances as dict keys — that's how BaseSerialization.serialize() encodes dicts. When defer_task() passes trigger_kwargs to Trigger(kwargs=...), encrypt_kwargs calls serde.serialize(), which converts dict keys via str(k).

Python 3.10+

str(Encoding.TYPE)
Out[3]: '__type'

Python 3.10:

Python 3.10.19 (main, Feb 12 2026, 00:36:33) [Clang 21.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from enum import Enum, unique
>>> @unique
... class Encoding(str, Enum):
...     TYPE = "__type"
...     VAR = "__var"
...
>>> str(Encoding.TYPE)
'Encoding.TYPE'

On Python ≤3.10, str(Encoding.TYPE) returns "Encoding.TYPE" (the enum repr) instead of "__type" (its value), mangling the keys so the Triggerer cannot read them back.

Adding __str__ to Encoding makes str(Encoding.TYPE) return "__type" consistently across all Python versions, so serde's key conversion produces the right output. This is a pre-existing inconsistency that only became observable once we stopped deserializing trigger_kwargs early.

Caused some side effects, writing a manual normalising code in defer_task instead

@amoghrajesh amoghrajesh requested a review from XD-DENG as a code owner April 30, 2026 07:27
@amoghrajesh
Copy link
Copy Markdown
Contributor Author

@potiuk some things changed since you reviewed, do you wanna take another look?

@amoghrajesh amoghrajesh requested a review from potiuk April 30, 2026 07:28
Copy link
Copy Markdown
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is currently red on the Python 3.10 Core...Serialization jobs (Sqlite / MySQL / Postgres / LowestDeps). The failures are not just the test files modified by this PR — airflow-core/tests/unit/triggers/test_base_trigger.py::test_render_template_fields and test_render_template_fields_filters_to_trigger_kwargs are also failing with assert 'name' in () and a ValueError('not enough values to unpack (expected 2, got 1)') raised in repr(). Those tests aren't touched by this PR, which means the new shape of trigger_kwargs is leaking past defer_task into other consumers.

The premise of the change is good (skipping the deserialize-then-reserialize cycle is a real win), but the _normalize workaround in defer_task is too narrow — it only protects one path while leaving every other consumer of start_trigger_args.trigger_kwargs to discover the encoded form on its own. Two suggestions:

  1. Move the normalization to the serde boundary, e.g. inside Trigger.encrypt_kwargs, so every consumer benefits and we don't need a workaround in defer_task at all.
  2. Or revisit the originally-attempted Encoding.__str__ fix — fixing str(Encoding.TYPE) == "__type" on 3.10 would make this whole class of issues go away. The thread mentioned "side effects"; would be useful to see them written down so we can decide whether to address them directly.

Drafted-by: Claude Opus 4.7; reviewed by @potiuk before posting

Comment thread airflow-core/src/airflow/models/taskinstance.py Outdated
Comment thread airflow-core/src/airflow/models/taskinstance.py Outdated
Comment thread airflow-core/src/airflow/models/taskinstance.py Outdated
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py
Comment thread airflow-core/tests/unit/serialization/test_dag_serialization.py
@amoghrajesh
Copy link
Copy Markdown
Contributor Author

Ok, I am taking a look at this one now

@amoghrajesh amoghrajesh requested review from dabla and uranusjr May 4, 2026 12:21
@amoghrajesh amoghrajesh added the full tests needed We need to run full set of tests for this PR to merge label May 4, 2026
@amoghrajesh
Copy link
Copy Markdown
Contributor Author

@dabla would appreciate your review here too

Copy link
Copy Markdown
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the fix, and it LGTM overall.

Wondering do you have any idea regarding avoid the similar error like this?
It's similar to last 3.2.0 release blocker that cause by some of the nested field it not serialize well for BaseEventTrigger

Perhaps checking if there's any field not serialize properly (e.g. regex with <XXX object at 0x0000...> format) with different case of triggers? (I don't have any concrete thought now, just thinking about how to catch this kind of error in the first place)

@amoghrajesh
Copy link
Copy Markdown
Contributor Author

@jason810496 Good catch on that gap, it seems that the hash test only catches asymmetric breakage, not the case where both sides serialize equally wrong. Tracking adding a _assert_fully_serialized check (json.dumps + < guard) over _TRIGGER_PARAMS as a follow-up; it's independent of this PR's scope and I do not want to pollute this one.

@jason810496
Copy link
Copy Markdown
Member

jason810496 commented May 5, 2026

@jason810496 Good catch on that gap, it seems that the hash test only catches asymmetric breakage, not the case where both sides serialize equally wrong. Tracking adding a _assert_fully_serialized check (json.dumps + < guard) over _TRIGGER_PARAMS as a follow-up; it's independent of this PR's scope and I do not want to pollute this one.

Yeah, I tracked in #66413 for the follow-up, anyone from the community can give a shot.

Comment thread airflow-core/src/airflow/models/taskinstance.py Outdated
Comment thread airflow-core/src/airflow/serialization/enums.py Outdated
Comment thread airflow-core/tests/unit/models/test_dagrun.py
@amoghrajesh amoghrajesh requested a review from ephraimbuddy May 6, 2026 10:18
@amoghrajesh
Copy link
Copy Markdown
Contributor Author

@ephraimbuddy can you take another look?

@amoghrajesh amoghrajesh force-pushed the dont-deser-triggers-on-scheduler-and-api-server branch from 03a2817 to a8eb624 Compare May 7, 2026 04:22
@amoghrajesh amoghrajesh merged commit 69b6c54 into apache:main May 7, 2026
142 checks passed
@amoghrajesh amoghrajesh deleted the dont-deser-triggers-on-scheduler-and-api-server branch May 7, 2026 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:DAG-processing full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants