Skip to content

Strengthen trigger-hash tests with a fully-serialized JSON guard and broader fixtures #66413

@jason810496

Description

@jason810496

Follow-up from PR #66002 (review thread).

Background

airflow-core/tests/unit/serialization/test_encoders.py has a TestTriggerHashConsistency suite that asserts the DAG-side hash (BaseEventTrigger.hash(classpath, encode_trigger(trigger)["kwargs"])) matches the DB-side hash after a real DB round-trip through encrypt_kwargs / _decrypt_kwargs. This mirrors the comparison done in AssetModelOperation.add_asset_trigger_references (airflow-core/src/airflow/dag_processing/collection.py) — if the two hashes diverge, the scheduler keeps recreating trigger rows on every heartbeat.

Gap 1 — symmetric-but-wrong serialization is not caught

The current hash assertion only catches asymmetric breakage between the two sides. If encode_trigger ever produced kwargs that contained non-JSON-safe Python objects but both sides happened to serialize them identically (e.g. via repr() or pickle-flavoured fallback), the hash check would still pass while violating the actual contract — that encoded kwargs are fully primitive / JSON-serializable, which is what Trigger.encrypt_kwargs (serde.serialize → json.dumps → fernet) requires.

Gap 2 — _TRIGGER_PARAMS coverage is too narrow

_TRIGGER_PARAMS today only exercises a handful of shapes:

  • FileDeleteTrigger(filepath, poke_interval) — primitive-only
  • _CallableKwargsTrigger(topics=()) — empty tuple
  • _CallableKwargsTrigger(topics=("fizz_buzz",), poll_timeout=1.0) — single-element tuple
  • _CallableKwargsTrigger(topics=[...], apply_function=..., apply_function_args=[...], apply_function_kwargs={...}, poll_timeout=3) — mixed list/dict/string

This leaves several realistic trigger-construction shapes unverified. We should expand fixtures to cover at least:

  • datetime / timedelta kwargs (the canonical DateTimeTrigger / TimeDeltaTrigger shapes).
  • Nested containers (list[dict], dict[str, list], tuple[tuple, ...]).
  • set / frozenset kwargs (these go through their own serde path).
  • None and falsy primitive values (0, "", False).
  • Enum values and dataclass / attrs instances if any trigger constructor accepts them.
  • Path-like (pathlib.Path) values.

The intent is that any trigger shape a real provider could plausibly construct should be in _TRIGGER_PARAMS, so the JSON-guard from Gap 1 is exercised broadly.

Proposed change (test-only)

In airflow-core/tests/unit/serialization/test_encoders.py:

  1. Add a small _assert_fully_serialized(encoded_kwargs) helper that runs json.dumps(encoded_kwargs) and asserts it succeeds (and, if useful, that the structure stays under the existing wrapper-depth invariant — the < guard discussed in the review thread).
  2. Call it from each parametrized test in TestEncodeTrigger and TestTriggerHashConsistency driven by _TRIGGER_PARAMS, so every fixture is checked.
  3. Extend _TRIGGER_PARAMS with additional fixtures covering the shapes listed in Gap 2.
  4. No production-code changes are expected — this is a test-strengthening change only.

Acceptance criteria

  • All _TRIGGER_PARAMS cases pass through json.dumps on the encoded kwargs without raising.
  • The new guard fires (i.e. fails) if a regression reintroduces non-JSON-safe values into encode_trigger output.
  • _TRIGGER_PARAMS is broadened to cover datetime/timedelta, nested containers, set/frozenset, falsy primitives, and any other realistic trigger-construction shapes.
  • No change to airflow-core/src/; the PR touches only airflow-core/tests/unit/serialization/test_encoders.py (and possibly a tiny shared helper if reused elsewhere).

Out of scope


Drafted-by: Claude Code (Opus 4.7); reviewed by @jason810496 before posting

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions