AIP-103: Wiring up task SDK comms and context accessors#66160
amoghrajesh merged 16 commits into apache:main
@Lee-W I tested it and confirmed

Thanks for checking. Revoking the approval to avoid an accidental merge.

Added GET DAG:

```python
from airflow.sdk import DAG, task
from airflow.sdk.definitions.asset import Asset, AssetAlias
import pendulum

my_asset = Asset(name="test_asset", uri="s3://bucket/test")
my_alias = AssetAlias("test_alias")

with DAG("alias_producer", schedule=None, start_date=pendulum.datetime(2026, 1, 1)):

    @task(outlets=[my_alias, my_asset])
    def produce(**context):
        # Emit an event for my_asset through the alias.
        context["outlet_events"][my_alias].add(my_asset, extra={"run": "1"})

    produce()

with DAG("alias_consumer", schedule=None, start_date=pendulum.datetime(2026, 1, 1)):

    @task(inlets=[my_alias])
    def consume(**context):
        print("=== inlet_events ===")
        try:
            events = list(context["inlet_events"][my_alias])
            print(f"events: {events}")
        except Exception as e:
            print(f"inlet_events error: {e}")

        print("=== asset_state ===")
        print("asset_state:", context.get("asset_state"))
        try:
            # Keyed access by the concrete asset the alias resolves to.
            state = context["asset_state"][my_asset]
            print("current watermark:", state.get("watermark"))
            state.set("watermark", "2026-05-06")
            print("set watermark to 2026-05-06")
            print("read back:", state.get("watermark"))
        except Exception as e:
            print(f"asset_state error: {e}")

    consume()
```

When I do this, this is what I get from the consumer task:

i.e. the producer emits through the alias, the consumer accesses context['asset_state'][my_asset], and set/get of the watermark works.

(Hopefully the PR hasn't grown too far out of scope by now.)

jason810496 left a comment:
Minor nits that can be addressed in a follow-up; non-blocking. Thanks.

Ah, I wanna run this through the full suite

Thanks for the review, folks; merging this one. Will follow up with the next items.

closes: #65779
This PR is part of AIP-103 (Task State Management) and is the third in the series. It adds the Task SDK layer: comms message types, a supervisor proxy, and context accessors that wire `context['task_state']` and `context['asset_state']` into task execution.

What does the PR have?

Comms layer (`comms.py`):
- `GetTaskState`, `SetTaskState`, `DeleteTaskState`, `ClearTaskState` (with `all_map_indices: bool` field)
- `GetAssetStateByName` / `GetAssetStateByUri`, `SetAssetStateByName` / `SetAssetStateByUri`, `DeleteAssetStateByName` / `DeleteAssetStateByUri`, `ClearAssetStateByName` / `ClearAssetStateByUri`: separate typed classes per addressing mode, matching the `GetAssetByName` / `GetAssetByUri` convention
- `GetAssetsByAlias` + `AssetsByAliasResult`: resolves an `AssetAlias` inlet to its concrete assets at context build time
- `TaskStateResult`, `AssetStateResult` result types

Supervisor (`supervisor.py`): handler branches proxying the above messages to the Execution API endpoints from PR #66073.

Client (`client.py`):
- `TaskStateOperations` and `AssetStateOperations` classes exposed as `client.task_state` and `client.asset_state`.
- `AssetStateOperations` has a `_resolve_endpoint` helper that builds the `by-name/{op}` or `by-uri/{op}` endpoint + params, keeping `get`/`set`/`delete`/`clear` each a one-liner.
- `AssetOperations.get_by_alias()` resolves alias → concrete assets.

Execution API (`routes/assets.py`): `GET /assets/by-alias?alias_name=...` wrapping the existing `expand_alias_to_assets()` DB function. Cadwyn migration added in `v2026_04_17.py`.

Context accessors (`context.py`):
- `TaskStateAccessor`: always available as `context['task_state']`
- `AssetStateAccessors` container + `AssetStateAccessor` (per-asset):
  - `context['asset_state'][MY_ASSET].get('watermark')`: keyed by `Asset | AssetNameRef | AssetUriRef | AssetAlias`, consistent with `inlet_events[asset]`
  - `context['asset_state'].get('watermark')` proxies through when exactly one concrete inlet exists; raises `ValueError` with a clear message for multi-inlet tasks
  - `AssetAlias` inlets are resolved to their concrete assets at context build time via `GetAssetsByAlias` comms, so `context['asset_state'][Asset(name="a")]` works when the alias maps to that asset. If the alias resolves to nothing, `asset_state` is present but empty.

Context wiring (`task_runner.py`): both accessors are registered in `get_template_context()`. `asset_state` is set for any task with at least one concrete inlet, including aliases.

Context TypedDict (`definitions/context.py`): `task_state: TaskStateAccessor` and `asset_state: AssetStateAccessors` added.

Design choices worth flagging

- Asset state routes use `name`/`uri`, not `asset_id`. Asset names and URIs are unique, available directly on the `Asset` object at runtime, and consistent with `/assets/by-name` and `/assets/by-uri`. This avoids a DB round trip.
- `clear()` wiping the entire fleet is opt-in. `DELETE /state/ti/{ti_id}` defaults to clearing only this task instance's `map_index`. Pass `?all_map_indices=true` (or `task_state.clear(all_map_indices=True)`) for a fleet-wide wipe.
- Both keyed and sugar access on `asset_state`. `context['asset_state'][MY_ASSET]` is the primary API. For single-inlet tasks (the watcher pattern), `context['asset_state'].get(...)` works as sugar. Consistent with `inlet_events[asset]`.
- `AssetAlias` resolution is event-driven. The alias -> concrete asset mapping in `expand_alias_to_assets()` is populated when a producer emits through the alias. If the alias has never been emitted through, `asset_state` is present but empty.
- `__getattr__` is not implemented. Template access like `{{ task_state.job_id }}` is not supported yet. Easy to add later.

Test plan

- `TestTaskStateAccessor`, `TestAssetStateAccessor`, `TestAssetStateAccessors` in `test_context.py`
- `TestHandleRequest` in `test_supervisor.py`: includes the `GetAssetsByAlias` case
- `TestTaskStateOperations`, `TestAssetStateOperations`, `get_by_alias` cases in `test_client.py`
- `TestTaskInstanceStateOperations` in `test_task_runner.py`: covers multi-inlet keyed access, `AssetUriRef` inlet, `AssetAlias` inlet resolving to a concrete asset, and empty alias

Manual verification for the new asset alias endpoint

DAG: the `alias_producer` / `alias_consumer` pair posted in the conversation above.

When I do this, the consumer task output confirms the flow: the producer emits through the alias, the consumer accesses `context['asset_state'][my_asset]`, and set/get of the watermark works.
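The keyed and sugar access described for `context['asset_state']` can be modelled with a small in-memory sketch. This is not the code from `context.py`: the `Asset` stand-in, the class bodies, and the dict-backed storage are all hypothetical, and raising `ValueError` for zero inlets as well as for several is an assumption (the description only specifies the multi-inlet case).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for the SDK Asset, reduced to a hashable name."""

    name: str


@dataclass
class AssetStateAccessor:
    """Per-asset key/value view; a plain dict replaces the supervisor round trip."""

    _store: dict = field(default_factory=dict)

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value):
        self._store[key] = value


class AssetStateAccessors:
    """Container keyed by asset, with get/set sugar for single-inlet tasks."""

    def __init__(self, inlets):
        self._by_asset = {asset: AssetStateAccessor() for asset in inlets}

    def __getitem__(self, asset):
        # Primary API: explicit keyed access.
        return self._by_asset[asset]

    def __len__(self):
        return len(self._by_asset)

    def _only(self):
        # Sugar is only unambiguous when exactly one concrete inlet exists.
        if len(self._by_asset) != 1:
            raise ValueError(
                f"asset_state has {len(self._by_asset)} inlets; "
                "use asset_state[asset] to pick one"
            )
        return next(iter(self._by_asset.values()))

    def get(self, key, default=None):
        return self._only().get(key, default)

    def set(self, key, value):
        self._only().set(key, value)
```

With one inlet, `accessors.get("watermark")` and `accessors[Asset("a")].get("watermark")` read the same value; with two inlets the sugar call raises and only keyed access works. An alias that resolves to nothing corresponds to constructing the container with an empty list.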
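The opt-in scope of `clear()` from the design choices can be illustrated the same way. The sketch below assumes a shared dict as the state backend and a hypothetical `InMemoryTaskState` class keyed by task id and `map_index`; the real accessor goes through the supervisor and the `DELETE /state/ti/{ti_id}` endpoint, and only the `clear(all_map_indices=...)` signature is taken from the description.

```python
class InMemoryTaskState:
    """Hypothetical model of task state for one mapped task instance."""

    def __init__(self, backend, task_id, map_index):
        self._backend = backend      # shared dict: (task_id, map_index, key) -> value
        self._task_id = task_id
        self._map_index = map_index

    def set(self, key, value):
        self._backend[(self._task_id, self._map_index, key)] = value

    def get(self, key, default=None):
        return self._backend.get((self._task_id, self._map_index, key), default)

    def clear(self, *, all_map_indices=False):
        # Default: wipe only this instance's map_index.
        # Opt in with all_map_indices=True to wipe every mapped instance of the task.
        doomed = [
            k
            for k in self._backend
            if k[0] == self._task_id
            and (all_map_indices or k[1] == self._map_index)
        ]
        for k in doomed:
            del self._backend[k]
```

Calling `clear()` on the instance with `map_index=0` leaves the state written by `map_index=1` untouched, while `clear(all_map_indices=True)` removes both.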