Thread version_data through BundleInfo to worker-side bundle initialization #67217
Open
o-nikolas wants to merge 8 commits into
ferruzzi approved these changes on May 20, 2026.

ferruzzi (Contributor) left a comment:
Do we want any unit tests? It's just plumbing a value through, so maybe not necessary. Feels pretty trivial on this PR so I'll approve but felt odd not calling it out.
…zation

Add version_data to the push path so structured bundle metadata (e.g., S3 manifests) reaches workers at task execution time.

Changes:
- Add version_data field to BundleInfo (workloads/base.py)
- Populate version_data from DagVersion in ExecuteTask.make()
- Add selectinload(TI.dag_version) to the scheduler enqueue query to avoid N+1 queries when reading version_data
- Add version_data parameter to BaseDagBundle.__init__ (stored as self.version_data) and DagBundlesManager.get_bundle()
- Pass version_data through task_runner.py and callback_supervisor.py
- Regenerate task-sdk datamodels to include version_data in BundleInfo

Existing bundles ignore version_data (it defaults to None). The S3 bundle will use self.version_data in initialize() to fetch specific object versions (follow-up PR).
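A minimal sketch of the bundle-side plumbing this commit describes, using simplified stand-ins rather than Airflow's real classes: the manager forwards `version_data` to the bundle constructor, the base class stores it, and existing bundles just ignore it. `S3ManifestBundle`, the registry dict passed to the manager, the keyword-only constructor, and the manifest shape `{"objects": {object_key: version_id}}` are all hypothetical, chosen only to illustrate how a follow-up bundle might read `self.version_data` in `initialize()`.

```python
from __future__ import annotations

from typing import Any


class BaseDagBundle:
    """Simplified stand-in for Airflow's BaseDagBundle (not the real class)."""

    def __init__(
        self,
        *,
        name: str,
        version: str | None = None,
        version_data: dict[str, Any] | None = None,
    ) -> None:
        self.name = name
        self.version = version
        # Existing bundles ignore this; it defaults to None.
        self.version_data = version_data

    def initialize(self) -> None:
        pass


class S3ManifestBundle(BaseDagBundle):
    """Hypothetical bundle that pins object versions from version_data."""

    def initialize(self) -> None:
        # Assumed manifest shape: {"objects": {object_key: version_id}}.
        # This sketch only records the pins; actually fetching is out of scope.
        manifest = (self.version_data or {}).get("objects", {})
        self.pinned_versions: dict[str, str] = dict(manifest)


class DagBundlesManager:
    """Simplified stand-in: maps bundle names to bundle classes."""

    def __init__(self, registry: dict[str, type[BaseDagBundle]]) -> None:
        self._registry = registry

    def get_bundle(
        self,
        name: str,
        version: str | None = None,
        version_data: dict[str, Any] | None = None,
    ) -> BaseDagBundle:
        bundle_cls = self._registry[name]
        # version_data is handed straight through to the constructor.
        return bundle_cls(name=name, version=version, version_data=version_data)


manager = DagBundlesManager({"s3_dags": S3ManifestBundle})
bundle = manager.get_bundle(
    "s3_dags", version="v1", version_data={"objects": {"dags/etl.py": "example-id"}}
)
bundle.initialize()
```

Keeping `version_data` optional with a `None` default is what lets bundles that never look at it keep working unchanged.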
Address review feedback:
- Use dict[str, Any] | None instead of bare dict | None for version_data in both BaseDagBundle.__init__ and BundleInfo
- Add minimal tests verifying version_data plumbing through the bundle constructor
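The "minimal tests" mentioned in this commit could look roughly like the pytest-style functions below. This is a hedged sketch, not the PR's actual tests: `FakeBundle` is a hypothetical stand-in for a concrete `BaseDagBundle` subclass so the example runs without Airflow installed.

```python
from __future__ import annotations

from typing import Any


class FakeBundle:
    """Hypothetical stand-in for a BaseDagBundle subclass."""

    def __init__(self, *, name: str, version_data: dict[str, Any] | None = None) -> None:
        self.name = name
        self.version_data = version_data


def test_version_data_defaults_to_none() -> None:
    # A bundle built without version_data keeps the pre-existing behavior.
    bundle = FakeBundle(name="testing")
    assert bundle.version_data is None


def test_version_data_is_stored_unchanged() -> None:
    payload: dict[str, Any] = {"manifest": {"dags/a.py": "v1"}}
    bundle = FakeBundle(name="testing", version_data=payload)
    # The constructor stores the same object; it neither copies nor validates it.
    assert bundle.version_data is payload
```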
Branch updated from 97dc507 to 48a9b1b.
vincbeck approved these changes on May 20, 2026.
…nc signature

The mypy providers check failed because execute_async declared key as 'TaskInstanceKey | str', but BatchQueuedJob.key expects BatchJobWorkloadKey (TaskInstanceKey | CallbackKey). Although CallbackKey is aliased to str, mypy treats the nominal type alias as distinct. Use the proper CallbackKey import to satisfy mypy.
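A simplified sketch of the signature fix, using stand-in types only. Here `CallbackKey` is modeled on the dataclass form (with an `id` field) that a subsequent commit attributes to Airflow 3.3, and `BatchQueuedJob` is trimmed to two hypothetical fields. Annotations have no runtime effect, so the point is purely what mypy sees: when `execute_async` is annotated with the same `BatchJobWorkloadKey` alias that `BatchQueuedJob.key` uses, the argument type lines up exactly instead of relying on `str`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import NamedTuple, Union


class TaskInstanceKey(NamedTuple):
    """Simplified stand-in for Airflow's TaskInstanceKey."""

    dag_id: str
    task_id: str
    run_id: str
    try_number: int = 1
    map_index: int = -1


@dataclass(frozen=True)
class CallbackKey:
    """Stand-in modeled on the dataclass form of CallbackKey."""

    id: str


# Producer and consumer both refer to this one alias.
BatchJobWorkloadKey = Union[TaskInstanceKey, CallbackKey]


@dataclass
class BatchQueuedJob:
    """Hypothetical, trimmed-down queued-job record."""

    key: BatchJobWorkloadKey
    command: list[str]


def execute_async(key: BatchJobWorkloadKey, command: list[str]) -> BatchQueuedJob:
    # Annotating with the shared alias (rather than ``TaskInstanceKey | str``)
    # lets mypy verify that the key matches BatchQueuedJob.key.
    return BatchQueuedJob(key=key, command=command)
```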
…stanceKey types

The strict TypeError for unknown key types broke executor tests that pass Mock objects or raw tuples as keys (Lambda, Batch, ECS, Kubernetes). Restore the original fallback to CallbackState for any non-TaskInstanceKey key, matching main's behavior.
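The restored fallback can be sketched as a type check that only singles out `TaskInstanceKey` and treats everything else as a callback. The enums and `resolve_state` helper below are hypothetical stand-ins, not Airflow's API; the sketch just shows why raw tuples and arbitrary objects (such as mocks) no longer raise: a plain tuple is not an instance of a `NamedTuple` subclass, so it falls through to `CallbackState`.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, NamedTuple


class TaskInstanceKey(NamedTuple):
    """Simplified stand-in for Airflow's TaskInstanceKey."""

    dag_id: str
    task_id: str
    run_id: str
    try_number: int = 1
    map_index: int = -1


class TaskInstanceState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"


class CallbackState(str, Enum):
    SUCCESS = "success"
    FAILED = "failed"


def resolve_state(key: Any, succeeded: bool) -> TaskInstanceState | CallbackState:
    """Hypothetical helper: choose the state enum based on the key type."""
    if isinstance(key, TaskInstanceKey):
        return TaskInstanceState.SUCCESS if succeeded else TaskInstanceState.FAILED
    # Anything else (callback keys, but also mocks or raw tuples used in
    # executor tests) falls back to CallbackState instead of raising TypeError.
    return CallbackState.SUCCESS if succeeded else CallbackState.FAILED
```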
…pat with 3.2.x

The test_process_workloads_routes_execute_callback test uses CallbackKey(id=...), which requires the dataclass form introduced in 3.3. In Airflow 3.2.x, CallbackKey is a str type alias and does not accept keyword arguments. Change the skipif guard from AIRFLOW_V_3_2_PLUS to AIRFLOW_V_3_3_PLUS.
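A rough sketch of how such a version gate behaves, using stdlib `unittest.skipIf` as an analogue of the pytest `skipif` guard so it runs without extra dependencies. `version_at_least`, `INSTALLED_AIRFLOW_VERSION`, and the test class are hypothetical; the real `AIRFLOW_V_3_3_PLUS` flag is derived from the installed Airflow version rather than a hard-coded string. The gate compares only the numeric `major.minor.patch` prefix, so pre-release builds such as `3.3.0.dev0` count as 3.3.

```python
from __future__ import annotations

import re
import unittest


def version_at_least(version: str, minimum: tuple[int, int, int]) -> bool:
    """Compare the numeric major.minor.patch prefix of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= minimum


# Hypothetical installed version; the real flag comes from Airflow itself.
INSTALLED_AIRFLOW_VERSION = "3.2.1"
AIRFLOW_V_3_3_PLUS = version_at_least(INSTALLED_AIRFLOW_VERSION, (3, 3, 0))


class ProcessWorkloadsTest(unittest.TestCase):
    @unittest.skipIf(
        not AIRFLOW_V_3_3_PLUS,
        "CallbackKey(id=...) needs the dataclass form from Airflow 3.3",
    )
    def test_routes_execute_callback(self) -> None:
        # Placeholder: the real test would build CallbackKey(id=...) here.
        pass
```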
This is PR 2 of the S3 Dag Bundle versioning series. PR 1 (#66491) added the `BundleVersion` dataclass, the Alembic migration, and the persistence path. This PR completes the worker-side plumbing: it adds `version_data` to `BundleInfo` and threads it through the worker-side bundle initialization path so that structured version metadata (e.g., S3 manifests) reaches the bundle instance at task execution time.

Changes:
- `BundleInfo` gains a `version_data: dict[str, Any] | None = None` field
- `ExecuteTask.make()` reads `version_data` from `DagVersion` (via the eagerly loaded relationship)
- The scheduler enqueue query uses `selectinload(TI.dag_version)` to avoid N+1 queries
- `BaseDagBundle.__init__` accepts and stores `version_data`
- `DagBundlesManager.get_bundle()` passes `version_data` to the bundle constructor
- `task_runner.parse()` and `callback_supervisor` pass `version_data` through
- The task-sdk `_generated.py` datamodels are regenerated with the new field

related: #66491
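The scheduler-side half of the change list above can be sketched as follows. Everything here is a simplified stand-in: `BundleInfo` is modeled as a plain frozen dataclass so the sketch runs without Airflow or Pydantic, the `DagVersion` and `TaskInstance` fields are assumptions, and `make_bundle_info` is a hypothetical helper standing in for the relevant part of `ExecuteTask.make()`. The comment on eager loading reflects the PR's stated reason for `selectinload(TI.dag_version)`: the relationship is loaded up front, so reading it per task instance does not trigger one extra query each.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BundleInfo:
    """Stand-in for the workload BundleInfo model."""

    name: str
    version: str | None = None
    # New optional field; payloads that omit it still construct fine.
    version_data: dict[str, Any] | None = None


@dataclass
class DagVersion:
    """Hypothetical slice of a DagVersion row."""

    bundle_name: str
    bundle_version: str | None
    version_data: dict[str, Any] | None = None


@dataclass
class TaskInstance:
    """Hypothetical TI whose dag_version relationship is already loaded."""

    task_id: str
    dag_version: DagVersion


def make_bundle_info(ti: TaskInstance) -> BundleInfo:
    # With selectinload in the enqueue query, dag_version is already in
    # memory, so this access does not issue a per-TI query.
    dv = ti.dag_version
    return BundleInfo(
        name=dv.bundle_name,
        version=dv.bundle_version,
        version_data=dv.version_data,
    )
```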
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4) following the guidelines