[Depends on #65958] Refactor dag discovery to respect coordinator #66392

Draft
jason810496 wants to merge 49 commits into apache:main from astronomer:task-sdk/refactor-dag-discovery-to-respect-coordinator

Conversation

@jason810496
Member

Why

DAG discovery hard-coded the .py / ZIP heuristic and only delegated to coordinators via a flat list of file extensions. That prevented per-file decisions and let coordinator-claimed ZIPs be double-enumerated by the generic ZIP path.

What

  • Replace airflow.utils.file.list_py_file_paths with airflow.dag_processing.manager.discover_dag_file_paths.
  • Dispatch each file as: .py → coordinator (can_handle_dag_file(bundle_name, path)) → generic ZIP. Coordinator wins over ZIP, so a claimed .jar/archive isn't also scanned as a dag-zip.
  • Forward bundle_name so coordinators can scope per bundle.
  • Drop _runtime_file_extensions cache; _get_observed_filelocs now asks coordinators directly.
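The dispatch order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only `can_handle_dag_file(bundle_name, path)` comes from the description, while `classify_dag_file`, `JarCoordinator`, the `Coordinator` protocol, and the returned labels are hypothetical names chosen for the example.

```python
from __future__ import annotations

import zipfile
from pathlib import Path
from typing import Iterable, Protocol


class Coordinator(Protocol):
    # Method name and argument order follow the PR description;
    # everything else about this interface is an assumption.
    def can_handle_dag_file(self, bundle_name: str, path: str) -> bool: ...


class JarCoordinator:
    """Hypothetical coordinator that claims .jar files in a single bundle."""

    def __init__(self, bundle_name: str) -> None:
        self.bundle_name = bundle_name

    def can_handle_dag_file(self, bundle_name: str, path: str) -> bool:
        # Scope the decision per bundle, which is why bundle_name is forwarded.
        return bundle_name == self.bundle_name and path.endswith(".jar")


def classify_dag_file(
    bundle_name: str,
    path: Path,
    coordinators: Iterable[Coordinator],
) -> str | None:
    """Return which path would handle the file (hypothetical helper)."""
    # 1. Plain Python files are always handled by the Python path.
    if path.suffix == ".py":
        return "python"
    # 2. Coordinators are asked per file, before the ZIP check, so a
    #    claimed archive is not also enumerated as a generic dag-zip.
    if any(c.can_handle_dag_file(bundle_name, str(path)) for c in coordinators):
        return "coordinator"
    # 3. Anything else that is a valid ZIP archive falls back to the ZIP path.
    if zipfile.is_zipfile(path):
        return "zip"
    # Not a DAG file candidate.
    return None
```

With a `JarCoordinator("java_bundle")` registered, a `.jar` in `java_bundle` is classified as `"coordinator"`, while the same archive in any other bundle falls through to `"zip"`, since a JAR is itself a valid ZIP archive. Checking coordinators before the ZIP fallback is what avoids the double enumeration mentioned under "Why".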

Was generative AI tooling used to co-author this PR?

jason810496 and others added 30 commits May 4, 2026 15:43
- Introduced the `apache-airflow-providers-languages-java` package with version 0.1.0.
- Added Java-specific task coordinators and DAG file processors.
- Created documentation including README, changelog, and installation instructions.
- Implemented provider info retrieval and commit tracking.
- Established testing framework with initial unit tests for Java provider components.
- Renamed all instances of "process coordinators" to "runtime coordinators" in the codebase.
- Updated the ProvidersManager and ProvidersManagerTaskRuntime classes to handle runtime coordinators.
- Modified the DagFileProcessorManager to collect file extensions from runtime coordinators.
- Adjusted the Java provider to implement the new runtime coordinator structure.
- Updated tests to reflect changes from process to runtime coordinators.
Tweak coordinator class names, attribute names, and method names to be
shorter and avoid the term 'runtime'.
- Remove Java SDK setup in Dockerfile
- add multi-language extras documentation
- Update TaskInstanceDTO description, and adjust API version in generated files
@jason810496 jason810496 marked this pull request as draft May 5, 2026 08:45
@jason810496 jason810496 removed area:providers area:Executors-core LocalExecutor & SequentialExecutor provider:edge Edge Executor / Worker (AIP-69) / edge3 area:ConfigTemplates area:registry backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels May 5, 2026
@jason810496 jason810496 added AIP-108: Coordinator Change this to an 'area:' label after AIP acceptance. and removed kind:documentation provider:standard area:task-sdk area:dev-tools labels May 5, 2026
@jason810496 jason810496 changed the title Refactor dag discovery to respect coordinator [Depends on #65958] Refactor dag discovery to respect coordinator May 5, 2026

Labels

AIP-108: Coordinator Change this to an 'area:' label after AIP acceptance. area:DAG-processing

2 participants