Skip to content

ProviderInfo: record provider module path to remove the package-name → module heuristic in example DAG bundle discovery #66305

@andreahlert

Description

@andreahlert

Description

_add_provider_example_dags_to_bundle in airflow-core/src/airflow/dag_processing/bundles/manager.py currently derives the Python package path of an installed provider from its distribution name with a string heuristic:

if package_name.startswith("apache-airflow-providers-"):
    suffix = package_name[len("apache-airflow-providers-") :]
    module_name = "airflow.providers." + suffix.replace("-", ".")
else:
    module_name = package_name.replace("-", "_")

This works for canonical Apache providers because they follow a fixed layout convention, but it has known weak points:

  • third-party providers whose distribution name does not start with apache-airflow-providers- fall back to a naive replace("-", "_") which is often wrong;
  • the dash-to-dot reverse is lossy: foo-bar and foo_bar both come back as foo.bar.

Why entry_point.module does not directly solve this

Provider entry points are declared as e.g. airflow.providers.standard.get_provider_info:get_provider_info. entry_point.module resolves to the get_provider_info module file, not the package, so importlib.import_module(entry_point.module) returns a module without __path__. Walking __path__ to find example_dags/ would iterate zero times.

Proposed approach

Extend ProviderInfo in shared/providers_discovery/src/airflow_shared/providers_discovery/providers_discovery.py to record the parent package path of each discovered provider:

  • derive it from entry_point.module by stripping the trailing .get_provider_info segment, or
  • surface it from the dist metadata (dist.files, top_level.txt, etc.).

Mirror the field on airflow.providers_manager.ProviderInfo so consumers in airflow-core can read it.

Acceptance criteria

  • ProviderInfo exposes a stable attribute (e.g. module_path or package_module) populated for every discovered provider.
  • The heuristic block in _add_provider_example_dags_to_bundle is replaced with a direct importlib.import_module(info.module_path) lookup.
  • The tracking comment # tracked at <issue-url> in manager.py is removed.
  • Tests cover both canonical Apache providers (apache-airflow-providers-standard, apache-airflow-providers-common-sql) and a synthetic third-party provider with a non-canonical distribution name.

Context

Surfaced during review of #66161 (#66161) by @potiuk and @jscheffl. Deferred to keep that PR focused on the example-DAG-bundle migration; this issue captures the proper fix.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions