
Add team_name tags to Kubernetes executor metrics#69046

Open
SameerMesiah97 wants to merge 1 commit into
apache:mainfrom
SameerMesiah97:68996-CNCF-Kubernetes-Team-Name-Metrics


@SameerMesiah97 SameerMesiah97 commented Jun 26, 2026

Contributor

Description

This change adds a team_name tag to Kubernetes executor metrics to improve per-team observability in multi-team deployments.

The following metrics now include the team_name tag when the executor is configured with a team:

  • kubernetes_executor.pod_creation
  • kubernetes_executor.pod_deletion
  • kubernetes_executor.pod_patching
  • kubernetes_executor.adopt_task_instances.duration
  • kubernetes_executor.pod_creation_status
  • kubernetes_executor.pod_deletion_status
  • kubernetes_executor.pod_patching_status

The metric kubernetes_executor.clear_not_launched_queued_tasks.duration was not updated because it is no longer present in the current Kubernetes executor implementation.

With the exception of kubernetes_executor.adopt_task_instances.duration, all of the above metrics are emitted from AirflowKubernetesScheduler.

Rationale

The Kubernetes executor supports multi-team scheduling, but its operational metrics could not previously be attributed to individual teams. Adding the team_name tag enables per-team dashboards, alerting, and troubleshooting while remaining backwards compatible for deployments that do not use teams.

Notes

team_name is owned by KubernetesExecutor, while most of the affected metrics are emitted from AirflowKubernetesScheduler. To allow scheduler-emitted metrics to include the team_name tag, the value is passed to AirflowKubernetesScheduler when it is constructed by the executor.

For compatibility with previous Airflow releases, team_name is initialized only if it is not already defined, since the attribute was introduced as part of multi-team support.
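
A minimal sketch of the two notes above, using stand-in classes rather than the real Airflow ones: the executor sets team_name only when its base class has not already defined it, then hands the value to the scheduler it constructs. Every class name here (LegacyBaseExecutor, TeamAwareBaseExecutor, SchedulerSketch, StartMixin) is hypothetical and chosen for illustration; the actual constructor signatures in the PR may differ.

```python
from __future__ import annotations


class LegacyBaseExecutor:
    """Hypothetical base executor from a release without multi-team support."""

    def __init__(self) -> None:
        self.parallelism = 32


class TeamAwareBaseExecutor:
    """Hypothetical base executor that already defines team_name."""

    def __init__(self, team_name: str | None = None) -> None:
        self.parallelism = 32
        self.team_name = team_name


class SchedulerSketch:
    """Stand-in for the scheduler; it only stores the value it is given."""

    def __init__(self, team_name: str | None = None) -> None:
        self.team_name = team_name


class StartMixin:
    """Stand-in for the executor logic described in the notes."""

    def start(self) -> SchedulerSketch:
        # Initialize team_name only if the base class did not define it.
        if not hasattr(self, "team_name"):
            self.team_name = None
        # Pass the executor-owned value to the scheduler at construction.
        return SchedulerSketch(team_name=self.team_name)


class ExecutorOnLegacyBase(StartMixin, LegacyBaseExecutor):
    pass


class ExecutorOnTeamAwareBase(StartMixin, TeamAwareBaseExecutor):
    pass


legacy_scheduler = ExecutorOnLegacyBase().start()
team_scheduler = ExecutorOnTeamAwareBase(team_name="team_a").start()
```

In this sketch the guard never overwrites a value set by the base class, so a configured team survives and an older base class simply yields None.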

Tests

Updated the existing unit tests for kubernetes_executor.pod_deletion and kubernetes_executor.pod_deletion_status to verify that metrics are emitted both with and without the team_name tag. The test cases that expect team_name in the metrics are skipped when the Airflow version is lower than 3.1.0, because the team_name attribute was added in version 3.1.0.
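
A rough sketch of what such a test pair could look like, written with the standard library only so it runs without Airflow installed. The Stats class, delete_pod function, and AIRFLOW_VERSION tuple below are hypothetical stand-ins, not the fixtures or helpers used in the PR's test suite.

```python
from __future__ import annotations

import unittest
from contextlib import nullcontext
from unittest import mock

# Hypothetical version tuple; the real tests would read the installed version.
AIRFLOW_VERSION = (3, 1, 0)


class Stats:
    """Stand-in metrics client whose timer is a no-op context manager."""

    @staticmethod
    def timer(name: str, tags: dict | None = None):
        return nullcontext()


def delete_pod(team_name: str | None) -> None:
    """Stand-in for the code under test: time a deletion, tagging the team if set."""
    tags = {"team_name": team_name} if team_name is not None else {}
    with Stats.timer("kubernetes_executor.pod_deletion", tags=tags):
        pass  # the real code would call the Kubernetes API here


class TestPodDeletionMetric(unittest.TestCase):
    def test_without_team(self) -> None:
        with mock.patch.object(Stats, "timer", return_value=nullcontext()) as timer:
            delete_pod(None)
        timer.assert_called_once_with("kubernetes_executor.pod_deletion", tags={})

    @unittest.skipIf(AIRFLOW_VERSION < (3, 1, 0), "team_name requires Airflow 3.1.0+")
    def test_with_team(self) -> None:
        with mock.patch.object(Stats, "timer", return_value=nullcontext()) as timer:
            delete_pod("team_a")
        timer.assert_called_once_with(
            "kubernetes_executor.pod_deletion", tags={"team_name": "team_a"}
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPodDeletionMetric)
result = unittest.TextTestRunner().run(suite)
```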

The following metrics do not currently have dedicated unit test coverage, so no test-related adjustments were made for them:

  • kubernetes_executor.pod_creation
  • kubernetes_executor.pod_creation_status
  • kubernetes_executor.pod_patching
  • kubernetes_executor.pod_patching_status
  • kubernetes_executor.adopt_task_instances.duration

Backwards Compatibility

This change is additive only. Existing metric names remain unchanged, and the team_name tag is emitted only when the Kubernetes executor is configured with a team. Existing dashboards and integrations that do not use the new tag continue to function unchanged.

Related: #68996

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [GPT 5.5] following the guidelines

@boring-cyborg boring-cyborg Bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Jun 26, 2026
@SameerMesiah97 SameerMesiah97 force-pushed the 68996-CNCF-Kubernetes-Team-Name-Metrics branch 3 times, most recently from 3e4f4ae to 58fc3ca Compare June 26, 2026 20:03
Add the team_name tag to Kubernetes executor metrics to improve
per-team observability in multi-team deployments.

Pass the executor's team_name to AirflowKubernetesScheduler so
scheduler-emitted metrics include the tag consistently.

Maintain compatibility with prior Airflow releases by initializing
team_name only when it is not already defined.

Update the existing metric tests to verify metrics are emitted
both with and without the team_name tag.
@ferruzzi

Contributor

I'd like to see the unit tests added for the missing paths. Improving the coverage while you are in there seems reasonable and they shouldn't be too difficult to follow the existing patterns.

@SameerMesiah97

Contributor Author

> I'd like to see the unit tests added for the missing paths. Improving the coverage while you are in there seems reasonable and they shouldn't be too difficult to follow the existing patterns.

Okay. I was thinking of submitting a follow-up PR for them, but I can add them to this PR at your request.

  self.log.info("Deleting pod %s in namespace %s", pod_name, namespace)
- with Stats.timer("kubernetes_executor.pod_deletion"):
+ with Stats.timer(
+     "kubernetes_executor.pod_deletion", tags=prune_dict({"team_name": self.team_name})

Contributor


For these timers, can you run a manual test and confirm they work as intended? Technically this is a behavior change: even if team_name is None, the timer will now be called with tags={}, where previously it received no tags argument at all. You may need to do something like tags=prune_dict(...) or None.

Contributor


Just chiming in here as I was doing some testing for amazon and I think we should be covered based on what I found here in the stats.py file:

regular_kw: dict[str, Any] = {**kwargs}
if tags:
    regular_kw["tags"] = tags

Still worth running some manual tests but figured I'd share that.
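
As a sanity check on the reasoning in this thread, here is a small standalone sketch. It assumes the pruning helper drops None values (so a missing team yields an empty dict) and that the stats wrapper forwards tags only when they are truthy, as in the excerpt above. The drop_none and build_timer_kwargs functions are hypothetical stand-ins, not the Airflow implementations.

```python
from __future__ import annotations

from typing import Any


def drop_none(values: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the pruning helper: remove keys whose value is None."""
    return {key: value for key, value in values.items() if value is not None}


def build_timer_kwargs(tags: dict[str, Any] | None = None, **kwargs: Any) -> dict[str, Any]:
    """Mirror the quoted guard: forward tags only when the dict is non-empty."""
    regular_kw: dict[str, Any] = {**kwargs}
    if tags:  # both None and {} are falsy, so neither is forwarded
        regular_kw["tags"] = tags
    return regular_kw


no_team = build_timer_kwargs(tags=drop_none({"team_name": None}))
with_team = build_timer_kwargs(tags=drop_none({"team_name": "team_a"}))
explicit_none = drop_none({"team_name": None}) or None
```

Under these assumptions, an empty tags dict and an omitted tags argument produce the same forwarded keyword arguments, and appending `or None` at the call site makes that intent explicit without changing the result. A manual run against a real metrics backend is still the only way to confirm this end to end.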
