Add team_name tags to Kubernetes executor metrics #69046
Add the team_name tag to Kubernetes executor metrics to improve per-team observability in multi-team deployments. Pass the executor's team_name to AirflowKubernetesScheduler so scheduler-emitted metrics include the tag consistently. Maintain compatibility with prior Airflow releases by initializing team_name only when it is not already defined. Update the existing metric tests to verify metrics are emitted both with and without the team_name tag.
I'd like to see unit tests added for the missing paths. Improving the coverage while you are in there seems reasonable, and the tests shouldn't be too difficult to write by following the existing patterns.
Okay. I was thinking of submitting a follow-up PR for them, but I can add them to this PR at your request.
self.log.info("Deleting pod %s in namespace %s", pod_name, namespace)
with Stats.timer("kubernetes_executor.pod_deletion"):
with Stats.timer(
    "kubernetes_executor.pod_deletion", tags=prune_dict({"team_name": self.team_name})
For these timers, can you run a manual test and confirm they work as intended? Technically this is a behavior change: even if team_name is None, the timer will now be called with tags={}, where previously it wasn't passed a tags argument at all. You may need to do something like tags=prune_dict(...) or None.
Just chiming in here as I was doing some testing for amazon and I think we should be covered based on what I found here in the stats.py file:
Still worth running some manual tests but figured I'd share that.
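For reference, the `tags=prune_dict(...) or None` suggestion above can be sketched in isolation. This is only an illustration: the `prune_dict` below is a simplified, hypothetical stand-in written for this example (it just drops None values), and `timer_tags` is a made-up helper name, not something from the PR.

```python
from typing import Optional


def prune_dict(d: dict) -> dict:
    """Simplified stand-in (assumption): drop keys whose value is None."""
    return {k: v for k, v in d.items() if v is not None}


def timer_tags(team_name: Optional[str]) -> Optional[dict]:
    """Hypothetical helper: build the tags argument for a timer call."""
    # An empty dict is falsy, so `or None` turns {} back into None.
    # That keeps the no-team case equivalent to not passing tags at all.
    return prune_dict({"team_name": team_name}) or None


print(timer_tags(None))      # None
print(timer_tags("team_a"))  # {'team_name': 'team_a'}
```

Whether the extra `or None` is actually needed depends on how the stats backend treats an empty tags dict, which is what the manual test is meant to confirm.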
Description
This change adds a team_name tag to Kubernetes executor metrics to improve per-team observability in multi-team deployments.
The following metrics now include the team_name tag when the executor is configured with a team:
- kubernetes_executor.pod_creation
- kubernetes_executor.pod_deletion
- kubernetes_executor.pod_patching
- kubernetes_executor.adopt_task_instances.duration
- kubernetes_executor.pod_creation_status
- kubernetes_executor.pod_deletion_status
- kubernetes_executor.pod_patching_status
The metric kubernetes_executor.clear_not_launched_queued_tasks.duration was not updated because it is no longer present in the current Kubernetes executor implementation.
With the exception of kubernetes_executor.adopt_task_instances.duration, all of the above metrics are emitted from AirflowKubernetesScheduler.
Rationale
The Kubernetes executor supports multi-team scheduling, but its operational metrics could not previously be attributed to individual teams. Adding the team_name tag enables per-team dashboards, alerting, and troubleshooting while remaining backwards compatible for deployments that do not use teams.
Notes
team_name is owned by KubernetesExecutor, while most of the affected metrics are emitted from AirflowKubernetesScheduler. To allow scheduler-emitted metrics to include the team_name tag, the value is passed to AirflowKubernetesScheduler when it is constructed by the executor.
For compatibility with previous Airflow releases, team_name is initialized only if it is not already defined, since the attribute was introduced as part of multi-team support.
Tests
Updated the existing unit tests for kubernetes_executor.pod_deletion and kubernetes_executor.pod_deletion_status to verify metrics are emitted both with and without the team_name tag. The test cases that check for team_name in the metrics are skipped when the Airflow version is lower than 3.1.0, because the team_name attribute was added in version 3.1.0.
The following metrics do not currently have dedicated unit test coverage, so no test-related adjustments were made for them:
- kubernetes_executor.pod_creation
- kubernetes_executor.pod_creation_status
- kubernetes_executor.pod_patching
- kubernetes_executor.pod_patching_status
- kubernetes_executor.adopt_task_instances.duration
Backwards Compatibility
This change is additive only. Existing metric names remain unchanged, and the team_name tag is emitted only when the Kubernetes executor is configured with a team. Existing dashboards and integrations that do not use the new tag continue to function unchanged.
Related: #68996
Was generative AI tooling used to co-author this PR?
Generated-by: [GPT 5.5] following the guidelines