
Add Workload Identity authentication to OpenAI provider#69069

Merged
kaxil merged 2 commits into apache:main from astronomer:openai-provider-workload-identity on Jun 27, 2026

kaxil commented Jun 26, 2026

Add Workload Identity authentication to the OpenAI provider, so connections can authenticate with short-lived identity tokens instead of a long-lived API key. The mechanism is selected with a new auth_type key in the connection extra; it defaults to api_key, so existing connections are unchanged.

Stacked on #69068 (requires openai>=2.37.0). Review the latest commit; the first commit is #69068 and will drop out once it merges.

Why first-class wiring instead of the existing passthrough

The connection already forwards openai_client_kwargs to the OpenAI client, but Workload Identity needs a token-provider callable, which can't be expressed in a JSON connection extra. So get_conn builds the provider from declarative config keys instead.
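Here is a quick standard-library check of that constraint. `json.dumps` rejects a function object, while a dotted-path string serializes without trouble and can be resolved to a callable at runtime. The `fetch_token` function, the `myorg.tokens.fetch` path, and the `is_json_expressible` helper are hypothetical and exist only for illustration.

```python
import json


def fetch_token() -> str:
    # Stand-in for a token-provider callable (hypothetical).
    return "example-token"


def is_json_expressible(extra: dict) -> bool:
    """Return True if the dict could be stored as a JSON connection extra."""
    try:
        json.dumps(extra)
    except TypeError:
        return False
    return True


# A live callable cannot be stored in a JSON extra...
callable_extra = {"auth_type": "workload_identity", "token_provider": fetch_token}

# ...but declarative keys (plain strings) can, and get_conn can build the callable from them.
declarative_extra = {
    "auth_type": "workload_identity",
    "workload_identity_provider": "custom",
    "token_provider": "myorg.tokens.fetch",  # hypothetical dotted path
}

print(is_json_expressible(callable_extra))     # False
print(is_json_expressible(declarative_extra))  # True
```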

Usage

Set auth_type to workload_identity and choose a token source with workload_identity_provider:

  • kubernetes -- service account token from token_file_path (defaults to the in-cluster path)
  • azure -- Azure managed identity (optional resource, client_id, object_id, msi_res_id, api_version)
  • gcp -- GCP ID token for audience
  • custom -- import token_provider (a dotted path to a Callable[[], str]); token_type is jwt (default) or id
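Below is a minimal sketch of how the selector could map to a token-provider callable. It covers only the two sources that need nothing beyond the standard library. The function names are hypothetical. The default path is the standard in-cluster service account token location, which is an assumption about what "the in-cluster path" means. The Azure and GCP branches are left unimplemented because they depend on cloud SDKs. This is not the provider's actual get_conn code.

```python
import importlib
from pathlib import Path
from typing import Callable

# Assumption: the standard in-cluster service account token path.
DEFAULT_K8S_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def kubernetes_token_provider(token_file_path: str = DEFAULT_K8S_TOKEN_PATH) -> Callable[[], str]:
    """Return a callable that re-reads the token file on every call, since projected tokens rotate."""
    def read_token() -> str:
        return Path(token_file_path).read_text().strip()
    return read_token


def import_callable(dotted_path: str) -> Callable[[], str]:
    """Resolve 'package.module.attr' and check that the result is callable."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"token_provider must be a dotted path, got {dotted_path!r}")
    obj = getattr(importlib.import_module(module_path), attr)
    if not callable(obj):
        raise TypeError(f"{dotted_path!r} is not callable")
    return obj


def build_token_provider(extra: dict) -> Callable[[], str]:
    """Dispatch on workload_identity_provider (hypothetical helper)."""
    source = extra.get("workload_identity_provider")
    if source == "kubernetes":
        return kubernetes_token_provider(extra.get("token_file_path", DEFAULT_K8S_TOKEN_PATH))
    if source == "custom":
        return import_callable(extra["token_provider"])
    if source in ("azure", "gcp"):
        # These sources need cloud SDKs and are outside the scope of this sketch.
        raise NotImplementedError(f"{source} is not covered by this sketch")
    raise ValueError(f"Unknown workload_identity_provider: {source!r}")
```

Because the Kubernetes branch re-reads the file on each call, the sketch picks up token rotation without caching logic.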

identity_provider_id and service_account_id are required; refresh_buffer_seconds is optional.
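The PR does not define refresh_buffer_seconds in this description. Assuming it is the margin before expiry at which a cached token gets replaced, a caching wrapper could look like the sketch below. The class name, the 60-second default, and reading the expiry from a JWT `exp` claim are all assumptions. In practice the refresh may well happen inside the OpenAI SDK rather than in the hook.

```python
import base64
import json
import time
from typing import Callable, Optional


def jwt_expiry(token: str) -> float:
    """Read the 'exp' claim (seconds since epoch) from a JWT without verifying it."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return float(json.loads(base64.urlsafe_b64decode(payload))["exp"])


class RefreshingToken:
    """Cache a token and fetch a new one once it is within refresh_buffer_seconds of expiry."""

    def __init__(
        self,
        fetch: Callable[[], str],
        refresh_buffer_seconds: float = 60.0,  # hypothetical default
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._buffer = refresh_buffer_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def __call__(self) -> str:
        # Refresh on first use, or once the current token is inside the buffer window.
        if self._token is None or self._clock() >= self._expires_at - self._buffer:
            self._token = self._fetch()
            self._expires_at = jwt_expiry(self._token)
        return self._token
```

Injecting `clock` keeps the refresh decision testable without sleeping.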

Example (Kubernetes pod):

{
  "auth_type": "workload_identity",
  "workload_identity_provider": "kubernetes",
  "identity_provider_id": "idp-123",
  "service_account_id": "sa-456"
}
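The rules stated above (auth_type defaulting to api_key, the four token sources, and the two required IDs) can be checked against this example extra. The sketch below is illustrative only. The key names come from the PR, while the function name and error messages are hypothetical and do not reflect the provider's actual validation.

```python
import json

REQUIRED_KEYS = ("identity_provider_id", "service_account_id")
TOKEN_SOURCES = {"kubernetes", "azure", "gcp", "custom"}


def validate_extra(raw_extra: str) -> dict:
    """Parse a connection extra and check the Workload Identity keys (hypothetical helper)."""
    extra = json.loads(raw_extra)
    # auth_type defaults to api_key, so extras without it keep the existing behavior.
    auth_type = extra.get("auth_type", "api_key")
    if auth_type == "workload_identity":
        source = extra.get("workload_identity_provider")
        if source not in TOKEN_SOURCES:
            raise ValueError(f"Unknown workload_identity_provider: {source!r}")
        missing = [key for key in REQUIRED_KEYS if not extra.get(key)]
        if missing:
            raise ValueError(f"Missing required key(s): {', '.join(missing)}")
    elif auth_type != "api_key":
        raise ValueError(f"Unsupported auth_type: {auth_type!r}")
    return {**extra, "auth_type": auth_type}
```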

Notes

  • The API-key path is unchanged. api_key is popped from openai_client_kwargs on every auth path, so it is never forwarded alongside workload_identity (the client rejects having both set).
  • The custom source imports and calls the named callable in the process running the hook, so point it only at trusted code. The connection extra is an operator/admin surface, and resolving callables from it is consistent with how other providers handle config.
  • The selector is named workload_identity_provider to mirror OpenAI's own "workload identity provider" terminology.
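The api_key handling from the first note can be sketched like this. The helper name is hypothetical, and so is the `workload_identity` keyword. The PR says only that the client rejects receiving both, so the real keyword and value shape come from the OpenAI SDK. Taking the key from the connection password first is also an assumption.

```python
from typing import Any, Callable, Dict, Optional


def build_client_kwargs(
    extra: Dict[str, Any],
    password: Optional[str],
    token_provider: Optional[Callable[[], str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the OpenAI client (hypothetical helper)."""
    # Copy so the stored connection extra is never mutated.
    client_kwargs = dict(extra.get("openai_client_kwargs", {}))
    # Pop api_key on every path, so it can never reach the client next to workload_identity.
    extra_api_key = client_kwargs.pop("api_key", None)
    if extra.get("auth_type", "api_key") == "workload_identity":
        # Placeholder: the real keyword and value shape come from the OpenAI SDK.
        client_kwargs["workload_identity"] = token_provider
    else:
        # Assumed precedence: connection password first, then a key left in the extra.
        client_kwargs["api_key"] = password or extra_api_key
    return client_kwargs
```

Popping unconditionally, rather than only on the Workload Identity path, means a stale api_key left in the extra cannot silently conflict with the token-based path.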

Related:


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)


kaxil added 2 commits June 26, 2026 23:55
Raise the openai[datalib] floor to >=2.37.0 so the provider builds against the current 2.x SDK line, and refresh stale default models (gpt-3.5-turbo -> gpt-4o-mini, text-embedding-ada-002 -> text-embedding-3-small).

Authenticate OpenAI connections with short-lived identity tokens instead of a long-lived API key, selected with a new auth_type key in the connection extra (defaults to api_key, so existing connections are unchanged). Supports Kubernetes, Azure managed identity, GCP, and a custom token provider.
kaxil merged commit 438107f into apache:main Jun 27, 2026
92 checks passed
kaxil deleted the openai-provider-workload-identity branch June 27, 2026 01:42