Replies: 2 comments 2 replies
-
Please take a look at https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-56+Extensible+user+management and the discussion linked in the document on our devlist. OIDC has been mentioned as the preferred way to implement AIP-56. Adding comments and taking part in the mailing list discussion and the AIP-56 implementation is the way you should approach it. If you have experience with it, joining AIP-56 and volunteering time and effort to implement it under @vincbeck's leadership is the way to go.
-
Appreciate the quick reply.
Can you clarify this? I'm assuming the preferred WAY, but just to be clear :) Maybe I'm misunderstanding something about the AIP, but it seems to be about authn/authz TO Airflow, whereas I've been talking about authn/authz FROM an Airflow task. I'll still think about the AIP, but let me clarify what I meant in a lot more detail.

Authn/z from a task is usually done by obtaining a token (in a hook implementation) using long-lived credentials stored in some connection (although the long-lived token might be in the connection as well). Let's say I want to hit an internal service from an Airflow task. If I control the environment where the task is running, I have a way to inject credentials for the internal service; e.g. this is how GCP enables hitting GCP services from a workload running on GCP, and this is also integrated in GCP's Airflow hook implementation. But I doubt that controlling the environment is a common case, so I have to put long-lived credentials in a connection.

This is similar to how it used to be with GitHub Actions: you put credentials in Secrets and GitHub makes sure to inject them when your action is running. What if my internal service could verify that a token it gets comes from a process scheduled by a specific Airflow instance, and that the process is running a specific task/DAG? This is what GitHub calls the OIDC token. It has nothing to do with user management, roles, SSO, etc.; it enables the use of something like Workload Identity. Here is a PR on this repo that uses that feature: it removes the need to store AWS credentials, because AWS can verify that the request came from this GitHub repository.
-
Providing OIDC tokens to tasks, similar to GitHub's OIDC token, would enable more secure access to any OIDC provider: any service supporting OIDC could be accessed without the need to store long-lived credentials in a connection. I wonder if this has come up in the past.
This would work very similarly to OIDC tokens in GitHub Actions. Their docs explain this far better than I could here.
Contributing
I'll probably need this in the near future so would be willing to write an AIP / open a PR.
Implementation details
I've been playing with a prototype of this.
Airflow would need to host an OIDC issuer - this is just serving static files.
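To make the "static files" part concrete: an OIDC issuer only needs to publish a discovery document and a JWKS with its public signing keys. A minimal sketch of what Airflow could serve (the issuer URL is hypothetical and the key values are placeholders, not a real key):

```python
import json

# Hypothetical issuer URL for this Airflow instance.
ISSUER = "https://airflow.example.com/oidc"

# Served at {ISSUER}/.well-known/openid-configuration
discovery = {
    "issuer": ISSUER,
    "jwks_uri": f"{ISSUER}/.well-known/jwks.json",
    "response_types_supported": ["id_token"],
    "subject_types_supported": ["public"],
    "id_token_signing_alg_values_supported": ["RS256"],
}

# Served at the jwks_uri: the public half of the signing key.
# "n" here is a placeholder, not a real modulus.
jwks = {
    "keys": [
        {
            "kty": "RSA",
            "kid": "airflow-key-1",
            "use": "sig",
            "alg": "RS256",
            "n": "<base64url-encoded-modulus>",
            "e": "AQAB",
        }
    ]
}

print(json.dumps(discovery, indent=2))
```

Both documents only change when keys rotate, which is why static hosting is enough.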
The task instance would be issued a token.
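As a sketch of what such a token could carry, here is a stdlib-only JWT with claims modeled loosely on GitHub's OIDC token (claim names are my assumption, not an agreed format; HS256 is used here purely so the example runs without extra dependencies, whereas a real issuer would sign with RS256 against the published JWKS):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_task_token(dag_id: str, task_id: str, run_id: str, secret: bytes) -> str:
    """Mint a JWT identifying one task instance (illustrative HS256 scheme)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": "https://airflow.example.com/oidc",  # hypothetical issuer
        "sub": f"dag:{dag_id}/task:{task_id}",
        "aud": "internal-service",                  # the relying party
        "dag_id": dag_id,
        "task_id": task_id,
        "run_id": run_id,
        "iat": now,
        "exp": now + 600,                           # short-lived: 10 minutes
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = mint_task_token("my_dag", "my_task", "manual__2024-01-01", b"demo-secret")
print(len(token.split(".")))  # prints 3: header, payload, signature
```

The important property is that the claims pin the token to one dag/task/run, so the receiving service can make authorization decisions at that granularity.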
I think the implementation of passing this token to the running task would have to depend on the executor implementation; e.g. for K8s I think something like injecting the token into the worker pod could work (but I've been playing with the SequentialExecutor, so I'd need to double-check this).
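One plausible mechanism for the K8s case (my assumption, not how the KubernetesExecutor builds pods today) is for the executor to mint the token and inject it into the worker pod as an environment variable; the variable name below is hypothetical:

```python
# Sketch only: a hypothetical hook point where an executor could add a
# freshly minted OIDC token to the worker pod spec before submitting it.

def add_oidc_token_to_pod(pod_spec: dict, token: str) -> dict:
    """Inject the task's OIDC token into the first container's environment."""
    container = pod_spec["spec"]["containers"][0]
    container.setdefault("env", []).append(
        {"name": "AIRFLOW_OIDC_TOKEN", "value": token}  # hypothetical var name
    )
    return pod_spec

pod = {"spec": {"containers": [{"name": "base", "image": "apache/airflow"}]}}
pod = add_oidc_token_to_pod(pod, "<token>")
```

A projected service-account-style volume would be another option, but an env var keeps the sketch simple.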
Exchanging the Airflow OIDC token for a real token for a specific service would be plugin-specific logic. The plugins would just get access to the OIDC token saying something like "this is DAG X, task Y", and it would be possible to verify that the token came from a specific Airflow instance.
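The verification side then boils down to: check the signature against the issuer's published keys, check audience and expiry, and hand the dag/task claims to plugin logic that maps them to a real credential. A stdlib-only sketch (again using HS256 with a shared secret purely for illustration; a real verifier would fetch the JWKS and check an RS256 signature):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # illustration only; real scheme would be RS256 + JWKS

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_task_token(token: str, expected_aud: str) -> dict:
    """Check signature, audience, and expiry; return the claims if valid."""
    signing_input, _, sig = token.rpartition(".")
    good = b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, good):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(signing_input.split(".")[1]))
    if claims["aud"] != expected_aud or claims["exp"] < time.time():
        raise ValueError("wrong audience or expired token")
    return claims  # plugin logic then maps dag_id/task_id to a real credential

# Build a matching demo token so the verifier has something to check.
claims = {"aud": "internal-service", "exp": int(time.time()) + 600,
          "dag_id": "my_dag", "task_id": "my_task"}
head = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
body = b64url(json.dumps(claims).encode())
sig = b64url(hmac.new(SECRET, f"{head}.{body}".encode(), hashlib.sha256).digest())
print(verify_task_token(f"{head}.{body}.{sig}", "internal-service")["dag_id"])
```

This is essentially what cloud providers' Workload Identity federation does with GitHub's tokens: verify the issuer's signature, then apply a trust policy over the claims.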
Related issues/discussions
I wasn't able to find any relevant mentions. There are mentions in the context of SSO and specific OIDC providers - #9873 talks about AWS -> GCP but using their providers.