Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce fail_policy #41463

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

raphaelauv
Copy link
Contributor

@raphaelauv raphaelauv commented Aug 14, 2024

following first try based on 2.10.0b1 :#41047

  • introduce fail_policy parameter

  • remove soft_fail and silent_fail parameters

  • introduce the SKIP_ON_ANY_ERROR

  • replace soft_fail by SKIP_ON_TIMEOUT

  • replace silent_fail by IGNORE_ERROR

why ?

cause currently soft_fail never_fail and silent_fail are confusing

@raphaelauv raphaelauv force-pushed the fix_sensors_soft_fail_2 branch 9 times, most recently from 9db3ba6 to 59010d3 Compare August 19, 2024 09:10
@raphaelauv raphaelauv force-pushed the fix_sensors_soft_fail_2 branch 2 times, most recently from c45677d to e8f002a Compare August 25, 2024 13:03
@potiuk
Copy link
Member

potiuk commented Aug 25, 2024

Hey @raphaelauv -> maybe that is a misunderstanding. We can't really "remove" soft fail and replace it in the providers "now' - they will still have to support 2.8 - 2.10, so we cannot *really remove soft_fail (and tests with it). This was not really possible in 2.10 and is not really going to be possible now.

What I really see as the moment we implement it is when we have the new "task-sdk" in Airflow 3 - this is where it will be providing "Base Sensor" , "Base Operator" - and there we will be able to do similar things as with "common.compat" - where we will be able to add new functionality ("Fail Policy") without having to be tied with specific Airflow 3 version.

I think the way implemented now it's not really usable - it won't work for Airfllow 2, and Airflow 3 will have the "sdk" implemented differently, so I am not sure it's worth spending time on it now. There is no "easy" way of implementing it without breaking current "2.8, 2.9, 2.10" behaviour.

@raphaelauv
Copy link
Contributor Author

raphaelauv commented Aug 25, 2024

hi, I did a compatible 2.8 2.9 2.10 implementation #41047 but you said that it was not a feature welcome on 2.10 branch,

so there go this PR that is waiting for airflow 3 and if there is a new "sdk" I will make it compatible.

@potiuk
Copy link
Member

potiuk commented Aug 25, 2024

hi, I did a compatible 2.8 2.9 2.10 implementation #41047 but you said that it was not a feature welcome on 2.10 branch.

I think there was no (easy) way to make it really 2.8-2.10 compatible. It was still changing behaviour for 2.10 - as I understood it and it was main reason why I hesitated with merging it - because we did not want to suddenly change behaviour of those sensors whe users will move between 2.8 and 2.10.

The problem with the change is it is implemented now is that you can't import "AirflowPokeFailException" And "FailPolicy" from "airflow 2" in providers - because it is just missing in Airflow 2.8 - 2.10 and providers simply will not work for 2.8. 2.9. 2.10. And since we do not have Airlfow 3 for quite some time (6 months) - it means they won't work at all. And having it in 2.10 would not help to much (because the providers will still have to work for 2.8-2.9 where it is missing).

But the future implementation of it might be (or at least that's how I see it). Example SFTP provider:

  • airflow.providers.sftp depends on airflow.sdlk (which is independent from Airflow Version - and can be installed also in Airflow 2.8 - 2.10 (or whatever versions will be supported then)

  • the PokeFailException , Fail Policy and - most importantly - BaseSensorOperator will also come from the "task-sdk", not from "airflow".

  • the "BaseSensorOperator" from "task-sdk" - at least as long as Airflow 2 is supported - should support both - soft fail (with deprecation) and Fail Policy - and deprecation on soft_fail will suggest to switch to Airflow 3's Fail Policy and we might be able to drop it some day - and the thing there is that for those future providers, the user will be able to see the deprecation to Fail Policy even when running on Airflow 2.

I am not sure if it makes it clearer - but with "task.sdk" and code for airflow BaseOperators / Sensor operators etc. coming from there, rather than from "airflow" we will be able to handle it quite a bit better. Or so I hope.

@raphaelauv
Copy link
Contributor Author

 It was still changing behaviour for 2.10 .. because we did not want to suddenly change behaviour of those sensors

yes previous PR was fixing a bug introduce in airflow 2.7.1 that suddenly changed the behaviour of sensors

The problem with the change is it is implemented now is that you can't import "AirflowPokeFailException" And "FailPolicy" from "airflow 2" in providers - because it is just missing in Airflow 2.8 - 2.10 and providers simply will not work for 2.8. 2.9. 2.10. And since we do not have Airlfow 3 for quite some time (6 months) - it means they won't work at all. And having it in 2.10 would not help to much (because the providers will still have to work for 2.8-2.9 where it is missing).

my previous PR was compatible with 2.9 and 2.10, all tests were green ->
image


...  but with "task.sdk" ...

thanks for the info I will have a look to the AIP
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface+aka+Task+SDK

@potiuk
Copy link
Member

potiuk commented Aug 25, 2024

yes previous PR was fixing a bug introduce in airflow 2.7.1 that suddenly changed the behaviour of sensors

I think this is really a question of qualification of that change as a bug - when it works the way it worked, for such long (2.7.1 - 2.9.3) you cannot "really" see it as a bug any more - someone could have started relying on it. And there is no good solution to that issue - whetever you do, you break something, so better to keep the old behaviour, deprecate it eventually and add new way of doing things better. Which I think Airflow 3 is better for rather than 2.10 becasue this "new" thing would only work for 2.10 which is the last releae of Airflow 2.

Unfortunately I have not seen anyone else commenting on it and deciding, there was a rush to release Airflow 2.10, so I did not want to make a decision myself.

Sorry for that. I was just cautious.

@raphaelauv
Copy link
Contributor Author

I have not seen anyone else commenting on it

yes it's true that this "problem/bug/feature" does not attract attention, I'm always surprise how alone I feel when I found a bug on delivery semantics that can lead to data loss, and also how anecdotal and transparent it is for the user to change/break/fix these behaviors.

Thanks again for your review @potiuk , always a pleasure 👍

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 11, 2024
@raphaelauv
Copy link
Contributor Author

no stale

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 15, 2024
@raphaelauv raphaelauv force-pushed the fix_sensors_soft_fail_2 branch 3 times, most recently from 5cc0e38 to 37b0f6b Compare November 18, 2024 11:56
@raphaelauv raphaelauv force-pushed the fix_sensors_soft_fail_2 branch 2 times, most recently from 8760359 to 25076d4 Compare December 9, 2024 11:27
@eladkal
Copy link
Contributor

eladkal commented Dec 16, 2024

You can apply the breaking change to Airflow 3 and in providers work it so it will be backward compatible till min version of providers is Airflow 3. It's more code lines but it's the safe path to introduce this feature.

@raphaelauv raphaelauv force-pushed the fix_sensors_soft_fail_2 branch from 25076d4 to 0d71a70 Compare December 28, 2024 16:15
@raphaelauv
Copy link
Contributor Author

okay, I will re-add compatibility with 2.X , thanks for the review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants