Serialize dag if it is not serialized even if it does not have task instances#68947
Serialize dag if it is not serialized even if it does not have task instances#68947kgskgs wants to merge 1 commit into
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
|
SameerMesiah97
left a comment
There was a problem hiding this comment.
This needs more explanation. Please see my comment on the test.
| session.execute(delete(SDM).where(SDM.dag_id == dag.dag_id)) | ||
| session.flush() | ||
| assert session.scalar(select(func.count()).select_from(SDM)) == 0 | ||
| assert session.scalar(select(func.count()).select_from(DagVersion)) == 1 |
There was a problem hiding this comment.
I am just not sure how this state you are simulating can arise under normal operation. The test simulates the issue by manually deleting the SerializedDagModel row while leaving the DagVersion row intact. Since DAGs are serialized during parsing, I would expect DagVersion and SerializedDagModel to be created together under normal operation.
Could you expand the issue/PR description to explain how this state can occur in practice? Is this intended to recover from metadata corruption, failed migrations, manual database modification, or some known failure during serialization?
Furthermore, have you ran into this specific failure mode yourself?
There was a problem hiding this comment.
Yes I've ran into this myself, but I am not sure what caused it, I am also running a custom application built on top of airflow so most likely it was not caused by airflow itself (I suspect it happened when upgrading from airflow 2 to 3 but there is lots of custom code, sometimes too tightly coupled to airflow and it would be a lot of effort to determine exactly how it happened)
I'd say that there being nothing preventing you from getting into this state + the user being able to manually break their dags by editing the db is reason enough to have this
There was a problem hiding this comment.
Yes I've ran into this myself, but I am not sure what caused it, I am also running a custom application built on top of airflow so most likely it was not caused by airflow itself (I suspect it happened when upgrading from airflow 2 to 3 but there is lots of custom code, sometimes too tightly coupled to airflow and it would be a lot of effort to determine exactly how it happened) I'd say that there being nothing preventing you from getting into this state + the user being able to manually break their dags by editing the db is reason enough to have this
I would add all the detail you can in both the issue and the PR description (please include the what, why and how as well as any implications for backwards compatibility).
Prevent dags from not ever being serialized if they have no entry in
serialized_dagtable, but have one indag_versioncloses: #68944
Context:
I ran into this state while running a custom application on top of airflow. It is highly likely that it was caused by custom code and not airflow, but airflow doesn't seem to have any safeguards against entering the state, and it affected all instances of my application. I can't easily tell exactly how it happened exactly (we are in the middle of updating from airflow 2 to 3 and there are lots of changes)
I believe this fix is appropriate given that there should be multiple ways to reach this state e.g. by editing the db manually.
An alternative would be to try and add safeguards against getting into this state in the first place, but that would be way more complex than handling in this way + currently this is done at no additional cost as we already know whether the serialized dag exists or not by the old hash.
There should be no implications on backwards compatibility.
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Sonnet 4.6 following the guidelines
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.