Option to include subdir in trigger_dag so the scheduler does not scan the whole DAG folder #24888
Replies: 3 comments 1 reply
-
Thanks for opening your first issue here! Be sure to follow the issue template!
-
As far as I can see in the code, this operator looks up the DAG file path in the database and loads only that one file. I don't see what you would like to optimize here.
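A rough, self-contained model of the lookup described in the comment above might look like this. It is not Airflow's actual code: a plain dict stands in for the metadata table that maps each `dag_id` to its file path, and `trigger`, `load_single_file`, and the local `DagNotFound` class are hypothetical stand-ins. The point it illustrates is that only the one recorded file is loaded, so the cost does not grow with the size of the DAG folder; the failure mode is the record not existing yet.

```python
from typing import Dict


class DagNotFound(Exception):
    """Local stand-in: raised when no record exists yet for a dag_id."""


def load_single_file(path: str) -> dict:
    # Hypothetical stand-in for parsing exactly one DAG file.
    return {"fileloc": path}


def trigger(dag_table: Dict[str, str], dag_id: str) -> dict:
    """Look up the file path recorded for dag_id and load only that file."""
    fileloc = dag_table.get(dag_id)
    if fileloc is None:
        # The file may already be on disk, but its record has not been
        # written to the table yet, so the lookup fails.
        raise DagNotFound(f"Dag id {dag_id} not found")
    return load_single_file(fileloc)


if __name__ == "__main__":
    table = {"child_1": "/dags/children/child_1.py"}
    print(trigger(table, "child_1"))  # {'fileloc': '/dags/children/child_1.py'}
    try:
        trigger(table, "child_2")
    except DagNotFound as exc:
        print(exc)  # Dag id child_2 not found
```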
-
So we have a use case where multiple dynamic DAGs are being added to the DagBag, and I believe there will always be some latency between dropping a new DAG file into the DAG folder and the operator (via trigger_dag) finding the path/record of that new DAG in the table. So, to give the scheduler as much time as it needs to insert the record into the table, we trigger the new DAG and …

Proposed solution: potentially, either add a table to the Airflow backend data model, or use an index, bulk insert, or something similar, so that the scheduler's performance does not suffer while it inserts the new record and lookups of the new DAG get faster.

Just a thought: maybe there could be an option to change Airflow's database type, so that instead of Postgres one could use a NoSQL store with indexes matching those in Postgres right now (I am hoping), so that inserting and searching get faster and one can manage a not-so-fast update.
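To illustrate the index and bulk-insert idea from the proposal, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the metadata database. The table name `dag_record`, its columns, and the helper functions are hypothetical and are not Airflow's schema. Declaring `dag_id` as the primary key gives SQLite an index, so the lookup is an index search rather than a full scan, and `executemany` inside a single transaction registers many new records at once.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY on dag_id makes SQLite build an index used by lookup().
    conn.execute(
        "CREATE TABLE dag_record (dag_id TEXT PRIMARY KEY, fileloc TEXT NOT NULL)"
    )
    return conn


def bulk_register(conn: sqlite3.Connection, records: Iterable[Tuple[str, str]]) -> None:
    # One transaction for all rows instead of one commit per row.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dag_record (dag_id, fileloc) VALUES (?, ?)",
            records,
        )


def lookup(conn: sqlite3.Connection, dag_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT fileloc FROM dag_record WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return None if row is None else row[0]


def query_plan(conn: sqlite3.Connection) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT fileloc FROM dag_record WHERE dag_id = ?", ("x",)
    ).fetchall()
    return " ".join(str(r[-1]) for r in rows)


if __name__ == "__main__":
    db = make_db()
    bulk_register(db, [(f"child_{i}", f"/dags/children/child_{i}.py") for i in range(50)])
    print(lookup(db, "child_7"))  # /dags/children/child_7.py
    print(query_plan(db))
```

Running `query_plan` is a quick way to confirm that the lookup actually uses the index rather than scanning the table.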
-
Description
We have a design where tens of child DAGs are created every few minutes and triggered from a parent DAG. When each child DAG is triggered, I believe the scheduler searches the whole DagBag, which slows the whole process down. We wrap TriggerDagRunOperator/trigger_dag in a while loop: if triggering succeeds, we exit the loop; otherwise, we trigger again.
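The retry loop described above could be bounded, with a growing delay, so that a child DAG whose record never appears does not block the parent forever. This is a minimal sketch, assuming a generic `trigger` callable that raises an exception until the scheduler has registered the DAG; `trigger_with_retry`, `max_attempts`, and the backoff values are hypothetical choices, not part of Airflow.

```python
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def trigger_with_retry(
    trigger: Callable[[], T],
    retry_on: Type[BaseException] = Exception,
    max_attempts: int = 10,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call trigger() until it succeeds, doubling the wait (up to max_delay) between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return trigger()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_trigger() -> str:
        # Simulates a DAG that is only registered on the third attempt.
        calls["n"] += 1
        if calls["n"] < 3:
            raise LookupError("dag not registered yet")
        return "triggered"

    waits = []
    print(trigger_with_retry(flaky_trigger, retry_on=LookupError, sleep=waits.append))  # triggered
    print(waits)  # [1.0, 2.0]
```

Passing `sleep` in as a parameter keeps the function easy to test without real waiting. Inside Airflow itself, the task-level `retries` and `retry_delay` arguments that operators accept may achieve a similar effect without a hand-written loop.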
I believe that, to reduce the load on the scheduler, there should be an option to pass a subdir to TriggerDagRunOperator so that the scheduler searches for the dag_id only inside that folder instead of the whole DagBag.
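To make the proposal concrete, here is a hypothetical sketch of what restricting the search to a `subdir` would mean: only files under that subdirectory are examined, so the number of files scanned no longer grows with the whole DAG folder. `find_dag_files` and its `subdir` parameter are illustrative only, and matching the `dag_id` text inside a file is a crude stand-in for real DAG parsing.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def find_dag_files(
    dag_folder: Path, dag_id: str, subdir: Optional[str] = None
) -> Tuple[List[Path], int]:
    """Return (files mentioning dag_id, number of files scanned), limited to subdir if given."""
    root = dag_folder / subdir if subdir else dag_folder
    matches: List[Path] = []
    scanned = 0
    for path in sorted(root.rglob("*.py")):
        scanned += 1
        # Crude stand-in for parsing: look for the dag_id string in the source.
        if dag_id in path.read_text(encoding="utf-8"):
            matches.append(path)
    return matches, scanned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "children").mkdir()
        (folder / "other").mkdir()
        for i in range(3):
            (folder / "children" / f"child_{i}.py").write_text(f'dag_id = "child_{i}"\n', encoding="utf-8")
        for i in range(20):
            (folder / "other" / f"dag_{i}.py").write_text(f'dag_id = "other_{i}"\n', encoding="utf-8")
        print(find_dag_files(folder, "child_1")[1])  # 23
        print(find_dag_files(folder, "child_1", subdir="children")[1])  # 3
```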
Use case/motivation
We have a parent DAG that triggers multiple child DAGs, and we want to reduce the time it takes the scheduler to discover each child DAG.
Related issues
Discussion: #19547
Are you willing to submit a PR?
Code of Conduct