Adding a cache to Variable.get #30265
Problem I'm trying to solve: Users use
Because even for a dag that's only run once per day, the top-level code is going to be executed every time the dag is parsed, which can be quite frequent. Since the order in which Secret backends are probed is:
if users rely on the default value, or a value in the DB, they still call the custom backend every time (and then do a DB request).
Proposed remediation: Adding a cache, even with a relatively short expiration time (aka TTL), could significantly reduce the number of calls made and speed up DAG parsing a lot, as well as DAG execution.
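To make the TTL idea concrete, here is a minimal sketch of an expiring cache placed in front of a slow lookup. It is only an illustration of the caching logic, not the implementation from the PR: `TTLCache`, `slow_backend_lookup`, and the injected `clock` are hypothetical names made up for this example, and nothing here uses Airflow's API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical expiring cache; illustration only, not Airflow code."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, calling `loader` only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, e.g. when the underlying value is changed."""
        self._entries.pop(key, None)


# Stand-in for an expensive secrets backend or DB request (assumption).
calls = []

def slow_backend_lookup(key: str) -> str:
    calls.append(key)
    return f"value-of-{key}"


fake_now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])

# Eight lookups during one "parse" hit the backend only once.
for _ in range(8):
    cache.get("my_var", slow_backend_lookup)
backend_calls_within_ttl = len(calls)

# After the TTL has passed, the next lookup goes to the backend again.
fake_now[0] = 31.0
cache.get("my_var", slow_backend_lookup)
backend_calls_after_expiry = len(calls)
```

Note that a plain in-process dictionary like this one is not shared between separate processes, so it only shows the TTL and invalidation logic, not how the cached values would be made visible across processes.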
Alternatives
Technical details (see link to PR at the top of this discussion)
A simple in-memory cache will not work, because a lot of operations with dags happen in dedicated processes that work with a copy of the memory, so any change to an in-memory cache would be made to the copy and lost once the process has done its job. For this reason, I'm using a
Values are saved transparently on access, and the cache is invalidated on changes that go through
This works very well in breeze, where processes are spawned on the same machine. I haven't tested this using celery, but
Here is an example run in breeze, parsing 100 relatively simple dag files that use 8 Variable calls to get a bunch of configuration parameters in the top-level code.
Main discussion points (beyond doing it / not doing it)
Replies: 2 comments
Just to highlight the benefits:
One comment here (just for information and explanation on why I am closing this discussion - not because I want to close the discussion in general - you will see my comment on devlist soon that I am actually supportive after sleeping on it).
I think the discussion should be continued in https://lists.apache.org/thread/ppbb87tohos9zs1yv6pf8b2zyq66dmdk where you started it. While GitHub Discussions might provide a better interface and are good for casual discussion, this one might lead to some decisions impacting all Airflow users, and we want to keep the record of it in a medium which is fully owned and controlled by the ASF ("if it did not happen on the mailing list - it did not happen").
When in a few years we refer to it, we want to find all the arguments and reasoning in the archives. We do not want to base the project's/foundation's existence on the fact that GitHub Discussions are still around. Generally speaking, in our decision-making process we should act as if GitHub Discussions will go away tomorrow: we should be able to continue business as usual and have all the reasoning on why we made certain decisions. This is why adding context to PR commit messages in controversial/complex cases is so important, for example: we should act as if Pull Requests are disappearing tomorrow, and we should be able to figure out why we made some changes from the commit messages without needing the full PR context.
If you see people commenting here and you would like to make a decision and start voting on the devlist - or reach out for lazy consensus (which I think is needed for that kind of change) - you would have to extract and summarize the gist of the discussion here yourself and bring it to the devlist (which has the risk of bias and missing important points). That's why I am closing it now, and I think we should continue in https://lists.apache.org/thread/ppbb87tohos9zs1yv6pf8b2zyq66dmdk (even if the interface there is poorer).