Replies: 5 comments
-
Why not compare the content of the code and skip merging in this case? I think you have to read the file content anyway to calculate the hash, and at that point we have already read the DB entry with the code (it is all in memory). I am not 100% sure whether merge() with unchanged "content" causes an additional write to the DB (it would be worth checking), but even if it does, the simpler solution is to compare the content and not perform the merge at all in this case. We could, of course, store the hash of the content in the DB and not read the entry from the DB if the hash does not match, but I have a feeling that would increase the complexity without really improving performance (we would have to read the content of all the files and calculate the hashes anyway). I am not sure this is worth optimising. WDYT @wolfier?
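A minimal sketch of the comparison idea, using only the standard library. `StoredCode` and `refresh_if_changed` are hypothetical names, not Airflow's API; the dictionary stands in for the DB rows that are already loaded in memory, and replacing the dictionary entry stands in for the merge.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class StoredCode:
    """Hypothetical stand-in for a dag_code row already loaded in memory."""
    source_code: str
    last_updated: datetime


def refresh_if_changed(rows: Dict[str, StoredCode], fileloc: str, file_content: str) -> bool:
    """Update the stored code only when the content differs.

    Returns True when a write (the merge) was needed, False when it was skipped.
    """
    row = rows.get(fileloc)
    if row is not None and row.source_code == file_content:
        # Content is identical: skip the merge entirely, regardless of mtime.
        return False
    # New file or changed content: store it and record when we did so.
    rows[fileloc] = StoredCode(file_content, datetime.now(timezone.utc))
    return True
```

With this shape the file's modification time never enters the decision, so a changed file is picked up even when its timestamp is older than the stored entry.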
-
Hey @wolfier, WDYT? I am tempted to close this one unless you have some arguments why I should not.
-
Yeah, if we can compare the content of the code to determine whether or not to merge the orm_dag_code object, that would guarantee that the code view displays the latest version regardless of the modified timestamp. Definitely more straightforward than keeping a content hash. I was thinking of a content hash because the file location is hashed.
-
Would you want to try to optimize it, then, by comparing the content? I think it would be worthwhile to check whether merge is already optimizing it away.
-
We had a weird instance of a similar issue today. We use a Docker image with embedded DAGs. Despite a new image (November 10) being pushed, the timestamp on the file was November 03 while the row in the database indicated a last modification of November 09, so the dag processor did not update the entry in the database and the webserver kept showing the old code. Touching the file, or removing the offending row in
-
Description
Add or update additional parameters that define when a file's dag_code entry should be refreshed.
Use case / motivation
When store_dag_code is enabled, dag code is stored in the dag_code table. The table is updated when the originating file's modification time is greater than when the dag code was last cached.
I would like the code view (pulled from the dag_code table) to update even when the modification time is prior to when the dag_code entry was last updated.
Updating the cache relying only on modification date of the source python file is not reliable because the timestamp does not carry information about the content of the file.
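To illustrate the problem, here is a small sketch of a modification-time-only rule as described above. The helper name `needs_refresh_by_mtime` and the dates are illustrative assumptions, not taken from the Airflow code base.

```python
from datetime import datetime, timezone


def needs_refresh_by_mtime(file_mod_time: datetime, last_updated: datetime) -> bool:
    """Hypothetical helper mirroring the rule described above:
    refresh only when the file is newer than the cached entry."""
    return file_mod_time > last_updated


# A file whose content changed but whose mtime predates the cached entry,
# e.g. a file shipped with an older timestamp (dates are illustrative).
file_mod_time = datetime(2021, 1, 3, tzinfo=timezone.utc)
last_updated = datetime(2021, 1, 9, tzinfo=timezone.utc)

# The rule sees an "old" file and leaves the cached code untouched,
# even though the content on disk may be different.
stale = not needs_refresh_by_mtime(file_mod_time, last_updated)
```

Here `stale` ends up true: the check has no way to notice that the content differs, because it only ever looks at the two timestamps.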
In addition to, or instead of, checking the modification time, I would like to consider the file content hash as well.
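A rough sketch of what checking a content hash in addition to the modification time could look like. The function names, the `stored_hash` parameter, and the choice of SHA-256 are assumptions made for illustration; the dag_code table is not known to have such a column.

```python
import hashlib
from datetime import datetime
from typing import Optional


def content_hash(source_code: str) -> str:
    """SHA-256 hex digest of the file content (the hash choice is an assumption)."""
    return hashlib.sha256(source_code.encode("utf-8")).hexdigest()


def needs_refresh(
    file_mod_time: datetime,
    last_updated: datetime,
    file_content: str,
    stored_hash: Optional[str],
) -> bool:
    """Refresh when the file is newer OR when its content hash differs.

    `stored_hash` is a hypothetical value kept next to the cached code;
    None means no hash has been stored yet, which forces a refresh.
    """
    if file_mod_time > last_updated:
        return True
    return stored_hash != content_hash(file_content)
```

The hash only saves comparing the full stored source; the file itself still has to be read to compute the digest, which is the cost raised in the replies.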
Are you willing to submit a PR?
Yes!
Additional Information
Where dag code is updated.
https://github.com/apache/airflow/blob/2.1.0/airflow/models/dagcode.py#L113-L123