Does it really matter if you import numpy when it will be in sys.modules after the first parse? #33737
-
Talking about https://airflow.apache.org/docs/apache-airflow/2.7.0/best-practices.html#top-level-python-code, which recommends avoiding expensive imports at the top level of DAG files.
I would like to be able to import modules at the top level so I can define my functions with type hints. Could I just extend the timeout for the first DAG import, so that every subsequent parse fetches numpy out of sys.modules? Or does the scheduler make a brand-new environment every AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT seconds so it can catch dynamic changes to the DAGs?
-
You can use `typing.TYPE_CHECKING`.
This is a standard way of handling imports that are expensive but only needed for type hints; see the documentation: https://docs.python.org/3/library/typing.html#constant
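A minimal sketch of the pattern (the `normalize` function and its numpy usage are hypothetical examples, not from the original thread):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers (mypy, pyright); never executed at
    # runtime, so the DAG file processor does not pay the numpy import cost
    # on every parse.
    import numpy as np

def normalize(values: "np.ndarray") -> "np.ndarray":
    # Quoted annotations keep the hints valid even though numpy is not
    # imported at module level; do the real import inside the callable,
    # which only runs when the task executes.
    import numpy as np
    return values / np.linalg.norm(values)
```

This keeps full type-hint coverage while the top-level parse stays cheap.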
Every DAG file in Airflow is parsed in a separately forked process. This is done to achieve:

a) isolation,
b) reloading classes on every import, so the latest version of the imported files is always loaded,
c) resilience: if for whatever reason an import fails (even with errors like SIGSEGV), only the forked process is killed; the main DAG file processor keeps forking new processes to parse DAG files.

This means that whatever is imported in a DAG parsed by the DAG file processor is only cached in that forked process and is discarded once parsing of that individual file completes (the process exits after the DAG is serialized to JSON form and saved t…
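The "cache dies with the fork" behavior can be sketched with a plain `multiprocessing` fork (a minimal illustration, not Airflow's actual processor code; assumes a POSIX platform where the `fork` start method exists):

```python
# Sketch: a module imported inside a forked child is cached only in that
# child's sys.modules and vanishes when the child exits -- which is why a
# top-level import in a DAG file is re-executed on every parse.
import multiprocessing
import sys

def parse_dag_file():
    # Simulates a DAG file doing a top-level import during parsing.
    import colorsys  # stand-in for an expensive import like numpy
    assert "colorsys" in sys.modules  # cached, but only in this child

if __name__ == "__main__":
    sys.modules.pop("colorsys", None)   # ensure the parent has no cached copy
    ctx = multiprocessing.get_context("fork")  # not available on Windows
    child = ctx.Process(target=parse_dag_file)
    child.start()
    child.join()
    # The child's import died with the child; the parent's cache is untouched.
    print("colorsys" in sys.modules)
```

Each parse pays the full import cost again, so extending the timeout for the "first" parse would not help: there is no first parse whose cache later parses could reuse.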