Fix macOS SIGSEGV in triggerer and Dag processor via fork+exec #69008
Open
kaxil wants to merge 1 commit into
The triggerer and DAG processor fork worker children with a bare os.fork(), which is not safe on macOS: the child inherits half-initialized Objective-C runtime state and crashes with SIGSEGV/SIGABRT once it touches a system framework (DNS resolution, secret backends, HTTP clients). apache#64874 fixed this for task execution; both of these components hit the same crash because they run user trigger and DAG-parsing code that makes network calls. Generalize the fork+exec path so any importable entry point is rehydrated in the exec'd child from its module:qualname, and inherit the structured-log channel on a fixed fd so sync and async children are handled uniformly.
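A minimal sketch of the `module:qualname` round trip described above, assuming only the standard library. The helper names `target_spec` and `resolve_target` are hypothetical and are not taken from the PR; the only real API used is `pkgutil.resolve_name` (Python 3.9+).

```python
import json
import pkgutil
from collections.abc import Callable


def target_spec(func: Callable) -> str:
    """Return 'module:qualname' for an importable callable, else raise ValueError.

    Hypothetical helper: illustrates how a parent could name an entry point
    before exec, and reject targets the child could never import.
    """
    module = getattr(func, "__module__", None)
    qualname = getattr(func, "__qualname__", None)
    if not module or not qualname or "<" in qualname:
        # "<lambda>" and "<locals>" mark callables that cannot be re-imported by name.
        raise ValueError(f"{func!r} is not importable by name")
    spec = f"{module}:{qualname}"
    if pkgutil.resolve_name(spec) is not func:
        raise ValueError(f"{spec} does not resolve back to {func!r}")
    return spec


def resolve_target(spec: str) -> Callable:
    """Rehydrate the callable in a fresh interpreter from its 'module:qualname' name."""
    func = pkgutil.resolve_name(spec)
    if not callable(func):
        raise TypeError(f"{spec} is not callable")
    return func


spec = target_spec(json.dumps)   # name the entry point in the parent
rehydrated = resolve_target(spec)  # look it up again, as an exec'd child would
```

Checking the round trip in the parent, before forking, is one way to surface a bad target as a clear error instead of a failure inside the exec'd child.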
The triggerer and Dag processor fork their worker children with a bare `os.fork()`. On macOS this is unsafe: the child inherits half-initialized Objective-C runtime state and crashes with `SIGSEGV`/`SIGABRT` the moment it touches an Apple framework (DNS resolution, secret backends, HTTP clients). Both components run user code that makes network calls: triggers poll APIs and queues, and Dag files frequently do top-level connection and variable lookups. As a result they hit this crash routinely during local macOS development.

#64874 fixed the same crash for task execution by switching that path from bare `fork` to `fork+exec`. This extends the fix to the triggerer and Dag processor, closing #65691.

What changed
- Generalized `_child_exec_main()` so it rehydrates any importable entry point in the exec'd child, named via `_AIRFLOW_CHILD_TARGET=module:qualname` and resolved with stdlib `pkgutil.resolve_name`. Previously it hardcoded the task-execution entry point.
- `TriggerRunnerSupervisor.start` and `DagFileProcessorProcess.start` opt into `fork+exec` on fork-unsafe platforms (`_should_use_exec()`), passing their own entry points (`run_in_process` / `_parse_file_entrypoint`).
- The exec'd child now inherits the structured-log channel on a fixed fd instead of using `ResendLoggingFD`. This let the task-execution path drop its `_AIRFLOW_FORK_EXEC` env var and the conditional `reinit_supervisor_comms()` call.

Design notes
- Fixed log fd instead of `ResendLoggingFD`: the triggerer's child comms are async, and the existing FD-passing handshake needs a synchronous `recv_fds` before asyncio takes over the socket. Inheriting the log socket on a fixed fd avoids that and handles the synchronous (task, Dag processor) and asynchronous (triggerer) children uniformly, with no per-entry-point log wiring. The supervisor side is unchanged; it already creates and registers the log socketpair for every child.
- `_should_use_exec()` is `sys.platform in {"darwin"}`. On Linux/CI it returns `False`, so behavior there is unchanged (bare fork).
- `start()` rejects a `use_exec` request for a non-importable target (closure/lambda) up front, so a future caller gets a clear error instead of an opaque failure in the exec'd child.

closes: #65691
related: #64874
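
For reviewers unfamiliar with the fixed-fd approach, here is a rough standalone sketch of the general technique: the parent creates a socketpair, the forked child moves its end onto a well-known fd and immediately execs a fresh interpreter, and the exec'd code writes to that fd without any handshake. This is not the PR's implementation; `LOG_FD`, `CHILD_CODE`, and `spawn_with_log_fd` are hypothetical names, and the fd number is an assumption chosen for illustration.

```python
import os
import socket
import sys

LOG_FD = 3  # hypothetical fixed fd number, chosen only for this sketch

# Code run by the exec'd interpreter: it finds the log channel purely by fd number.
CHILD_CODE = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "os.write(fd, b'hello from exec child\\n')\n"
    "os.close(fd)\n"
)


def spawn_with_log_fd() -> bytes:
    """Fork, pin the child's socket end to LOG_FD, exec, and read what the child logs."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: do the minimum between fork and exec.
        try:
            parent_sock.close()
            if child_sock.fileno() == LOG_FD:
                # Already on the right fd; just make it survive exec.
                os.set_inheritable(LOG_FD, True)
            else:
                os.dup2(child_sock.fileno(), LOG_FD, inheritable=True)
                child_sock.close()
            os.execv(sys.executable, [sys.executable, "-c", CHILD_CODE, str(LOG_FD)])
        finally:
            # Only reached if exec failed.
            os._exit(127)
    # Parent: drop the child's end so EOF arrives once the child closes its copy.
    child_sock.close()
    chunks = []
    while data := parent_sock.recv(4096):
        chunks.append(data)
    parent_sock.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)


output = spawn_with_log_fd()
```

Because the exec'd code only needs the fd number, the same wiring works whether the child's own communication loop is synchronous or asyncio-based.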