
Fix macOS SIGSEGV in triggerer and Dag processor via fork+exec #69008

Open
kaxil wants to merge 1 commit into apache:main from astronomer:macos-fork-exec-triggerer-dag-processor

kaxil (Member) commented Jun 26, 2026

The triggerer and Dag processor fork their worker children with a bare os.fork(). On macOS this is unsafe: the child inherits half-initialized Objective-C runtime state and crashes with SIGSEGV/SIGABRT as soon as it touches an Apple framework (DNS resolution, secret backends, HTTP clients). Both components run user code that makes network calls (triggers poll APIs and queues, and Dag files frequently do top-level connection/variable lookups), so they hit this crash routinely during local macOS development.

#64874 fixed the same crash for task execution by switching that path from bare fork to fork+exec. This extends the fix to the triggerer and Dag processor, closing #65691.

What changed

  • Generalized _child_exec_main() so it rehydrates any importable entry point in the exec'd child, named via _AIRFLOW_CHILD_TARGET=module:qualname and resolved with stdlib pkgutil.resolve_name. Previously it hardcoded the task-execution entry point.
  • TriggerRunnerSupervisor.start and DagFileProcessorProcess.start opt into fork+exec on fork-unsafe platforms (_should_use_exec()), passing their own entry points (run_in_process / _parse_file_entrypoint).
  • The post-exec structured-log channel is now inherited on a fixed fd (3) alongside stdin/stdout/stderr on 0/1/2, instead of being re-requested after startup via ResendLoggingFD. This let the task-execution path drop its _AIRFLOW_FORK_EXEC env var and the conditional reinit_supervisor_comms() call.
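A minimal sketch of the generalized rehydration step, assuming the exec'd child learns its entry point only from the _AIRFLOW_CHILD_TARGET environment variable. The helpers target_name and child_exec_main are hypothetical (they are not Airflow's _child_exec_main), the resolved target is called with no arguments for simplicity, and the importability check only illustrates the kind of up-front rejection of closures and lambdas described for start().

```python
import os
import pkgutil
from typing import Any, Callable

TARGET_ENV = "_AIRFLOW_CHILD_TARGET"  # env var name taken from the PR description


def target_name(func: Callable[..., Any]) -> str:
    """Hypothetical: encode an importable callable as 'module:qualname'.

    Rejects targets a fresh interpreter could not import: closures, lambdas,
    and anything defined in __main__.
    """
    module = getattr(func, "__module__", None)
    qualname = getattr(func, "__qualname__", "")
    if not module or module == "__main__" or "<locals>" in qualname or "<lambda>" in qualname:
        raise ValueError(f"{func!r} is not importable; cannot rehydrate it after exec")
    name = f"{module}:{qualname}"
    # Round-trip check: the name must resolve back to the very same object.
    if pkgutil.resolve_name(name) is not func:
        raise ValueError(f"{name} does not resolve back to {func!r}")
    return name


def child_exec_main() -> Any:
    """Hypothetical child entry point: resolve the named target and call it."""
    target = pkgutil.resolve_name(os.environ[TARGET_ENV])
    return target()
```

The parent would set os.environ[TARGET_ENV] = target_name(entry_point) before exec; pkgutil.resolve_name (Python 3.9+) accepts the same module:qualname form on the child side, so no per-entry-point wiring is needed.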

Design notes

  • Why fd inheritance instead of ResendLoggingFD: the triggerer's child comms are async, and the existing FD-passing handshake needs a synchronous recv_fds before asyncio takes over the socket. Inheriting the log socket on a fixed fd avoids that and handles the synchronous (task, Dag processor) and asynchronous (triggerer) children uniformly, with no per-entry-point log wiring. The supervisor side is unchanged; it already creates and registers the log socketpair for every child.
  • Gating is macOS-only: _should_use_exec() returns sys.platform in {"darwin"}. On Linux/CI it returns False, so behavior there is unchanged (bare fork).
  • Safety guard: start() rejects a use_exec request for a non-importable target (closure/lambda) up front, so a future caller gets a clear error instead of an opaque failure in the exec'd child.
  • The Dag processor skips its parse-time module pre-import on the exec path, since a fresh interpreter re-imports anyway and the pre-import would otherwise leak user modules into the long-lived processor manager.
  • The triggerer and Dag-processor subprocess tests fork their real entry points with trigger/Dag classes defined in the test modules, which a fresh exec'd interpreter cannot import. An autouse fixture pins those modules to bare fork so they behave identically on macOS and Linux/CI.
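The parent side of the fixed-fd scheme, together with the platform gate, can be sketched as below. This assumes, as described above, that the supervisor already holds a socketpair for the log channel and that parent and child agree only on fd 3; spawn_with_log_fd and read_until_eof are hypothetical helpers, and the exec'd child is a throwaway python -c snippet rather than an Airflow entry point.

```python
import os
import socket
import sys

LOG_FD = 3  # fixed fd for the structured-log channel, per the PR description


def should_use_exec() -> bool:
    """Gate mirroring the described _should_use_exec(): exec only on macOS."""
    return sys.platform in {"darwin"}


def spawn_with_log_fd(child_code: str) -> tuple[int, socket.socket]:
    """Hypothetical: fork, put the child's log socket on LOG_FD, then exec.

    stdin/stdout/stderr stay on fds 0/1/2. Returns (child pid, parent's end).
    """
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: only raw syscalls between fork and exec.
        try:
            os.dup2(child_sock.fileno(), LOG_FD)
            # Python sockets are close-on-exec by default; dup2 is a no-op
            # when fileno() == LOG_FD, so mark LOG_FD inheritable explicitly.
            os.set_inheritable(LOG_FD, True)
            os.execv(sys.executable, [sys.executable, "-c", child_code])
        finally:
            os._exit(127)  # only reached if dup2/exec failed
    child_sock.close()  # parent keeps only its own end
    return pid, parent_sock


def read_until_eof(sock: socket.socket) -> bytes:
    """Drain the parent's end until every copy of the child's end is closed."""
    chunks = []
    while chunk := sock.recv(4096):
        chunks.append(chunk)
    sock.close()
    return b"".join(chunks)
```

Because the fd is placed before exec, the new interpreter never has to receive it over a socket, so no synchronous recv_fds step is needed regardless of whether the child later runs sync or async code.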

closes: #65691
related: #64874

The triggerer and DAG processor fork worker children with a bare os.fork(),
which is not safe on macOS: the child inherits half-initialized Objective-C
runtime state and crashes with SIGSEGV/SIGABRT once it touches a system
framework (DNS resolution, secret backends, HTTP clients). apache#64874 fixed this
for task execution; both of these components hit the same crash because they
run user trigger and DAG-parsing code that makes network calls.

Generalize the fork+exec path so any importable entry point is rehydrated in
the exec'd child from its module:qualname, and inherit the structured-log
channel on a fixed fd so sync and async children are handled uniformly.
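On the child side, a fixed fd lets synchronous and asynchronous code adopt the channel the same way. A minimal sketch, assuming the inherited fd 3 is a connected Unix-domain stream socket (for example one end of a socketpair); log_sync and log_async are hypothetical helpers, and the newline-delimited framing is illustrative, not Airflow's structured-log wire format.

```python
import asyncio
import socket

LOG_FD = 3  # hypothetical default: the fixed fd the exec'd child inherits


def open_log_channel(fd: int = LOG_FD) -> socket.socket:
    # Adopt an already-open fd; family and type are auto-detected from it.
    return socket.socket(fileno=fd)


def log_sync(msg: bytes, fd: int = LOG_FD) -> None:
    """Blocking write, as a synchronous (task or Dag-processor) child might do."""
    with open_log_channel(fd) as sock:
        sock.sendall(msg + b"\n")


async def log_async(msg: bytes, fd: int = LOG_FD) -> None:
    """Write through asyncio streams, as an async (triggerer-like) child might do."""
    _, writer = await asyncio.open_unix_connection(sock=open_log_channel(fd))
    writer.write(msg + b"\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()
```

Neither path performs a handshake before asyncio takes over the socket; the only shared assumption is the fd number.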

uranusjr (Member) left a comment

Makes sense

kaxil marked this pull request as ready for review June 26, 2026 17:27


Successfully merging this pull request may close these issues.

Extend macOS fork+exec fix to DAG processor and triggerer
