
Allow using fresh interpreter besides fork() in Edge Worker #65943

Merged
jscheffl merged 17 commits into apache:main from diogosilva30:fix/edge3-fork-deadlock-subprocess on May 14, 2026

Conversation

@diogosilva30 (Contributor) commented Apr 27, 2026

What

Update the Edge worker task launch path to honor Airflow's existing [core] execute_tasks_new_python_interpreter option.

By default, Edge workers keep the existing fork-based behavior. When execute_tasks_new_python_interpreter=True is configured, or when os.fork is unavailable, the worker launches the task in a fresh Python interpreter with subprocess.Popen and the existing airflow.sdk.execution_time.execute_workload entrypoint.
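For example, opting in looks like this (assuming the standard Airflow config / environment-variable mapping):

[core]
execute_tasks_new_python_interpreter = True

or, equivalently, via an environment variable:

AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER=True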

Fixes #65942

Why

Some Edge worker deployments can run task execution from a multi-threaded worker process. Forking a process after threads have started can inherit unsafe parent state, including import locks and partially initialized modules. In affected deployments this can show up as intermittent task-start failures, plugin import errors, or startup-reschedule exhaustion.

Airflow already has a core setting for this tradeoff: execute_tasks_new_python_interpreter. Other execution paths can use it to choose a fresh interpreter instead of fork. This PR applies the same behavior to the Edge worker without changing the default for existing deployments.

How

The change keeps both launch modes:

| Mode | When used | Behavior |
| --- | --- | --- |
| Fork | Default when os.fork exists and execute_tasks_new_python_interpreter=False | Uses multiprocessing.Process and the existing supervisor helper |
| Fresh interpreter | execute_tasks_new_python_interpreter=True, or no os.fork | Uses subprocess.Popen with python -m airflow.sdk.execution_time.execute_workload --json-string ... |
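Roughly, the decision looks like this (a minimal sketch with illustrative names, not the exact code in worker.py):

import os

def _use_fresh_interpreter(conf) -> bool:
    # honour [core] execute_tasks_new_python_interpreter; also fall back to a fresh
    # interpreter on platforms where os.fork is not available at all
    return (
        conf.getboolean("core", "execute_tasks_new_python_interpreter")
        or not hasattr(os, "fork")
    )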

The fresh-interpreter path also spools stderr to a temporary file instead of stderr=PIPE. The worker only needs stderr after the subprocess exits, and a pipe can deadlock if the child writes enough data before the parent reads it. Spooling to a file avoids that while still preserving root failure details in task logs.
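A minimal sketch of that spooling idea (the helper name and return shape here are illustrative, not the PR's exact code):

import subprocess
import sys
import tempfile

def _launch_in_fresh_interpreter(workload_json: str) -> tuple[subprocess.Popen, str]:
    # spool stderr to a temp file so a chatty child can never block on a full pipe;
    # the parent reads the file only after the process has exited
    stderr_file = tempfile.NamedTemporaryFile(
        mode="w", prefix="edge-task-", suffix=".err", delete=False
    )
    proc = subprocess.Popen(
        [
            sys.executable,
            "-m",
            "airflow.sdk.execution_time.execute_workload",
            "--json-string",
            workload_json,
        ],
        stderr=stderr_file,
    )
    return proc, stderr_file.name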

Changes

| File | What changed |
| --- | --- |
| providers/edge3/src/airflow/providers/edge3/cli/worker.py | Route task launch through fork or a fresh interpreter based on self.conf.getboolean("core", "execute_tasks_new_python_interpreter"); track subprocess stderr temp files by PID; upload subprocess stderr details on failure; preserve fork result-queue handling |
| providers/edge3/tests/unit/edge3/cli/test_worker.py | Add coverage for launch-mode routing, subprocess command construction, stderr spooling, and failed subprocess log upload |

Notes

  • Fork remains the default behavior.
  • The fork path still drains the multiprocessing result queue before waiting for process exit, preserving the previous deadlock protection for large exception payloads.
  • The subprocess path cannot return a Python exception object to the parent process, so the uploaded failure detail is based on exit code plus stderr content.
  • The change uses self.conf, not the global config object, so team-aware Edge worker configuration is respected.

Testing

  • uv run ruff format providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py
  • uv run ruff check --fix providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py
  • uv run --project providers/edge3 pytest providers/edge3/tests/unit/edge3/cli/test_worker.py -xvs (68 passed)
  • uv run prek run --files providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py
  • uv run --project providers/edge3 mypy providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py

Was generative AI tooling used to co-author this PR?
  • Yes — GitHub Copilot and Claude Opus 4.6

Generated-by: GitHub Copilot following the guidelines

boring-cyborg (Bot) added labels area:providers, provider:edge (Edge Executor / Worker (AIP-69) / edge3) on Apr 27, 2026
boring-cyborg (Bot) commented Apr 27, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide, and consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@jscheffl (Contributor):

That bug report sounds interesting, and we have been running the Edge Worker in production for more than a year, with more than one job in concurrency. We never had any of the reported problems, so I am wondering why it hits in your environment.

os.fork() is also used in CeleryExecutor and LocalExecutor, which have been the other main workhorses in Airflow for years.

I have to admit I am not a Unix/signalling/fork() expert, but I am a bit curious how this problem appears in your env. The implementation in the Edge Worker was also "just" inherited from Celery and LocalExecutor.

@jscheffl (Contributor):

I thought about the PR for a moment (not a final opinion yet). What puzzles me a bit is that you say there are 22 threads being started; the Edge worker uses AsyncIO with tasks living in one thread in an event loop. There might be a background thread started by plugins in the environment, but I wonder how you get to 22. Can you share more information on this? I would have expected 1 thread.

Nevertheless, the process-spawn penalty has been much larger in my environments, so I would not really favor fully switching. I would much rather have a configuration option to define how to run a separate process.

@diogosilva30 (Contributor, Author):

@jscheffl it just happened again in our prod environment. After Sunday some tasks started randomly failing on edge worker.

The logs (anonymized):

[2026-05-04 11:08:08] INFO - Stats instance was created in PID 1 but accessed in PID 2408809. Re-initializing. source=airflow.stats loc=stats.py:57
[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
...
[2026-05-04 11:08:08] ERROR - Startup reschedule limit exceeded reschedule_count=3 max_reschedules=3 source=task loc=task_runner.py:631
[2026-05-04 11:08:09] WARNING - Process exited abnormally exit_code=1 source=task

Both common and monitoring are modules in AIRFLOW__CORE__PLUGINS_FOLDER (/opt/airflow/dags/repo/plugins). They're present on disk — git-sync keeps them up to date. The ModuleNotFoundError isn't a missing file problem, it's the forked child process seeing a corrupted import state inherited from the parent.

Full anonymized logs
[2026-05-04 11:08:08] INFO - Stats instance was created in PID 1 but accessed in PID 2408809. Re-initializing. source=airflow.stats loc=stats.py:57
[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/plugins_manager.py", line 291 in load_plugins_from_plugin_directory
File "<frozen importlib._bootstrap_external>", line 999 in exec_module
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py", line 13 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_agent_states.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_agent_states.py", line 11 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_call_data.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_call_data.py", line 9 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/app_logged_in_users.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'monitoring'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/app_logged_in_users.py", line 13 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/app_user_stats.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/app_user_stats.py", line 14 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/tenant_info.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'monitoring'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/tenant_info.py", line 8 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_api_stats.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_api_stats.py", line 8 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/vendor_data.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/vendor_data.py", line 8 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/monitoring/cache.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'monitoring'
File "/opt/airflow/dags/repo/plugins/monitoring/cache.py", line 18 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import: /opt/airflow/dags/repo/dags/stg/monitoring/region/test_dag.py source=airflow.models.dagbag.BundleDagBag loc=dagbag.py:415
ModuleNotFoundError: No module named 'common'

[2026-05-04 11:08:08] ERROR - Dag not found during start up dag_id=stg__monitoring__region__test_staging_dag bundle=BundleInfo(name='dags-folder', version=None)
[2026-05-04 11:08:08] ERROR - Startup reschedule limit exceeded reschedule_count=3 max_reschedules=3
[2026-05-04 11:08:09] WARNING - Process exited abnormally exit_code=1
Pod spec (anonymized)
spec:
  containers:
    - name: edge-worker
      image: internal-registry.example.net/monitoring/airflow:3.1.8
      args: [edge, worker, '-q', stg__general__region]
      env:
        - name: AIRFLOW__CORE__PLUGINS_FOLDER
          value: /opt/airflow/dags/repo/plugins
        - name: AIRFLOW__EDGE__API_URL
          value: https://airflow.stg.monitoring.example.com/edge_worker/v1/rpcapi
      volumeMounts:
        - mountPath: /opt/airflow/dags
          name: dags
    - name: git-sync
      image: internal-registry.example.net/monitoring/git-sync:4.3.0
      args: [--repo=..., --root=/dags, --link=repo, --period=60s]
      volumeMounts:
        - mountPath: /dags
          name: dags
  volumes:
    - name: dags
      emptyDir: {}

This connects directly to your question about the thread count. The edge worker runs asyncio.run(edge_worker.start()) and by the time _launch_job() fires it's already running 22+ OS threads:

  1. The asyncio event loop main thread
  2. asyncio's default ThreadPoolExecutor — created lazily the first time anyio/aiofiles calls loop.run_in_executor(None, ...) in _push_logs_in_chunks. Python's default pool size is min(32, os.cpu_count() + 4) — on our 16-core nodes that's 20 threads, all kept alive as idle workers.

You can verify from inside a running pod:

import os
from collections import Counter

# inspect every OS thread of PID 1 (the worker) and the kernel wait channel it is blocked in
task_dir = '/proc/1/task'
tids = os.listdir(task_dir)
wchans = [open(f'{task_dir}/{tid}/wchan').read().strip() for tid in tids]

print(f'Total threads: {len(tids)}')
print(Counter(wchans).most_common())
# → [('futex_wait_queue_me', 20), ('ep_poll', 1), ...]  (20 idle executor threads parked on futexes)

When Process(...).start() calls os.fork() from a 22-thread process, the child inherits all thread states but only the forking thread survives. If any thread was mid-import, the child sees the import lock permanently held → ModuleNotFoundError for modules that are physically on disk.

Are you seeing the DeprecationWarning: This process is multi-threaded, use of fork() may lead to deadlocks warning in your own deployments? On ours it fires on every task launch.
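For reference, a minimal standalone script that triggers that warning on CPython 3.12+ (illustrative only, nothing Edge-specific):

import multiprocessing
import os
import threading
import time
import warnings

if __name__ == "__main__":
    # keep one extra thread alive so the process is multi-threaded at fork time
    threading.Thread(target=time.sleep, args=(5,), daemon=True).start()

    warnings.simplefilter("always")  # DeprecationWarnings raised outside __main__ are hidden by default
    p = multiprocessing.get_context("fork").Process(target=os.getpid)
    p.start()  # emits "DeprecationWarning: This process ... is multi-threaded, use of fork() may lead to deadlocks in the child"
    p.join()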


Proposal: hook onto core.execute_tasks_new_python_interpreter

Rather than a hard switch, Airflow already has core.execute_tasks_new_python_interpreter for exactly this tradeoff. What if the edge worker just honours that setting? Default stays False (fork) to keep existing behaviour — no change for users who don't opt in. Users who want the safer path flip it to True and get subprocess.Popen, same as the other executors.

Happy to update the PR to implement it that way if you're on board.

@potiuk marked this pull request as draft May 5, 2026 16:31
@potiuk (Member) commented May 5, 2026

@diogosilva30 Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Pre-commit / static checks — Failing: CI image checks / Static checks. See docs.
  • mypy (type checking) — Failing: MyPy providers checks. See docs.
  • Provider tests — Failing: provider distributions tests / Compat 3.0.6:P3.10, provider distributions tests / Compat 3.1.8:P3.10, provider distributions tests / Compat 3.2.1:P3.10. See docs.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@jscheffl (Contributor) commented May 5, 2026

@jscheffl it just happened again in our prod environment. After Sunday some tasks started randomly failing on edge worker.

The logs (anonymized):

[2026-05-04 11:08:08] INFO - Stats instance was created in PID 1 but accessed in PID 2408809. Re-initializing. source=airflow.stats loc=stats.py:57
[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
...
[2026-05-04 11:08:08] ERROR - Startup reschedule limit exceeded reschedule_count=3 max_reschedules=3 source=task loc=task_runner.py:631
[2026-05-04 11:08:09] WARNING - Process exited abnormally exit_code=1 source=task

Interesting, is it actually intended on your end to load plugins on the worker? I did not realize, after adding anyio, that it runs the IO calls in a background thread. I thought this lib just makes async IO calls at the OS level.

Proposal: hook onto core.execute_tasks_new_python_interpreter

Rather than a hard switch, Airflow already has core.execute_tasks_new_python_interpreter for exactly this tradeoff. What if the edge worker just honours that setting? Default stays False (fork) to keep existing behaviour — no change for users who don't opt in. Users who want the safer path flip it to True and get subprocess.Popen, same as the other executors.

I still wonder about the case: it is not happening on our side, I have heard no reports previously, and the code has been in production for 12+ months. Anyway, I'd accept a switch like this flag, as it also exists in CeleryExecutor as an optional switch/flag. But the exception serialization which was originally removed should stay, as it is uploaded back to the task logs so that a user can see more details about the root of the failure.

@diogosilva30 (Contributor, Author) commented May 6, 2026

@jscheffl yes, this is intentional. We ship DAG factories and reusable operators as modules inside the plugins/ folder so multiple DAGs can be instantiated from the same logic without duplication.

Pattern overview:

plugins/
└── common/
    └── operators/
        └── example_dag_factory.py   ← shared factory + tasks
dags/
└── prod/
    └── my_dag.py                    ← thin wrapper that calls the factory

plugins/common/operators/example_dag_factory.py (shared logic):

"""Reusable DAG factory for fetching and exporting metrics."""

from datetime import timedelta
from airflow.sdk import task


@task
def fetch_data(source: str) -> list[dict]:
    """Fetch records from a data source."""
    # ... implementation ...
    return []


@task
def export_metrics(data: list[dict], conn_id: str) -> None:
    """Export metrics via an external connection."""
    # ... implementation ...


def metrics_dag_definition(source: str, conn_id: str) -> None:
    """Wire up the DAG tasks."""
    data = fetch_data(source=source)
    export_metrics(data=data, conn_id=conn_id)


metrics_dag_kwargs = {
    "schedule": timedelta(minutes=5),
    "tags": ["metrics"],
}

dags/prod/my_dag.py (thin DAG file):

"""DAG for exporting prod metrics."""

import functools
from common import create_dag
from common.operators.example_dag_factory import metrics_dag_definition, metrics_dag_kwargs

create_dag(
    dag_name="prod_metrics",
    dag_definition=functools.partial(
        metrics_dag_definition,
        source="prod",
        conn_id="metrics_conn_prod",
    ),
    dag_file=__file__,
    **metrics_dag_kwargs,
)

The DAG file itself is essentially a one-liner; all the task logic lives in the shared plugin module.

Regarding the core.execute_tasks_new_python_interpreter proposal, I'll work on updating the PR to honour that flag rather than hard-switching.

diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch 13 times, most recently from be9ce01 to 9987618, on May 6, 2026 10:09
… in multi-threaded workers

The edge worker process runs 22+ threads (asyncio event loop,
ThreadPoolExecutor, HTTP clients). When `_launch_job()` used
`multiprocessing.Process` (fork start method), `os.fork()` copied
locked import locks from other threads into the child. Since only the
forking thread survives, those locks are never released — causing
permanent deadlocks on any subsequent import in the child process.

A non-deadlock variant also occurs where the child inherits corrupted
`sys.modules` state, causing `ModuleNotFoundError` cascades for all
plugin and DAG imports.

This commit replaces the `multiprocessing.Process` fork with
`subprocess.Popen` launching a fresh Python interpreter via the
existing `airflow.sdk.execution_time.execute_workload` CLI entrypoint.
The `ExecuteTask` workload is already a Pydantic model with
`model_dump_json()` — the same serialization path used by the ECS
executor and the edge executor's own DB storage.

Changes:
- `worker.py`: Replace `_launch_job` to use `subprocess.Popen` with
  `execute_workload --json-string`. Remove `_run_job_via_supervisor`,
  `_reset_parent_signal_state`, `multiprocessing` imports, and the
  `results_queue` plumbing.
- `dataclasses.py`: Change `Job.process` type from
  `multiprocessing.Process` to `subprocess.Popen`. Update `is_running`
  to use `poll()` and `is_success` to check `returncode`.
- `test_worker.py`: Update mocks and assertions to match the new
  subprocess-based approach.

Fixes: apache#65942
diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch from 9987618 to 27bb264 on May 6, 2026 10:12
@diogosilva30 (Contributor, Author):

There are two PRs open (in parallel) to address this -> #65847 + #63498

I should have checked this before. I rolled back this change and also addressed the previous CI providers-tests failures. Hopefully all green this time.

diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch 2 times, most recently from d169be1 to 132e2bb, on May 11, 2026 22:30
@wjddn279 (Contributor):

@diogosilva30
Interesting. I read through your analysis and it looks like a correct explanation.

I have a question — how often does this error occur? Does the user's code change frequently?

If I understand correctly, the issue is that among the multiple threads running in the edge worker, if a fork is performed while another thread (one not performing the fork) is in the middle of an import, it can cause problems in the import system. If that's the case, the problem would arise when a new module is being imported in another thread.

Even with lazy loading, since the edge worker follows a fixed footprint, I'm curious whether new module loading happens frequently. Since a module that has already been imported once should no longer be a source of the problem, I would expect the frequency to gradually decrease over time.

Applying the same approach that exists in Celery seems like a good idea. However, the trade-offs should be carefully understood. With Airflow, simply importing the airflow module alone loads ~100 MB of libraries. The existing fork approach significantly reduces PSS through COW, but this approach makes memory grow linearly with the number of concurrent executions. And slow loading is a bonus downside.

Below is a check of PSS usage when just importing airflow.sdk.execution_time.execute_workload in a subprocess:

=== A. subprocess.Popen (fresh interpreter, no sharing) ===
parent pid=99  RSS=118.4 MiB  PSS=98.3 MiB
     pid    RSS MiB    PSS MiB   Private MiB
     100      117.9       97.8          95.8
     101      117.9       97.8          95.8
     102      117.9       97.8          95.8

=== B. multiprocessing.Process (fork, COW with parent) ===
parent pid=99  RSS=118.5 MiB  PSS=30.3 MiB
     pid    RSS MiB    PSS MiB   Private MiB
     110       99.7       12.7           4.0
     111       99.7       12.7           4.0
     112       99.7       12.7           4.0
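For anyone wanting to reproduce numbers like these, here is one way to read them on Linux (an illustrative helper, not the script used above):

def mem_mib(pid: int) -> dict[str, float]:
    """Return Rss/Pss/Private_* for a pid in MiB, read from /proc/<pid>/smaps_rollup."""
    out: dict[str, float] = {}
    with open(f"/proc/{pid}/smaps_rollup") as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in ("Rss", "Pss", "Private_Clean", "Private_Dirty"):
                out[key] = int(rest.split()[0]) / 1024  # the kernel reports values in kB
    return out

print(mem_mib(1))  # e.g. the worker process in a container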

As Jens mentioned, the problem is clear enough that it could have been reported by now, so it's also a bit curious that it hasn't been.

@wjddn279 (Contributor):

@jscheffl

What do you think about changing the edge worker to handle tasks the same way as the LocalExecutor? At initial startup, pre-create persistent processes (like a fork pool) sized to the concurrency and dispatch workloads through a queue. If we set up the pool before the asyncio loop starts, this issue shouldn't occur either.

@diogosilva30 (Contributor, Author):

@wjddn279

Hey, just checked our staging pods to answer this properly.

On one of our workers today (airflow-worker-6998fbdf9c-mpxrd, been up 23h), I pulled the logs and found three SIGABRT crashes:

2026-05-13T04:41:22.161535Z [info     ] Task finished                  [supervisor] duration=1.5012219889904372 exit_code=<Negsignal.SIGABRT: -6> final_state=failed loc=supervisor.py:2109 task_instance_id=019e1fa3-5c78-712e-bc70-c68310f34cd0
2026-05-13T04:57:57.514458Z [info     ] Task finished                  [supervisor] duration=1.675403744011419 exit_code=<Negsignal.SIGABRT: -6> final_state=failed loc=supervisor.py:2109 task_instance_id=019e1fb2-a889-7646-ac6f-897bcb8e214f
2026-05-13T06:17:03.839441Z [info     ] Task finished                  [supervisor] duration=1.4628011609893292 exit_code=<Negsignal.SIGABRT: -6> final_state=failed loc=supervisor.py:2109 task_instance_id=019e1ff9-6c08-7b8b-8c66-e4a6e45bcfc2

Under 2 seconds probably means the child never actually ran. It deadlocked on an import lock right at startup and got aborted.

On the "gradually decreases" question, I see where you're coming from, but the problem here isn't plugin imports in user code. The warning fires on every single fork throughout the worker's lifetime because supervisor.py itself keeps a thread pool alive for async log pushing (aiofiles/anyio). Add GCS credential refreshes and Secret Manager calls on every task completion and you've got live threads on basically every fork. The race doesn't go away after warmup.

On memory, fair point, the numbers are real. That's exactly why we made it opt-in rather than changing the default. People who aren't hitting this keep the existing behaviour.

@diogosilva30 (Contributor, Author):

@jscheffl

What do you think about changing the edge worker to handle task the same way as LocalExecutor? At initial startup, pre-create a persistent process (like fork pool) sized to concurrency and dispatch workloads through a queue. If we set up the pool before the asyncio loop starts, this issue shouldn't occur either.

@wjddn279 yeah, a pre-forked pool would sidestep the race (fork before asyncio means single-threaded children), but I'm not sure it's the right shape for the edge worker.

My hesitation with a persistent pool:

  • Leaks, stale connections, open fds all accumulate across task runs. Right now every task gets a clean process and that isolation earns its keep.
  • If a pool worker dies mid-task, you have to detect it, restart it, and reconcile with the Edge API. Per-task processes just... handle that.
  • Plugin hot reload via git-sync breaks until a worker recycles since long-lived workers hold onto their imported code.
  • It touches the core execution model (dispatcher, queueing, lifecycle, supervision). Way bigger blast radius than this PR warrants.

What about multiprocessing.forkserver?

stdlib already ships the right primitive here. One forkserver process is started at startup, before asyncio, stays single-threaded for the worker's lifetime, and is the only thing that ever calls fork(). Tasks are still fresh COW children, just cloned from a clean forkserver instead of from the multi-threaded asyncio main.

T0  main starts
    mp.set_forkserver_preload([...])
    ctx = mp.get_context("forkserver")
 
T1  main forks ONCE (still single-threaded)
     |                                       |
     v                                       v
    main process                       forkserver process
    starts asyncio                     imports preload modules
    becomes multi-threaded             stays single-threaded forever
 
    on each task:
      ctx.Process(target=supervise).start()
      -> forkserver does fork() (safe, no threads)
      -> child execs supervise()

Call-site diff is basically nothing:

# before
p = Process(target=supervise, args=(job, child_conn))
p.start()  # fork() from multi-threaded asyncio
 
# after
p = ctx.Process(target=supervise, args=(job, child_conn))
p.start()  # fork() happens inside the forkserver

The real constraint is ordering: the forkserver context has to exist before any asyncio import or thread start. That's a small CLI entry point change, not a redesign.
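As a rough, self-contained sketch of that ordering (the worker wiring below is simplified and hypothetical; only the context handling mirrors the idea):

import asyncio
import multiprocessing as mp

def supervise(task_id: str) -> None:
    # stand-in for the real supervise() target that runs the task
    print(f"supervising {task_id}")

async def worker_loop(ctx) -> None:
    # stand-in for the edge worker's asyncio loop; children are cloned by the forkserver
    p = ctx.Process(target=supervise, args=("demo-task",))
    p.start()
    while p.is_alive():
        await asyncio.sleep(0.1)

if __name__ == "__main__":
    # ordering matters: create the forkserver context before the event loop and extra threads exist
    # (mp.set_forkserver_preload([...]) could additionally pre-import heavy modules into the forkserver)
    ctx = mp.get_context("forkserver")
    asyncio.run(worker_loop(ctx))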

Memory at 5 concurrent tasks:

| | Current fork | Subprocess (this PR) | Forkserver |
| --- | --- | --- | --- |
| Per-task overhead | ~13 MiB (COW) | ~98 MiB | ~15 MiB (COW) |
| Fixed overhead | 0 | 0 | ~150 MiB |
| Fork safety | unsafe | safe | safe |
| Total for 5 tasks | ~65 MiB | ~490 MiB | ~225 MiB |

Near-COW memory cost, same safety as subprocess, without the per-task RSS hit you called out.

One caveat: forkserver is POSIX-only. Windows only supports spawn. That said, the current os.fork() path doesn't work on Windows either so it's not a regression. A Windows follow-up would just fall back to spawn (same profile as this PR).

Potential order:

  1. Merge this as-is. Small, opt-in, and it seems to fix the deadlock issue I'm seeing.
  2. Follow-up: design and discuss forkserver behind execute_tasks_via_forkserver=True, same opt-in model.
  3. Once it's had some bake time, flip the default in a later 3.x and deprecate the direct fork.

Happy to draft that follow-up if there's appetite for it. @jscheffl curious what you think.

@wjddn279 (Contributor):

@diogosilva30
Thanks for the reply!

On the "gradually decreases" question, I see where you're coming from, but the problem here isn't plugin imports in user code. The warning fires on every single fork throughout the worker's lifetime because supervisor.py itself keeps a thread pool alive for async log pushing (aiofiles/anyio). Add GCS credential refreshes and Secret Manager calls on every task completion and you've got live threads on basically every fork. The race doesn't go away after warmup.

Fair point. I was only thinking about the import-lock case, but it sounds like the locking issues span much more broadly across threads than just imports.

What about multiprocessing.forkserver?

After looking into how forkserver actually works, it looks like a better fit than a fork pool for a multi-threaded environment like the edge worker. There is no need to write much code to apply it.

Spinning up the server before the async cycle starts and forking from there seems like the right shape.

Leaks, stale connections, open fds all accumulate across task runs. Right now every task gets a clean process and that isolation earns its keep.

The forked process doesn't do anything beyond running supervise_task. Any connections or fds it creates are explicitly closed. (There were a few leaks in practice, but I've fixed most of them by now.)

Plugin hot reload via git-sync breaks until a worker recycles since long-lived workers hold onto their imported code.

Same here — by design the worker doesn't import user code directly. All user code is loaded only inside the short-lived process forked by supervise_task.

It touches the core execution model (dispatcher, queueing, lifecycle, supervision). Way bigger blast radius than this PR warrants.

Agreed. There's reference code we could lean on, but the diff is still substantial — you'd need a Queue for IPC and liveness checks layered on top.

If a pool worker dies mid-task, you have to detect it, restart it, and reconcile with the Edge API. Per-task processes just... handle that.

This was actually my biggest concern with the pool approach. If a worker dies you have to restart it, but there's no guarantee the restarted process is lock-safe either. That's exactly where forkserver looks like the stronger option.

@jscheffl (Contributor):

@wjddn279 @diogosilva30 Thanks for all the valuable discussion. Really cool. Improvements are in general welcome.

@wjddn279 The optimization that was added to LocalExecutor is still on my (long) bucket list. I even left a comment in https://github.com/apache/airflow/blob/main/providers/edge3/src/airflow/providers/edge3/cli/worker.py#L460 to remind me whenever I see the code again. Feel free to raise a PR (to be faster than me)!

In general I am mostly wondering why this error happens often on @diogosilva30's side while in our environment we do not see it at all. It must be some environmental side effect. So for the moment I'd accept it as an option, but because of the memory overhead and startup time it should stay optional only. The main line should be kept as-is until more errors are reported.

I am not sure a pre-fork makes sense for many environments. It might be another (tuning) option. I assume the typical pattern for Celery is different than for Edge... at least I think. I'd in general favor simplicity over complexity, and a pre-fork pool might add a level of complexity that makes it harder to maintain or adds bugs.

My actual "dream" would be that we could make the supervisor mostly async such that we do not need to fork() or spawn another process at all just to run another process... but just have a set of in-process supervisor instances which themselves fork the execution for the task. Because today we fork worker -> supervisor -> task execution.

(Inline review comment threads on providers/edge3/src/airflow/providers/edge3/cli/dataclasses.py and worker.py, now marked outdated.)
@jscheffl (Contributor) left a review:

Looks 99.5% good already! After the discussion I really feel that this is an improvement. If the (few) comments are adjusted, then I am good to merge.

And Forkserver might be another PR...

Remove the multiprocessing.Queue from the fork execution path and use a
plain temp file for both fork and subprocess paths. Both paths now write
failure text to a NamedTemporaryFile (Path stored as Job.stderr_file_path);
the parent reads it after the child exits via Job.failure_details() and
pushes the content to the task log via logs_push.

Benefits over the Queue approach:
- No risk of buffer deadlock (the Queue deadlock was the original issue)
- Works identically for both fork and subprocess children
- Simpler: no IPC setup, no draining loop, no Queue import
- Error file is only created/filled on failure; task logs cover the
  success path

Also extract _make_task_temp_file() helper to avoid duplicating the
NamedTemporaryFile creation pattern across _launch_job_subprocess and
_launch_job_fork.
diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch from 132e2bb to 9de4110 on May 14, 2026 10:45
@diogosilva30 (Contributor, Author):

@jscheffl applied all your suggestions!

@jscheffl (Contributor) left a review:

Thanks!

diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch from 7a9fb22 to c11a8ac on May 14, 2026 13:59
@diogosilva30 (Contributor, Author):

@jscheffl I saw the CI failures on the compat tests. Should be good now ✅

@wjddn279 (Contributor) left a review:

Thanks for the good investigation and suggestions! LGTM.
I agree that applying forkserver needs more reports of this issue and further discussion!

@jscheffl merged commit 9b1d58a into apache:main on May 14, 2026
143 checks passed
boring-cyborg (Bot) commented May 14, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@diogosilva30 (Contributor, Author):

Thank you @jscheffl and @wjddn279 for discussion and guidance!

Linked issue (closed by this PR): Edge worker _launch_job corrupts import state on Python 3.12 — fork() in multi-threaded process inherits stale import locks