
Allow using fresh interpreter besides fork() in Edge Worker #65943

Open

diogosilva30 wants to merge 13 commits into apache:main from diogosilva30:fix/edge3-fork-deadlock-subprocess

Conversation


@diogosilva30 diogosilva30 commented Apr 27, 2026

What

Update the Edge worker task launch path to honor Airflow's existing [core] execute_tasks_new_python_interpreter option.

By default, Edge workers keep the existing fork-based behavior. When execute_tasks_new_python_interpreter=True is configured, or when os.fork is unavailable, the worker launches the task in a fresh Python interpreter with subprocess.Popen and the existing airflow.sdk.execution_time.execute_workload entrypoint.

Fixes #65942

Why

Some Edge worker deployments can run task execution from a multi-threaded worker process. Forking a process after threads have started can inherit unsafe parent state, including import locks and partially initialized modules. In affected deployments this can show up as intermittent task-start failures, plugin import errors, or startup-reschedule exhaustion.

Airflow already has a core setting for this tradeoff: execute_tasks_new_python_interpreter. Other execution paths can use it to choose a fresh interpreter instead of fork. This PR applies the same behavior to the Edge worker without changing the default for existing deployments.

How

The change keeps both launch modes:

  • Fork: used by default when os.fork exists and execute_tasks_new_python_interpreter=False. Uses multiprocessing.Process and the existing supervisor helper.
  • Fresh interpreter: used when execute_tasks_new_python_interpreter=True or os.fork is unavailable. Uses subprocess.Popen with python -m airflow.sdk.execution_time.execute_workload --json-string ... (sketched below).
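
For illustration, a minimal sketch of the fresh-interpreter launch (the helper name launch_in_fresh_interpreter is hypothetical, and env wiring plus the stderr handling described below are omitted; this is not the PR's exact code):

import subprocess
import sys

def launch_in_fresh_interpreter(workload_json: str) -> subprocess.Popen:
    # `workload_json` is assumed to be the ExecuteTask workload serialized
    # with model_dump_json(); the entrypoint deserializes and runs it.
    return subprocess.Popen(
        [
            sys.executable,
            "-m",
            "airflow.sdk.execution_time.execute_workload",
            "--json-string",
            workload_json,
        ],
        start_new_session=True,  # detach the task from the worker's session
    )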

The fresh-interpreter path also spools stderr to a temporary file instead of stderr=PIPE. The worker only needs stderr after the subprocess exits, and a pipe can deadlock if the child writes enough data before the parent reads it. Spooling to a file avoids that while still preserving root failure details in task logs.
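
For illustration, a minimal sketch of that spooling pattern (not the PR's exact code; the worker tracks these temp files per PID and pushes their content through the task-log upload path):

import subprocess
import tempfile

def run_with_spooled_stderr(cmd: list[str]) -> tuple[int, str]:
    # Spool stderr to a temp file instead of stderr=PIPE: the parent never
    # has to drain a pipe while the child runs, so the child cannot block
    # on a full pipe buffer no matter how much it writes.
    with tempfile.TemporaryFile(mode="w+") as stderr_file:
        proc = subprocess.Popen(cmd, stderr=stderr_file)
        returncode = proc.wait()
        stderr_file.seek(0)  # stderr is only needed after the child exits
        return returncode, stderr_file.read()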

Changes

  • providers/edge3/src/airflow/providers/edge3/cli/worker.py: route task launch through fork or the fresh interpreter based on self.conf.getboolean("core", "execute_tasks_new_python_interpreter"); track subprocess stderr temp files by PID; upload subprocess stderr details on failure; preserve fork result-queue handling.
  • providers/edge3/tests/unit/edge3/cli/test_worker.py: add coverage for launch-mode routing, subprocess command construction, stderr spooling, and failed subprocess log upload.

Notes

  • Fork remains the default behavior.
  • The fork path still drains the multiprocessing result queue before waiting for process exit, preserving the previous deadlock protection for large exception payloads.
  • The subprocess path cannot return a Python exception object to the parent process, so the uploaded failure detail is based on exit code plus stderr content.
  • The change uses self.conf, not the global config object, so team-aware Edge worker configuration is respected.

Testing

  • uv run ruff format providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py
  • uv run ruff check --fix providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py
  • uv run --project providers/edge3 pytest providers/edge3/tests/unit/edge3/cli/test_worker.py -xvs (68 passed)
  • uv run prek run --files providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py
  • uv run --project providers/edge3 mypy providers/edge3/src/airflow/providers/edge3/cli/worker.py providers/edge3/tests/unit/edge3/cli/test_worker.py

Was generative AI tooling used to co-author this PR?
  • Yes — GitHub Copilot and Claude Opus 4.6

Generated-by: GitHub Copilot following the guidelines

@boring-cyborg boring-cyborg Bot added area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3 labels Apr 27, 2026

boring-cyborg Bot commented Apr 27, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally; it's a heavy Docker image but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@jscheffl
Contributor

That bug report sounds interesting. We have run Edge Worker in production for more than a year, with more than one job in concurrency, and never had any of the reported problems, so I wonder why it hits in your environment.

os.fork() is also used in CeleryExecutor and LocalExecutor, which have been the other main workhorses in Airflow for years.

I need to admit I am not a Unix/signalling/fork() expert, but I am a bit curious how this problem appears in your env. The implementation in EdgeWorker was also "just" inherited from Celery and LocalExecutor.

@jscheffl
Contributor

I thought about the PR for a moment (no final opinion yet). What puzzles me a bit is that you say there are 22 threads being started - the Edge worker uses asyncio with tasks living in one thread in an event loop. There might be a background thread started by plugins in the environment, but I wonder how you get to 22. Can you share more information on this? I would have expected 1 thread.

Nevertheless, the process spawn penalty has been much larger in my environments, so I'd not really favor fully switching. I would much rather have a configuration option to define how to run a separate process.

@diogosilva30
Author

@jscheffl it just happened again in our prod environment. After Sunday some tasks started randomly failing on the Edge worker.

The logs (anonymized):

[2026-05-04 11:08:08] INFO - Stats instance was created in PID 1 but accessed in PID 2408809. Re-initializing. source=airflow.stats loc=stats.py:57
[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
...
[2026-05-04 11:08:08] ERROR - Startup reschedule limit exceeded reschedule_count=3 max_reschedules=3 source=task loc=task_runner.py:631
[2026-05-04 11:08:09] WARNING - Process exited abnormally exit_code=1 source=task

Both common and monitoring are modules in AIRFLOW__CORE__PLUGINS_FOLDER (/opt/airflow/dags/repo/plugins). They're present on disk — git-sync keeps them up to date. The ModuleNotFoundError isn't a missing file problem, it's the forked child process seeing a corrupted import state inherited from the parent.

Full anonymized logs
[2026-05-04 11:08:08] INFO - Stats instance was created in PID 1 but accessed in PID 2408809. Re-initializing. source=airflow.stats loc=stats.py:57
[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/plugins_manager.py", line 291 in load_plugins_from_plugin_directory
File "<frozen importlib._bootstrap_external>", line 999 in exec_module
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py", line 13 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_agent_states.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_agent_states.py", line 11 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_call_data.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_call_data.py", line 9 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/app_logged_in_users.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'monitoring'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/app_logged_in_users.py", line 13 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/app_user_stats.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/app_user_stats.py", line 14 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/tenant_info.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'monitoring'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/tenant_info.py", line 8 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_api_stats.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_api_stats.py", line 8 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/vendor_data.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'common'
File "/opt/airflow/dags/repo/plugins/common/operators/monitoring/vendor_data.py", line 8 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/monitoring/cache.py source=airflow.plugins_manager loc=plugins_manager.py:298
ModuleNotFoundError: No module named 'monitoring'
File "/opt/airflow/dags/repo/plugins/monitoring/cache.py", line 18 in <module>

[2026-05-04 11:08:08] ERROR - Failed to import: /opt/airflow/dags/repo/dags/stg/monitoring/region/test_dag.py source=airflow.models.dagbag.BundleDagBag loc=dagbag.py:415
ModuleNotFoundError: No module named 'common'

[2026-05-04 11:08:08] ERROR - Dag not found during start up dag_id=stg__monitoring__region__test_staging_dag bundle=BundleInfo(name='dags-folder', version=None)
[2026-05-04 11:08:08] ERROR - Startup reschedule limit exceeded reschedule_count=3 max_reschedules=3
[2026-05-04 11:08:09] WARNING - Process exited abnormally exit_code=1
Pod spec (anonymized)
spec:
  containers:
    - name: edge-worker
      image: internal-registry.example.net/monitoring/airflow:3.1.8
      args: [edge, worker, '-q', stg__general__region]
      env:
        - name: AIRFLOW__CORE__PLUGINS_FOLDER
          value: /opt/airflow/dags/repo/plugins
        - name: AIRFLOW__EDGE__API_URL
          value: https://airflow.stg.monitoring.example.com/edge_worker/v1/rpcapi
      volumeMounts:
        - mountPath: /opt/airflow/dags
          name: dags
    - name: git-sync
      image: internal-registry.example.net/monitoring/git-sync:4.3.0
      args: [--repo=..., --root=/dags, --link=repo, --period=60s]
      volumeMounts:
        - mountPath: /dags
          name: dags
  volumes:
    - name: dags
      emptyDir: {}

This connects directly to your question about the thread count. The edge worker runs asyncio.run(edge_worker.start()) and by the time _launch_job() fires it's already running 22+ OS threads:

  1. The asyncio event loop main thread
  2. asyncio's default ThreadPoolExecutor — created lazily the first time anyio/aiofiles calls loop.run_in_executor(None, ...) in _push_logs_in_chunks. Python's default pool size is min(32, os.cpu_count() + 4) — on our 16-core nodes that's 20 threads, all kept alive as idle workers.

You can verify from inside a running pod:

import os
from collections import Counter

# PID 1 is the edge worker process inside the container.
task_dir = '/proc/1/task'
tids = os.listdir(task_dir)
wchans = []
for tid in tids:
    # wchan names the kernel function each thread is currently blocked in.
    with open(f'{task_dir}/{tid}/wchan') as f:
        wchans.append(f.read().strip())

print(f'Total threads: {len(tids)}')
print(Counter(wchans).most_common())
# → [('futex_wait_queue_me', 20), ('ep_poll', 1), ...]

When Process(...).start() calls os.fork() from a 22-thread process, the child inherits all thread states but only the forking thread survives. If any thread was mid-import, the child sees the import lock permanently held → ModuleNotFoundError for modules that are physically on disk.

Are you seeing the DeprecationWarning: This process is multi-threaded, use of fork() may lead to deadlocks warning in your own deployments? On ours it fires on every task launch.
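
For reference, a minimal standalone repro of that warning on Python 3.12+ (not code from this PR):

import multiprocessing
import threading
import time

if __name__ == "__main__":
    # One background thread is enough to make the process multi-threaded.
    threading.Thread(target=time.sleep, args=(5,), daemon=True).start()
    # Forking now triggers: DeprecationWarning: This process (pid=...) is
    # multi-threaded, use of fork() may lead to deadlocks in the child.
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=print, args=("child alive",))
    child.start()
    child.join()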


Proposal: hook onto core.execute_tasks_new_python_interpreter

Rather than a hard switch, Airflow already has core.execute_tasks_new_python_interpreter for exactly this tradeoff. What if the edge worker just honours that setting? Default stays False (fork) to keep existing behaviour — no change for users who don't opt in. Users who want the safer path flip it to True and get subprocess.Popen, same as the other executors.
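
Opting in would then use the existing core setting, for example in airflow.cfg or as the equivalent environment variable on the worker:

[core]
execute_tasks_new_python_interpreter = True

# equivalent environment variable
AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER=True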

Happy to update the PR to implement it that way if you're on board.

@potiuk potiuk marked this pull request as draft May 5, 2026 16:31
Member

potiuk commented May 5, 2026

@diogosilva30 Converting to draft — this PR doesn't yet meet our Pull Request quality criteria.

  • Pre-commit / static checks — Failing: CI image checks / Static checks. See docs.
  • mypy (type checking) — Failing: MyPy providers checks. See docs.
  • Provider tests — Failing: provider distributions tests / Compat 3.0.6:P3.10, provider distributions tests / Compat 3.1.8:P3.10, provider distributions tests / Compat 3.2.1:P3.10. See docs.

See the linked criteria for how to fix each item, then mark the PR "Ready for review". This is not a rejection — just an invitation to bring the PR up to standard. No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

Contributor

jscheffl commented May 5, 2026

> @jscheffl it just happened again in our prod environment. After Sunday some tasks started randomly failing on the Edge worker.
>
> The logs (anonymized):
>
> [2026-05-04 11:08:08] INFO - Stats instance was created in PID 1 but accessed in PID 2408809. Re-initializing. source=airflow.stats loc=stats.py:57
> [2026-05-04 11:08:08] ERROR - Failed to import plugin /opt/airflow/dags/repo/plugins/common/operators/monitoring/internal_active_sessions.py source=airflow.plugins_manager loc=plugins_manager.py:298
> ModuleNotFoundError: No module named 'common'
> ...
> [2026-05-04 11:08:08] ERROR - Startup reschedule limit exceeded reschedule_count=3 max_reschedules=3 source=task loc=task_runner.py:631
> [2026-05-04 11:08:09] WARNING - Process exited abnormally exit_code=1 source=task

Interesting, is it actually intended on your end to load plugins on the worker? I did not realize after adding anyio that it runs the IO calls in a background thread; I thought this lib just makes async IO calls at the OS level.

> Proposal: hook onto core.execute_tasks_new_python_interpreter
>
> Rather than a hard switch, Airflow already has core.execute_tasks_new_python_interpreter for exactly this tradeoff. What if the edge worker just honours that setting? Default stays False (fork) to keep existing behaviour — no change for users who don't opt in. Users who want the safer path flip it to True and get subprocess.Popen, same as the other executors.

I still wonder about the case, since it is not happening on our side, I have heard no reports previously, and the code has been in production for 12+ months. Anyway, I'd accept a switch via this flag, as it also exists in CeleryExecutor as an optional switch/flag. But the exception serialization which was originally removed should stay, as this is uploaded back to task logs so that a user can see more details about the root of the failure in task logs.

Author

diogosilva30 commented May 6, 2026

@jscheffl yes, this is intentional. We ship DAG factories and reusable operators as modules inside the plugins/ folder so multiple DAGs can be instantiated from the same logic without duplication.

Pattern overview:

plugins/
└── common/
    └── operators/
        └── example_dag_factory.py   ← shared factory + tasks
dags/
└── prod/
    └── my_dag.py                    ← thin wrapper that calls the factory

plugins/common/operators/example_dag_factory.py (shared logic):

"""Reusable DAG factory for fetching and exporting metrics."""

from datetime import timedelta
from airflow.sdk import task


@task
def fetch_data(source: str) -> list[dict]:
    """Fetch records from a data source."""
    # ... implementation ...
    return []


@task
def export_metrics(data: list[dict], conn_id: str) -> None:
    """Export metrics via an external connection."""
    # ... implementation ...


def metrics_dag_definition(source: str, conn_id: str) -> None:
    """Wire up the DAG tasks."""
    data = fetch_data(source=source)
    export_metrics(data=data, conn_id=conn_id)


metrics_dag_kwargs = {
    "schedule": timedelta(minutes=5),
    "tags": ["metrics"],
}

dags/prod/my_dag.py (thin DAG file):

"""DAG for exporting prod metrics."""

import functools
from common import create_dag
from common.operators.example_dag_factory import metrics_dag_definition, metrics_dag_kwargs

create_dag(
    dag_name="prod_metrics",
    dag_definition=functools.partial(
        metrics_dag_definition,
        source="prod",
        conn_id="metrics_conn_prod",
    ),
    dag_file=__file__,
    **metrics_dag_kwargs,
)

The DAG file itself is essentially a one-liner; all the task logic lives in the shared plugin module.

Regarding the core.execute_tasks_new_python_interpreter proposal, I'll work on updating the PR to honour that flag rather than hard-switching.

@diogosilva30 diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch 13 times, most recently from be9ce01 to 9987618 Compare May 6, 2026 10:09
… in multi-threaded workers

The edge worker process runs 22+ threads (asyncio event loop,
ThreadPoolExecutor, HTTP clients). When `_launch_job()` used
`multiprocessing.Process` (fork start method), `os.fork()` copied
locked import locks from other threads into the child. Since only the
forking thread survives, those locks are never released — causing
permanent deadlocks on any subsequent import in the child process.

A non-deadlock variant also occurs where the child inherits corrupted
`sys.modules` state, causing `ModuleNotFoundError` cascades for all
plugin and DAG imports.

This commit replaces the `multiprocessing.Process` fork with
`subprocess.Popen` launching a fresh Python interpreter via the
existing `airflow.sdk.execution_time.execute_workload` CLI entrypoint.
The `ExecuteTask` workload is already a Pydantic model with
`model_dump_json()` — the same serialization path used by the ECS
executor and the edge executor's own DB storage.

Changes:
- `worker.py`: Replace `_launch_job` to use `subprocess.Popen` with
  `execute_workload --json-string`. Remove `_run_job_via_supervisor`,
  `_reset_parent_signal_state`, `multiprocessing` imports, and the
  `results_queue` plumbing.
- `dataclasses.py`: Change `Job.process` type from
  `multiprocessing.Process` to `subprocess.Popen`. Update `is_running`
  to use `poll()` and `is_success` to check `returncode`.
- `test_worker.py`: Update mocks and assertions to match the new
  subprocess-based approach.

Fixes: apache#65942
@diogosilva30 diogosilva30 force-pushed the fix/edge3-fork-deadlock-subprocess branch from 9987618 to 27bb264 Compare May 6, 2026 10:12
@diogosilva30 diogosilva30 marked this pull request as ready for review May 6, 2026 12:59
@diogosilva30 diogosilva30 changed the title fix(edge3): replace fork() with subprocess.Popen to prevent deadlocks in multi-threaded workers fix(edge3): honor fresh interpreter config in worker May 6, 2026
@jscheffl jscheffl changed the title fix(edge3): honor fresh interpreter config in worker Allow using fresh interpreter besides fork() in Edge Worker May 6, 2026
Contributor

jscheffl commented May 6, 2026

Note: Changed title as we are not using semantic commits.

Comment thread providers/edge3/src/airflow/providers/edge3/cli/worker.py Outdated
],
env=env,
start_new_session=True,
stderr=stderr_file,
Contributor

Why not redirect stderr to the normal logger/stdout?

Author
@diogosilva30 diogosilva30 May 7, 2026

Good question. Used a temp file because stderr is the only parent-visible diagnostic channel for the fresh-interpreter path, and we want those diagnostics attached to the task that failed.

In the fork path, the child can return an exception object through the multiprocessing result queue. In the subprocess path, the child is a separate Python interpreter running execute_workload, so it cannot send that Python exception object back to the Edge worker. If something fails early, especially during workload parsing, supervisor startup, plugin import, or Dag import, stderr is what preserves the traceback.

We could pass sys.__stderr__ like Celery does, but then output from all concurrently running task subprocesses would share the Edge worker’s stderr. That means a traceback could end up only in the worker/container log, potentially interleaved with other task subprocesses and worker logs, and not attached to the failed task’s log.

The temp file is a per-task spool: it avoids subprocess.PIPE (which can deadlock if the parent does not continuously drain it), keeps stderr attributable to the specific task subprocess, and lets us push those startup diagnostics into the task log via logs_push after the process exits.

Contributor

Okay, sounds reasonable.

Still, could the stderr then be sent to the message queue, just as plain text? The Edge Worker checks for Exception, but otherwise it should also be able to accept text as a string (like the "OK" that is being sent)?

Author


The Queue approach works in the fork path because the child inherits the multiprocessing state, including the Queue itself.

With subprocess.Popen(...) we start a completely fresh Python interpreter, so there is no shared Queue unless we build a separate IPC layer (pipe/socket/fd passing/etc).

We could do that, but it adds quite a bit of complexity compared to the current tempfile approach. The tempfile also avoids PIPE deadlocks and still captures early bootstrap/import failures before any IPC channel would be initialized.

Contributor


Having slept on it and looking at the code... I do not actually want to insist on the queue :-D I mainly want to pass error details from the supervisor back into the task logs if something failed. So the "text" content should be passed over.

For me it would also be okay to step away from the Queue in general and transport the error details via a text file in both branches. Then we have one technical backend for both execution options. The main thing I want to achieve is to have the "text" transferred: instead of passing the exception to the queue, the text can also be written to a file and picked up. That would make it leaner?
(Including: if all is OK we do not need to pass an "OK" text; we just use the file for passing any error text?)

Author

diogosilva30 commented May 7, 2026

@jscheffl made the requested changes to move some branching logic into the Job dataclass, feel free to review again.

Also noticed that in the fork path we are using the deprecated airflow.sdk.execution_time.supervisor.supervise, which can be cleanly migrated to airflow.sdk.execution_time.supervisor.supervise_task. Made that small change, hope it's okay :)

Contributor

jscheffl commented May 7, 2026

Also noticed that in fork path we are using deprecated airflow.sdk.execution_time.supervisor.supervise which can be cleanly migrated to airflow.sdk.execution_time.supervisor.supervise_task. Made that small change, hope it's okay :)

There are two PRs open (in parallel) to address this -> #65847 + #63498

@diogosilva30
Author

> There are two PRs open (in parallel) to address this -> #65847 + #63498

Should have checked this before. Rolled back this change and also addressed the previous CI providers-tests failures. Hopefully all green this time.


Labels

area:providers provider:edge Edge Executor / Worker (AIP-69) / edge3


Development

Successfully merging this pull request may close these issues.

Edge worker _launch_job corrupts import state on Python 3.12 — fork() in multi-threaded process inherits stale import locks

3 participants