[PAL] Support Device API Transport #445

Open
MC952-arch wants to merge 2 commits into flagos-ai:main from MC952-arch:device-api-transport

MC952-arch (Collaborator) commented Apr 3, 2026

PR Category

PAL

PR Types

New Features

PR Description

This PR updates the device-side abstraction layer to use a unified Device Transport interface (replacing flagcxDevNet) and aligns barrier/session tagging accordingly, enabling transport-backed load/store and one-sided operations across vendor and fallback backends.

Copilot AI left a comment

Pull request overview

This PR updates the device-side abstraction layer to use a unified Device Transport interface (replacing flagcxDevNet) and aligns barrier/session tagging accordingly, enabling transport-backed load/store and one-sided operations across vendor and fallback backends.

Changes:

  • Replaced device-side flagcxDevNet usage with flagcxDevTransport across device API kernels and wrappers.
  • Unified barrier tagging by switching from the old barrier-tag types to flagcxTeamTag{Intra,Inter,World}.
  • Added/propagated Transport support in comm traits (vendor + fallback) and ensured flagcxDevMem preserves an explicit raw pointer.
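
As a rough illustration of the pattern these bullets describe (a wrapper that forwards one-sided operations to a backend-specific Transport chosen through comm traits, plus team tags selected at compile time), here is a minimal host-only sketch. It is not the code in this PR. Every name below (`FallbackTraitsSketch`, `DevTransportSketch`, `putSignal`, `signalReached`, `SketchTeamTag*`, `TeamShapeSketch`, `teamSize`) is hypothetical, the "remote" windows are plain host buffers, and the world-size formula is an assumption made only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-ins for flagcxTeamTag{Intra,Inter,World}; the empty-struct shape is assumed.
struct SketchTeamTagIntra {};
struct SketchTeamTagInter {};
struct SketchTeamTagWorld {};

// Hypothetical fallback backend: each peer's window is a host byte buffer.
struct FallbackTraitsSketch {
  struct Transport {
    std::vector<std::vector<uint8_t>> *windows;  // one window per peer
    std::vector<uint64_t> *signals;              // one signal counter per peer

    // One-sided write into the peer's window at a byte offset.
    void put(int peer, size_t dstOff, const void *src, size_t bytes) const {
      std::memcpy((*windows)[peer].data() + dstOff, src, bytes);
    }
    // Bump the peer's signal counter after the data has landed.
    void signal(int peer) const { ++(*signals)[peer]; }
    uint64_t readSignal(int peer) const { return (*signals)[peer]; }
  };
};

// Plays the role of a unified transport wrapper: kernels would call this
// type and never touch a backend-specific class directly (assumed design).
template <typename Traits>
struct DevTransportSketch {
  typename Traits::Transport impl;

  void putSignal(int peer, size_t dstOff, const void *src, size_t bytes) const {
    impl.put(peer, dstOff, src, bytes);
    impl.signal(peer);
  }
  // Non-blocking check that at least `expected` signals have arrived.
  bool signalReached(int peer, uint64_t expected) const {
    return impl.readSignal(peer) >= expected;
  }
};

// Hypothetical team shape; world size = intraSize * interSize is an assumption.
struct TeamShapeSketch {
  int intraSize;
  int interSize;
};

// Tag dispatch: the tag type picks the overload at compile time.
inline int teamSize(const TeamShapeSketch &s, SketchTeamTagIntra) { return s.intraSize; }
inline int teamSize(const TeamShapeSketch &s, SketchTeamTagInter) { return s.interSize; }
inline int teamSize(const TeamShapeSketch &s, SketchTeamTagWorld) {
  return s.intraSize * s.interSize;
}

// Usage: write 4 bytes into peer 1's window at offset 2, then signal it.
inline bool demoPutSignal() {
  std::vector<std::vector<uint8_t>> windows(2, std::vector<uint8_t>(8, 0));
  std::vector<uint64_t> signals(2, 0);
  DevTransportSketch<FallbackTraitsSketch> t{{&windows, &signals}};
  const uint8_t payload[4] = {1, 2, 3, 4};
  t.putSignal(1, 2, payload, sizeof(payload));
  return t.signalReached(1, 1) && windows[1][2] == 1 && windows[1][5] == 4;
}
```

Under these assumptions, swapping `FallbackTraitsSketch` for a vendor traits type with the same `Transport` member functions would leave the calling code unchanged, which is the kind of benefit a unified transport interface is meant to give.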

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| flagcx/kernels/device_api.cu | Migrates kernels from DevNet to DevTransport (put/signal/wait/flush + intra load/store). |
| flagcx/kernels/custom_allreduce.cu | Adds device_utils.h include to ensure device macro/type availability. |
| flagcx/adaptor/include/device_api/nvidia_comm_traits.h | Renames/introduces vendor Transport, updates barrier specializations, and adjusts window/coop definitions. |
| flagcx/adaptor/include/device_api/flagcx_device.h | Replaces flagcxDevNet wrapper with flagcxDevTransport; updates barrier wrappers and tag usage; stores flagcxDevMem raw ptr explicitly. |
| flagcx/adaptor/include/device_api/fallback_comm_traits.h | Renames fallback Net to Transport and updates action/signal/counter types accordingly. |
| flagcx/adaptor/include/device_api/comm_traits.h | Renames one-sided action types to flagcxDevTransport_* and replaces barrier tags with flagcxTeamTag*. |
| flagcx/adaptor/flagcx_device.cc | Initializes nInterPeers for vendor devComm path to enable multi-node transport/barrier behavior in kernels. |

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

MC952-arch force-pushed the device-api-transport branch 3 times, most recently from e6f2775 to 77240ce on April 3, 2026 07:36
MC952-arch force-pushed the device-api-transport branch from 77240ce to 22497ed on April 3, 2026 07:55