Add multi-communication-domain support #752
Open
uv-xiao wants to merge 9 commits into
- Add a single doc covering current single-domain wiring from L3 to PTO-ISA kernels
- Specify the multi-domain plan, per-chip derivation, and sub-communicator bootstrap path
- Keep the change doc-only for this PR

- Add CommDomainPlan-derived per-chip bootstrap configs and domain-aware contexts
- Derive kernel-visible communication contexts from one hidden base window
- Extend sim and hardware bootstrap paths plus focused unit coverage

- Refit communication examples to CommDomainPlan and ctx.domains access
- Keep explicit bootstrap configs for host-staging examples
- Add tiny rank-map and two-domain overlap L3 examples

- Record implemented and intentionally omitted features
- Document bootstrap flow, host-staging split, and sim scope
- Add validation status for migrated and new examples

- Run one PTO-ISA allreduce per domain in domain_rank_map
- Check transferred data for both overlapping domains
- Update docs to describe hardware communication coverage

- Remove unsafe HCCL barrier before MC2 resource allocation
- Link host runtime against the resolved hcomm library path
- Add SDMA workspace preflight and write PR 752 handoff notes

- Allow migrated L3 communication demos to run on a2a3sim
- Keep hardware text-section extraction for onboard runs only
- Preserve ffn allreduce kernel logic while fixing sim include order

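A minimal sketch of the per-chip derivation, assuming a domain is only a name plus an ordered list of worker indices, and that a worker's dense domain rank is its position in that list (the PR summary states ranks follow CommDomain.worker_indices order). DomainSpec, ChipDomainView, and derive_per_chip are hypothetical illustration names, not the API added by this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class DomainSpec:
    """Hypothetical stand-in for CommDomain: a name plus ordered worker indices."""
    name: str
    worker_indices: Sequence[int]


@dataclass(frozen=True)
class ChipDomainView:
    """Hypothetical per-chip record for one domain that the chip belongs to."""
    domain_name: str
    rank: int       # dense rank = position of this worker in worker_indices
    rank_size: int  # number of workers in the domain


def derive_per_chip(plan: Sequence[DomainSpec],
                    num_workers: int) -> Dict[int, List[ChipDomainView]]:
    """Map each worker index to the domains it joins, in plan order."""
    per_chip: Dict[int, List[ChipDomainView]] = {w: [] for w in range(num_workers)}
    names = set()
    for domain in plan:
        if domain.name in names:
            raise ValueError(f"duplicate domain name {domain.name!r}")
        names.add(domain.name)
        workers = list(domain.worker_indices)
        if len(set(workers)) != len(workers):
            raise ValueError(f"domain {domain.name!r} lists a worker more than once")
        for rank, worker in enumerate(workers):
            if not 0 <= worker < num_workers:
                raise ValueError(f"domain {domain.name!r}: worker {worker} out of range")
            per_chip[worker].append(ChipDomainView(domain.name, rank, len(workers)))
    return per_chip


# Hypothetical plan with two overlapping domains: worker 2 belongs to both.
plan = [DomainSpec("a", (0, 1, 2)), DomainSpec("b", (3, 2))]
views = derive_per_chip(plan, num_workers=4)
```

Worker 2 ends up with rank 2 of 3 in "a" but rank 1 of 2 in "b", because ranks follow list order rather than sorted worker index; workers outside a domain simply get no entry for it.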
Summary
- ... CommDomain, CommDomainPlan, ChipDomainBootstrapConfig, and domain-aware ChipContext.domains[name].
- ... CommDomainPlan, with dense domain ranks defined by CommDomain.worker_indices order.
- ... domain-plan path; single-domain usage is now expressed as one CommDomain("default", ...).
- ... ChipBootstrapConfig support for per-chip data such as host staging, keyed by (domain_name, buffer_name).
- ... carve symmetric per-domain buffers, and pass domain-local CommContext* pointers to PTO-ISA kernels.
- ... plus derived-domain-context model.
- ... focused multi-domain examples: workers/l3/domain_rank_map and workers/l3/dual_domain_overlap.
- ... docs/.

Backend Notes
- ... communication window. Kernels communicate with domain-local ranks through the derived domain CommContext.
- ... derives host-resident domain CommContext objects by remapping windowsIn[]/windowsOut[] through the domain rank list and window offset.
- ... HCCL/HCOMM resources for windows and PTO-ISA RMA-style communication in kernels.
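The remapping in these notes (windowsIn[]/windowsOut[] through the domain rank list and window offset) can be sketched on the host side as follows. This assumes worker index equals rank in the base communicator, and that every chip reserves space for every domain in plan order, which is one simple way to keep per-domain buffers symmetric; the PR's actual layout may differ. BaseWindow, DomainContextSketch, carve_offsets, and derive_domain_context are hypothetical names, and addresses are plain integers rather than device pointers.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence


@dataclass(frozen=True)
class BaseWindow:
    """Hypothetical host-side view of the one hidden base window.

    windows_in[w] / windows_out[w] hold worker w's base addresses; assumes
    worker index == rank in the base communicator.
    """
    my_worker: int
    windows_in: Sequence[int]
    windows_out: Sequence[int]


@dataclass(frozen=True)
class DomainContextSketch:
    """Hypothetical domain-local context, indexed by dense domain rank."""
    rank_id: int
    rank_size: int
    windows_in: List[int]
    windows_out: List[int]


def carve_offsets(domain_sizes: Mapping[str, int], alignment: int) -> Dict[str, int]:
    """Lay domains out back to back in plan order. Every chip computes the
    same offsets, so each domain's buffer sits at the same place on all ranks."""
    offsets: Dict[str, int] = {}
    cursor = 0
    for name, size in domain_sizes.items():
        cursor = (cursor + alignment - 1) // alignment * alignment  # round up
        offsets[name] = cursor
        cursor += size
    return offsets


def derive_domain_context(base: BaseWindow, worker_indices: Sequence[int],
                          window_offset: int) -> DomainContextSketch:
    """Remap base windows through the domain rank list plus the domain offset."""
    workers = list(worker_indices)
    if base.my_worker not in workers:
        raise ValueError("this worker is not a member of the domain")
    return DomainContextSketch(
        rank_id=workers.index(base.my_worker),
        rank_size=len(workers),
        windows_in=[base.windows_in[w] + window_offset for w in workers],
        windows_out=[base.windows_out[w] + window_offset for w in workers],
    )


# Hypothetical 4-worker base window, viewed from worker 2.
base = BaseWindow(my_worker=2,
                  windows_in=[0x1000, 0x2000, 0x3000, 0x4000],
                  windows_out=[0x9000, 0xA000, 0xB000, 0xC000])
offsets = carve_offsets({"a": 100, "b": 64}, alignment=64)
ctx_b = derive_domain_context(base, worker_indices=(3, 2), window_offset=offsets["b"])
```

Here domain "b" starts at offset 128 (100 rounded up to 64-byte alignment), and its domain rank 0 maps to worker 3, so ctx_b.windows_in[0] is worker 3's base input window plus 128. Because the offset is identical on every chip, a kernel can address a peer's domain buffer using only the domain rank.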
Validation
- ... bootstrap channels, sim bootstrap, worker-level sim orchestration, and error cleanup.
- ... sdma_async_completion_demo failure.
- workers/l3/domain_rank_map passes on hardware with CANN 8.5 on devices 12, 13, and 14.
- python -m py_compile passes for the updated SDMA demo.
- markdownlint-cli2 and git diff --check pass for the touched docs.

Known Limitation
- a2a3/sdma_async_completion_demo still fails and is tracked as a remaining SDMA/HCCL resource setup issue, not as a multi-domain bootstrap failure.
- ... aclnnShmemSdmaStarsQuery when SDMA workspace support is enabled.
- HcclAllocComResourceByTiling still returns 15 on tested card pairs and also affects the non-SDMA domain_rank_map. ... origin/main, so it is not yet proven to be introduced by this PR.