
Minimum demo to highlight cross-core sync API differences #158

Open
learning-chip wants to merge 12 commits into huawei-csl:main from learning-chip:cv_comm_demo

Conversation

@learning-chip
Collaborator

@learning-chip learning-chip commented May 11, 2026

The C++ part of huawei-csl/pto-dsl#135.

This PR demonstrates two cases:

  • Streaming data only, along Cube (L0C) -> Vector (UB) and Vector (UB) -> Cube (L1), to measure bandwidth; no compute
  • Fused (pipelined) matmul-add (C2V) and add-matmul (V2C), as a minimal mix kernel example
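To make the second case concrete, here is a minimal host-side reference sketch that kernel output could be checked against. It is not taken from the PR and uses no pto-isa API. It assumes that matmul-add (C2V) means `A*B + D`, with the Cube matmul result handed to Vector for the add, and that add-matmul (V2C) means `(A + D)*B`, with the Vector add result fed to the Cube matmul. The names `Matrix`, `matmul_ref`, `add_ref`, `matmul_add_ref`, and `add_matmul_ref` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stored in a flat vector. Hypothetical helper type,
// not part of the PR or of pto-isa.
using Matrix = std::vector<float>;

// Plain matmul: out[M x N] = a[M x K] * b[K x N].
Matrix matmul_ref(const Matrix& a, const Matrix& b,
                  std::size_t M, std::size_t K, std::size_t N) {
    assert(a.size() == M * K && b.size() == K * N);
    Matrix out(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t j = 0; j < N; ++j)
                out[i * N + j] += a[i * K + k] * b[k * N + j];
    return out;
}

// Elementwise add of two equally sized matrices.
Matrix add_ref(const Matrix& x, const Matrix& y) {
    assert(x.size() == y.size());
    Matrix out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = x[i] + y[i];
    return out;
}

// Assumed C2V semantics: matmul first, then add d (shape M x N).
Matrix matmul_add_ref(const Matrix& a, const Matrix& b, const Matrix& d,
                      std::size_t M, std::size_t K, std::size_t N) {
    return add_ref(matmul_ref(a, b, M, K, N), d);
}

// Assumed V2C semantics: add d (shape M x K) to a first, then matmul with b.
Matrix add_matmul_ref(const Matrix& a, const Matrix& d, const Matrix& b,
                      std::size_t M, std::size_t K, std::size_t N) {
    return matmul_ref(add_ref(a, d), b, M, K, N);
}
```

With float inputs, device results would normally be compared against such a reference with a tolerance rather than for exact equality, since accumulation order may differ.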

Each case is implemented in 3 to 4 different API styles, including raw flag, simple push, advanced push, ...

Required dependency to run: pto-isa. Tested on pto-isa commit 933ad5d8 (05/12). The pto-isa version should be newer than commit aef3a004 (05/07), after PR 895.

TODO:

  • Test on A5/950. The C-V bandwidth is expected to increase from 1 TB/s to the order of 5-10 TB/s
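For reading the bandwidth numbers above, the arithmetic is simply bytes moved divided by elapsed time. The sketch below is a hypothetical host-side helper, not code from the PR; it assumes 1 TB = 1e12 bytes and that the elapsed time comes from whatever timer the benchmark uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the PR: effective bandwidth in TB/s
// (1 TB = 1e12 bytes) for `bytes` moved in `seconds`.
double bandwidth_tb_per_s(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e12;
}

// Hypothetical helper: time in microseconds needed to move `bytes`
// at a sustained bandwidth of `tb_per_s` TB/s.
double transfer_time_us(std::uint64_t bytes, double tb_per_s) {
    assert(tb_per_s > 0.0);
    return static_cast<double>(bytes) / (tb_per_s * 1e12) * 1e6;
}
```

Under these assumptions, moving 1 MB (1e6 bytes) takes about 1 us at 1 TB/s and about 0.2 us at 5 TB/s, so per-transfer sync overhead matters more as bandwidth grows.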

@learning-chip learning-chip marked this pull request as ready for review May 12, 2026 07:05