docs(isa): Sync MkDocs site to PTO-Gym v0.6 micro-instruction SPEC #126

Merged: zhoubot merged 2 commits into hw-native-sys:main from Crystal-wzy:feature on May 19, 2026

Collaborator

@Crystal-wzy Crystal-wzy commented May 14, 2026

Align the pto-isa MkDocs site (source for https://pto-isa.github.io/)
with the authoritative PTO-Gym v0.6 PTO-micro-Instruction-SPEC. This is
a documentation-only sync; no runtime, backend, or build-script changes
beyond MkDocs nav/page-list wiring for the new and renamed pages.

Summary

  • Alignment-state pages: correct pto.init_align to be store-side only
    (initializes carriers for vstus/vstur/vstar/vstas/pstu; load streams
    still start from vldas); add 9-cycle latency on vldas/vldus/vstus;
    add no-post A5 interface notes for vldus (hidden base field, reuse
    semantics); add %offset/%base "advances stream, not absolute
    store" semantics for vstus
  • Conversion ops: add three missing pages — pto.vbitcast (vreg
    bitwise reinterpretation), pto.pbitcast (mask granularity bitcast),
    pto.get_vms4_sr (VMS4_SR status read after pto.vmrgsort4);
    cross-link from conversion-ops and micro-instruction landing pages
  • DMA Copy section: replace the pre-v0.6 surface (pto.copy_gm_to_ubuf
    plus standalone set_loop_size_* / set_loop_stride_* configuration
    ops) with the v0.6 grouped form pto.mte_gm_ub / pto.mte_ub_gm /
    pto.mte_ub_ub using inline nburst(...) / loop(...) / pad(...)
    clauses; add new pto.mte_ub_l1 page; mark the six standalone
    set_loop_* pages deprecated with migration banners; URL slugs
    preserved to avoid breaking external links
  • New /isa/cube/ section (previously absent): README + three
    architectural pages (NZ Fractal Layout, Buffer Hierarchy, FIXPIPE
    Model); 6 MAD ops (mad, mad_acc, mad_bias, mad_mx,
    mad_mx_acc, mad_mx_bias) with shared MAD Common Clauses table
    and MX Matmul Model; 13 data-movement ops (mte_gm_l1,
    mte_gm_l1_frac, mte_l1_ub, mte_ub_l1, mte_l1_l0a/l0b,
    mte_l1_l0a_mx/l0b_mx, mte_l1_bt, mte_l1_fb,
    mte_l0c_l1/gm/ub); each page follows the existing pto-isa
    per-op template
  • Predicate interleave/deinterleave variants: add the four missing
    granularity variants — pto.pintlv_b8, pto.pintlv_b32,
    pto.pdintlv_b16, pto.pdintlv_b32 — so each family covers
    b8/b16/b32 per the SPEC; also retype the existing pintlv_b16 /
    pdintlv_b8 syntax to explicit !pto.mask<bN> and refresh the
    predicate-generation-and-algebra landing pages to list all six
  • Dual load/store naming: rename pto.vldx2 → pto.vldsx2 and
    pto.vstx2 → pto.vstsx2 to match SPEC v0.6 across page titles,
    syntax / assembly forms, and landing-page entries; rename the per-op
    files and update all 21 cross-references including the mkdocs.yml
    nav and gen_pages.py path list; add 9-cycle (vldsx2) and
    12-cycle (vstsx2) A5 latency entries sourced from SPEC §III
  • SFU/DSA naming: rename pto.vexpdiff → pto.vexpdif (single 'f',
    SPEC v0.6); page renamed + 22 cross-references updated across docs
  • Predicate granularity (pset / pge / plt _bN): correct syntax
    to typed !pto.mask<bN> instead of bare !pto.mask for the 9
    instructions (b8/b16/b32 each); English + _zh updated
  • !pto.mask granularity sweep (~800 docs): replace every remaining
    bare !pto.mask reference with the SPEC-canonical parametric form
    !pto.mask<G>, then upgrade <G> to concrete <b32>/<b16>/<b8>
    in worked-example lines whose surrounding !pto.vreg<NxT> carries a
    concrete element family. Template lines and mixed-family lines keep
    <G> (parametric). 0 bare-mask occurrences remain after the sweep
    (782 → 0)
  • New /isa/tile/view-and-tile-buf section (previously absent): SPEC
    Tile §2.4-2.5 + §3 define foundational tensor-view and tile-buffer
    descriptor/handle operations that the tile compute surface depends
    on. Add 9 op pages — pto.make_tensor_view,
    pto.get_tensor_view_dim, pto.get_tensor_view_stride,
    pto.tensor_view_addr, pto.partition_view, pto.alloc_tile,
    pto.subset, pto.set_validshape, pto.tile_buf_addr (the
    tile↔vector bridge inside pto.vecscope) — plus a landing page and
    tile README / current-isa-scope / mkdocs nav wiring
  • Internal H1 normalization: change docs/isa/system/ops/TFREE.md H1
    from # TFREE to # pto.tfree for consistency with TPOP/TPUSH;
    normalize 11 _zh comm-section H1s (TBROADCAST_zh, TGATHER_zh,
    TGET_zh, TGET_ASYNC_zh, TNOTIFY_zh, TPUT_zh, TPUT_ASYNC_zh,
    TREDUCE_zh, TSCATTER_zh, TTEST_zh, TWAIT_zh) from the all-uppercase
    # T* form to the # pto.t* form
  • Strip UTF-8 BOM (U+FEFF) from 28 affected docs/isa/**/*.md files
    for consistent file encoding
  • Chinese (_zh) translations for every new and updated page, kept
    structurally identical to the English counterparts
  • MkDocs wiring: register the renamed vldsx2 / vstsx2 / vexpdif
    files and the new cube / view-and-tile-buf nav subtrees in
    docs/mkdocs/mkdocs.yml and docs/mkdocs/gen_pages.py
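
The two-pass `!pto.mask` granularity sweep described in the list above can be sketched in Python roughly as follows. This is only an illustration of the stated rule, not the script used for the PR: the element-type-to-granularity table, the helper name `sweep_line`, and the per-line processing are all assumptions.

```python
import re

# Assumed mapping from vreg element type to mask granularity (by element width).
# The PR text only says "concrete element family"; this table is a guess.
_GRANULARITY = {
    "f32": "b32", "i32": "b32", "u32": "b32",
    "f16": "b16", "bf16": "b16", "i16": "b16", "u16": "b16",
    "i8": "b8", "u8": "b8",
}

# A bare mask is "!pto.mask" not followed by "<" or another identifier character.
_BARE_MASK = re.compile(r"!pto\.mask(?![<\w])")
# Concrete vreg types look like !pto.vreg<64xf32>; the template form
# !pto.vreg<NxT> deliberately does not match because N is not a number.
_VREG = re.compile(r"!pto\.vreg<\d+x(\w+)>")
_PARAM_MASK = "!pto.mask<G>"


def sweep_line(line: str) -> str:
    """Apply both sweep passes to a single documentation line (hypothetical helper)."""
    # Pass 1: every bare mask becomes the parametric form.
    line = _BARE_MASK.sub(_PARAM_MASK, line)
    # Pass 2: upgrade <G> only when the line names exactly one known element family.
    families = {_GRANULARITY.get(t) for t in _VREG.findall(line)}
    if len(families) == 1:
        (granularity,) = families
        if granularity is not None:
            line = line.replace(_PARAM_MASK, f"!pto.mask<{granularity}>")
    return line
```

Under these assumptions, a line that pairs `!pto.vreg<64xf32>` with a bare `!pto.mask` ends up with `!pto.mask<b32>`, while template lines using `!pto.vreg<NxT>` and lines mixing element families keep `!pto.mask<G>`.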

Testing

  • cmake --build build/docs --target pto_docs builds cleanly, no
    WARNING from MkDocs nav resolution
  • All updated and new pages render under build/docs/site/
  • Spot-checked rendered HTML for v0.6 syntax, 9-cycle latency
    notes, deprecation banners, cube section navigation, the renamed
    pto.vldsx2 / pto.vstsx2 titles, the new view-and-tile-buf
    section, and the normalized comm/system H1s
  • _zh files mirror English counterparts structurally
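
One way to automate the last check (that each `_zh` page mirrors its English counterpart structurally) is to compare a language-independent outline of the two files. The sketch below is an assumption about what "structurally" could mean here (same heading levels and code fences, in the same order); the function names `outline` and `mirrors` are hypothetical and not part of the repository.

```python
import re

_HEADING = re.compile(r"^(#{1,6})\s")
_FENCE = re.compile(r"^(```|~~~)")


def outline(markdown: str) -> list[str]:
    """Return a skeleton of heading levels and code-fence markers, ignoring text."""
    skeleton = []
    in_fence = False
    # Drop a leading UTF-8 BOM so it cannot hide a first-line heading.
    for line in markdown.lstrip("\ufeff").splitlines():
        if _FENCE.match(line.lstrip()):
            in_fence = not in_fence
            skeleton.append("fence")
        elif not in_fence:
            match = _HEADING.match(line)
            if match:
                skeleton.append("h" + str(len(match.group(1))))
    return skeleton


def mirrors(english: str, chinese: str) -> bool:
    """True when both pages have the same heading/fence skeleton."""
    return outline(english) == outline(chinese)
```

Lines starting with `#` inside a fenced block are not counted as headings, so code comments in examples do not cause false mismatches. A stricter check could also compare table row counts or list depth.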

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request provides a major update to the PTO ISA documentation, introducing detailed specifications for Cube micro-instructions, the FIXPIPE writeback model, and NZ fractal layouts. It also updates the scalar DMA copy documentation to reflect the v0.6 grouped transfer interface and adds references for new conversion and status query operations. The review feedback pointed out a consistency error in the L0B physical layout description and identified several missing type parameters in the MLIR syntax and examples for the pto.mte_gm_ub instruction.

Comment thread docs/isa/cube/nz-fractal-layout.md Outdated
Comment thread docs/isa/cube/nz-fractal-layout_zh.md Outdated
Comment thread docs/isa/scalar/ops/dma-copy/copy-gm-to-ubuf.md Outdated
Comment thread docs/isa/scalar/ops/dma-copy/copy-gm-to-ubuf.md Outdated
Comment thread docs/isa/scalar/ops/dma-copy/copy-gm-to-ubuf.md Outdated
Comment thread docs/isa/scalar/ops/dma-copy/copy-gm-to-ubuf_zh.md Outdated
Comment thread docs/isa/scalar/ops/dma-copy/copy-gm-to-ubuf_zh.md Outdated
Comment thread docs/isa/scalar/ops/dma-copy/copy-gm-to-ubuf_zh.md Outdated
@Crystal-wzy Crystal-wzy force-pushed the feature branch 3 times, most recently from 753b91f to 67f5053 (May 15, 2026 03:50)
@Crystal-wzy Crystal-wzy changed the title from "docs: Sync micro-instruction reference to PTO-Gym v0.6 SPEC" to "docs(isa): Sync MkDocs site to PTO-Gym v0.6 micro-instruction SPEC" on May 15, 2026
Collaborator

@zhoubot zhoubot left a comment

Reviewed the updated PR against current origin/main after PR 131.

No blocking findings from this pass. Verification:

  • Merged origin/pr/126 into a detached current-main worktree; merge was clean.
  • git diff --check origin/main...origin/pr/126 passed.
  • mkdocs build -f docs/mkdocs/mkdocs.yml --strict passed locally.
  • Checked the previously reported CI failure: the failing GitHub full-ST job was the stale cpu_stub.hpp / SyncCoreType compile error that PR 131 fixed on main. In the merged local worktree, the CPU ST and CPU comm ST builds progressed past that failure. The local full runner then stopped during data generation because this machine lacks the Python package ml_dtypes, which is needed for tmatmul_mx, so I could not complete the full runtime suite locally.

@zhoubot zhoubot merged commit 57a9c34 into hw-native-sys:main May 19, 2026
3 of 4 checks passed