docs(isa): Sync MkDocs site to PTO-Gym v0.6 micro-instruction SPEC #126
Code Review
This pull request provides a major update to the PTO ISA documentation, introducing detailed specifications for Cube micro-instructions, the FIXPIPE writeback model, and NZ fractal layouts. It also updates the scalar DMA copy documentation to reflect the v0.6 grouped transfer interface and adds references for new conversion and status query operations. The review feedback pointed out a consistency error in the L0B physical layout description and identified several missing type parameters in the MLIR syntax and examples for the pto.mte_gm_ub instruction.
753b91f to 67f5053
Align the pto-isa MkDocs site (source for https://pto-isa.github.io/) with the authoritative PTO-Gym v0.6 PTO-micro-Instruction-SPEC. This is a documentation-only sync; no runtime, backend, or build-script changes beyond MkDocs nav/page-list wiring for the new and renamed pages.

Summary

- Alignment-state pages: correct `pto.init_align` to be store-side only (initializes carriers for vstus/vstur/vstar/vstas/pstu; load streams still start from vldas); add 9-cycle latency on vldas/vldus/vstus; add no-post A5 interface notes for vldus (hidden base field, reuse semantics); add `%offset`/`%base` "advances stream, not absolute store" semantics for vstus
- Conversion ops: add three missing pages: `pto.vbitcast` (vreg bitwise reinterpretation), `pto.pbitcast` (mask granularity bitcast), `pto.get_vms4_sr` (VMS4_SR status read after `pto.vmrgsort4`); cross-link from conversion-ops and micro-instruction landing pages
- DMA Copy section: replace the pre-v0.6 surface (`pto.copy_gm_to_ubuf` plus standalone `set_loop_size_*` / `set_loop_stride_*` configuration ops) with the v0.6 grouped form `pto.mte_gm_ub` / `pto.mte_ub_gm` / `pto.mte_ub_ub` using inline `nburst(...)` / `loop(...)` / `pad(...)` clauses; add new `pto.mte_ub_l1` page; mark the six standalone `set_loop_*` pages deprecated with migration banners; URL slugs preserved to avoid breaking external links
- New `/isa/cube/` section (previously absent): README + three architectural pages (NZ Fractal Layout, Buffer Hierarchy, FIXPIPE Model); 6 MAD ops (`mad`, `mad_acc`, `mad_bias`, `mad_mx`, `mad_mx_acc`, `mad_mx_bias`) with shared MAD Common Clauses table and MX Matmul Model; 13 data-movement ops (`mte_gm_l1`, `mte_gm_l1_frac`, `mte_l1_ub`, `mte_ub_l1`, `mte_l1_l0a`/`l0b`, `mte_l1_l0a_mx`/`l0b_mx`, `mte_l1_bt`, `mte_l1_fb`, `mte_l0c_l1`/`gm`/`ub`); each page follows the existing pto-isa per-op template
- Predicate interleave/deinterleave variants: add the four missing granularity variants (`pto.pintlv_b8`, `pto.pintlv_b32`, `pto.pdintlv_b16`, `pto.pdintlv_b32`) so each family covers b8/b16/b32 per the SPEC; also retype the existing `pintlv_b16` / `pdintlv_b8` syntax to explicit `!pto.mask<bN>` and refresh the predicate-generation-and-algebra landing pages to list all six
- Dual load/store naming: rename `pto.vldx2` → `pto.vldsx2` and `pto.vstx2` → `pto.vstsx2` to match SPEC v0.6 across page titles, syntax / assembly forms, and landing-page entries; rename the per-op files and update all 21 cross-references including the `mkdocs.yml` nav and `gen_pages.py` path list; add 9-cycle (`vldsx2`) and 12-cycle (`vstsx2`) A5 latency entries sourced from SPEC §III
- SFU/DSA naming: rename `pto.vexpdiff` → `pto.vexpdif` (single 'f', SPEC v0.6); page renamed + 22 cross-references updated across docs
- Predicate granularity (`pset` / `pge` / `plt` `_bN`): correct syntax to typed `!pto.mask<bN>` instead of bare `!pto.mask` for the 9 instructions (b8/b16/b32 each); English + `_zh` updated
- `!pto.mask` granularity sweep (~800 docs): replace every remaining bare `!pto.mask` reference with the SPEC-canonical parametric form `!pto.mask<G>`, then upgrade `<G>` to concrete `<b32>`/`<b16>`/`<b8>` in worked-example lines whose surrounding `!pto.vreg<NxT>` carries a concrete element family. Template lines and mixed-family lines keep `<G>` (parametric). 0 bare-mask occurrences remain after the sweep (782 → 0)
- New `/isa/tile/view-and-tile-buf` section (previously absent): SPEC Tile §2.4-2.5 + §3 define foundational tensor-view and tile-buffer descriptor/handle operations that the tile compute surface depends on. Add 9 op pages (`pto.make_tensor_view`, `pto.get_tensor_view_dim`, `pto.get_tensor_view_stride`, `pto.tensor_view_addr`, `pto.partition_view`, `pto.alloc_tile`, `pto.subset`, `pto.set_validshape`, and `pto.tile_buf_addr`, the tile↔vector bridge inside `pto.vecscope`) plus a landing page and tile README / current-isa-scope / mkdocs nav wiring
- Internal H1 normalization: change `docs/isa/system/ops/TFREE.md` H1 from `# TFREE` to `# pto.tfree` for consistency with TPOP/TPUSH; normalize 11 `_zh` comm-section H1s (`TBROADCAST_zh`, `TGATHER_zh`, `TGET_zh`, `TGET_ASYNC_zh`, `TNOTIFY_zh`, `TPUT_zh`, `TPUT_ASYNC_zh`, `TREDUCE_zh`, `TSCATTER_zh`, `TTEST_zh`, `TWAIT_zh`) from the all-uppercase `TUPPERCASE` form to the `# pto.t*` form
- Strip UTF-8 BOM (U+FEFF) from 28 affected `docs/isa/**/*.md` files for consistent file encoding
- Chinese (`_zh`) translations for every new and updated page, kept structurally identical to the English counterparts
- MkDocs wiring: register the renamed `vldsx2` / `vstsx2` / `vexpdif` files and the new cube / view-and-tile-buf nav subtrees in `docs/mkdocs/mkdocs.yml` and `docs/mkdocs/gen_pages.py`

Testing

- [x] `cmake --build build/docs --target pto_docs` builds cleanly, no WARNING from MkDocs nav resolution
- [x] All updated and new pages render under `build/docs/site/`
- [x] Spot-checked rendered HTML for v0.6 syntax, 9-cycle latency notes, deprecation banners, cube section navigation, the renamed `pto.vldsx2` / `pto.vstsx2` titles, the new view-and-tile-buf section, and the normalized comm/system H1s
- [x] `_zh` files mirror English counterparts structurally
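For reviewers who want to reason about the `!pto.mask` granularity sweep described in the PR description, here is a minimal Python sketch of one way such a per-line rewrite could work. It is not the script used for this PR. The function names, the regular expressions, and the rule that the element bit width of `!pto.vreg<NxT>` selects the mask granularity (for example `f32` gives `b32`) are assumptions made for illustration, and the vreg shapes in the usage lines are hypothetical.

```python
import re

# Bare `!pto.mask` that is not already parameterized (`!pto.mask<...>`)
# and is not a prefix of a longer identifier.
BARE_MASK = re.compile(r"!pto\.mask(?![<\w])")

# Concrete element type inside `!pto.vreg<NxT>`, e.g. `!pto.vreg<64xf32>` -> `f32`.
# Template text such as `!pto.vreg<NxT>` does not match because N is not a digit.
VREG_ELEM = re.compile(r"!pto\.vreg<\d+x([a-z]+\d+)>")

PARAMETRIC = "!pto.mask<G>"


def granularity(elem_type):
    """Map an element type such as `f16` to `b16` (assumed rule), else None."""
    bits = int(re.search(r"\d+$", elem_type).group())
    return f"b{bits}" if bits in (8, 16, 32) else None


def sweep_line(line):
    """Rewrite one documentation line.

    Step 1: every bare `!pto.mask` becomes the parametric `!pto.mask<G>`.
    Step 2: if all vreg types on the line agree on a single granularity,
            `<G>` is upgraded to that concrete `<bN>`; template lines and
            mixed-family lines keep `<G>`.
    """
    line = BARE_MASK.sub(PARAMETRIC, line)
    grans = {granularity(t) for t in VREG_ELEM.findall(line)}
    if len(grans) == 1 and None not in grans:
        line = line.replace(PARAMETRIC, f"!pto.mask<{grans.pop()}>")
    return line


if __name__ == "__main__":
    print(sweep_line("%m : !pto.mask, %v : !pto.vreg<64xf32>"))
    print(sweep_line("%m : !pto.mask, %v : !pto.vreg<NxT>"))
```

Working line by line keeps the decision local: a line with a single concrete element family gets a concrete mask type, and anything ambiguous stays parametric, which matches the behavior the description claims for template and mixed-family lines.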
zhoubot
left a comment
Reviewed the updated PR against current origin/main after PR 131.
No blocking findings from this pass. Verification:
- Merged `origin/pr/126` into a detached current-main worktree; merge was clean.
- `git diff --check origin/main...origin/pr/126` passed.
- `mkdocs build -f docs/mkdocs/mkdocs.yml --strict` passed locally.
- Checked the previously reported CI failure: the failing GitHub full-ST job was the stale `cpu_stub.hpp` / `SyncCoreType` compile error that PR 131 fixed on `main`. In the merged local worktree, the CPU ST and CPU comm ST builds progressed past that failure. The local full runner then stopped during data generation because this machine lacks the Python package `ml_dtypes` for `tmatmul_mx`, so I could not complete the full runtime suite locally.
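The data-generation stop described above is the kind of failure a small preflight check can surface before a long run starts. The sketch below is only an illustration and is not part of the repository's runner. The helper name `missing_packages` is hypothetical, and the only package name taken from this review is `ml_dtypes`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found.

    `importlib.util.find_spec` returns None for a top-level module that is
    not installed, so no import side effects are triggered here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example use before starting data generation (package list is an assumption):
#   missing = missing_packages(["ml_dtypes"])
#   if missing:
#       raise SystemExit("missing Python packages: " + ", ".join(missing))
```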