[Tile] Use unpacked vector field for Tile16x16/Tile32x32 register storage #722

Open: hughperkins wants to merge 3 commits into main from hp/tiles-use-unpacked-vector

@hughperkins

Collaborator

Summary

Replace the hand-rolled r0..rN-1: dtype field declarations and their matching if k == 0: self.r0 = val; ... cascades in Tile16x16 / Tile32x32 with a single

r: qd.types.vector(_TILE, dtype, unpacked=True)

field, accessed as self.r[k]. When the indices are Python ints or resolved via qd.static, each use of the unpacked vector still maps to one independent register slot, so the generated PTX/LLVM IR is unchanged while the source shrinks by a net 870 lines.
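The shape of the change can be sketched in plain Python. This is only an analogy and does not use the quadrants API: CascadeTile, VectorTile, fill_and_read, and the small _TILE = 4 are hypothetical names and values chosen for illustration, and a Python list stands in for the unpacked vector field.

```python
# Plain-Python analogy only; this does not use the quadrants API.
_TILE = 4  # kept small for illustration; the PR's tiles use 16 and 32


class CascadeTile:
    """Old style: one named field per register, selected by an if-cascade."""

    def __init__(self):
        self.r0 = 0.0
        self.r1 = 0.0
        self.r2 = 0.0
        self.r3 = 0.0

    def get(self, k):
        if k == 0:
            return self.r0
        if k == 1:
            return self.r1
        if k == 2:
            return self.r2
        if k == 3:
            return self.r3
        raise IndexError(k)

    def set(self, k, val):
        if k == 0:
            self.r0 = val
        elif k == 1:
            self.r1 = val
        elif k == 2:
            self.r2 = val
        elif k == 3:
            self.r3 = val
        else:
            raise IndexError(k)


class VectorTile:
    """New style: a single indexable field r, accessed as self.r[k]."""

    def __init__(self):
        self.r = [0.0] * _TILE

    def get(self, k):
        return self.r[k]

    def set(self, k, val):
        self.r[k] = val


def fill_and_read(tile):
    # Indices here are plain Python ints, mirroring the statically
    # resolved indices the description refers to.
    for k in range(_TILE):
        tile.set(k, float(k * k))
    return [tile.get(k) for k in range(_TILE)]
```

Both classes give the same results for in-range indices, but the cascade grows linearly with the tile size while the indexed version does not, which is where a line-count saving of this kind comes from.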

Also drops the now-redundant private helpers (_get_col, _set_col, _r) and the _REGS field-name table. All of these were _-prefixed and used only within the two tile modules.

Test plan

  • pre-commit run -a (black, ruff, pylint): clean
  • pyright python/quadrants/lang/simt/_tile16.py python/quadrants/lang/simt/_tile32.py: 0 errors
  • python tests/run_tests.py -v -t1 test_tile on an RTX PRO 6000 cluster node: 732 passed, 182 skipped, 0 failed (~10 min); covers cuda+vulkan, f32+f64, ndarray+field for both tile sizes, including cholesky_, solve_triangular_, qd.outer(...) rank-1 updates, slice load/store, and the blocked-Cholesky demo
  • The 68 PURE.VIOLATION warnings on TILE / SIZE test globals are pre-existing and unrelated

Made with Cursor

…rage

Replace hand-rolled ``r0..rN-1: dtype`` field declarations and their
matching ``if k == 0: self.r0 = val; ...`` cascades with a single
``r: qd.types.vector(_TILE, dtype, unpacked=True)`` field accessed via
``self.r[k]``.  This shrinks the surface area significantly (net -870
lines) without changing the generated PTX/LLVM IR: with python-int /
qd.static-resolved indices the unpacked field still maps to one
register slot per use, matching what the explicit cascade produced.

Also removes the now-redundant private helpers ``_get_col``,
``_set_col``, ``_r`` and the ``_REGS`` field-name table.
@github-actions

github-actions Bot commented Jun 5, 2026

…ed on N

The two factory bodies were structurally identical except for ``_TILE = 16``
vs ``_TILE = 32``.  Replace them with a single ``_make_tile_class(N, dtype)``
factory and a single ``_TileProxy(N)`` proxy class, then instantiate
``Tile16x16Proxy = _TileProxy(16)`` and ``Tile32x32Proxy = _TileProxy(32)``.

Net diff for this commit: -343 lines.  Same generated IR.

Updates the few internal consumers (``simt/__init__.py``, ``tile_slicing.py``,
``quadrants/__init__.py``, ``tests/python/test_tile.py``) and a couple of stale
``test_tile16`` references in the docs.
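
A minimal plain-Python sketch of a class factory parameterised on N, in the spirit of the commit message above. Only the names ``_make_tile_class``, ``_TileProxy``, ``Tile16x16Proxy`` and ``Tile32x32Proxy`` come from the commit message; the bodies, the list-backed ``r`` field, the per-dtype cache and the callable-proxy interface are assumptions made for illustration and are not the actual quadrants implementation.

```python
# Illustrative sketch only; the real factory and proxy may look different.
def _make_tile_class(n, dtype):
    """Build a tile class for an n x n tile with element type dtype."""

    class Tile:
        N = n
        DTYPE = dtype

        def __init__(self):
            # A Python list stands in for the unpacked register vector.
            self.r = [dtype(0)] * n

    Tile.__name__ = Tile.__qualname__ = f"Tile{n}x{n}"
    return Tile


class _TileProxy:
    """Hands out tile classes of one fixed size (assumed interface)."""

    def __init__(self, n):
        self.n = n
        self._classes = {}  # assumed cache: one generated class per dtype

    def __call__(self, dtype=float):
        if dtype not in self._classes:
            self._classes[dtype] = _make_tile_class(self.n, dtype)
        return self._classes[dtype]


# One shared factory body; only N differs between the two sizes.
Tile16x16Proxy = _TileProxy(16)
Tile32x32Proxy = _TileProxy(32)
```

In this sketch, ``Tile16x16Proxy(float)`` returns a class whose instances hold 16 register slots, and a second call with the same dtype returns the same class object.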
@github-actions

github-actions Bot commented Jun 5, 2026

@github-actions

github-actions Bot commented Jun 8, 2026

@hughperkins

Collaborator Author

Benchmarks on genesis:

[Image: 20260609_tiles_unpacked]

dex_hand regression seems concerning 🤔

@hughperkins

Collaborator Author

tests pass at least:

[Image: Screenshot 2026-06-08 at 12 45 19]
