[Tile] Use unpacked vector field for Tile16x16/Tile32x32 register storage #722
Open
hughperkins wants to merge 3 commits into
…rage Replace hand-rolled ``r0..rN-1: dtype`` field declarations and their matching ``if k == 0: self.r0 = val; ...`` cascades with a single ``r: qd.types.vector(_TILE, dtype, unpacked=True)`` field accessed via ``self.r[k]``. This shrinks the surface area significantly (net -870 lines) without changing the generated PTX/LLVM IR: with python-int / qd.static-resolved indices the unpacked field still maps to one register slot per use, matching what the explicit cascade produced. Also removes the now-redundant private helpers ``_get_col``, ``_set_col``, ``_r`` and the ``_REGS`` field-name table.
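The difference between the two storage styles can be sketched in plain Python. This is only an illustration of the source-level pattern, not the PR's code: it does not use the quadrants API, the class names ``CascadeTile4`` and ``IndexedTile4`` are hypothetical, and a list stands in for the unpacked vector field. It says nothing about the generated PTX/LLVM IR.

```python
# Plain-Python illustration only; no quadrants (qd) API is used here.
# CascadeTile4 and IndexedTile4 are hypothetical names for this sketch.


class CascadeTile4:
    """Hand-rolled style: one named field per register, picked by an if-cascade."""

    def __init__(self):
        self.r0 = 0.0
        self.r1 = 0.0
        self.r2 = 0.0
        self.r3 = 0.0

    def set(self, k, val):
        if k == 0:
            self.r0 = val
        elif k == 1:
            self.r1 = val
        elif k == 2:
            self.r2 = val
        elif k == 3:
            self.r3 = val
        else:
            raise IndexError(k)

    def get(self, k):
        if k == 0:
            return self.r0
        if k == 1:
            return self.r1
        if k == 2:
            return self.r2
        if k == 3:
            return self.r3
        raise IndexError(k)


class IndexedTile4:
    """Indexed style: a single container field, accessed as self.r[k]."""

    def __init__(self, n=4):
        # A list stands in for the unpacked vector field in this sketch.
        self.r = [0.0] * n

    def set(self, k, val):
        self.r[k] = val

    def get(self, k):
        return self.r[k]


def fill_and_read(tile, n=4):
    """Write k * 1.5 into slot k, then read every slot back."""
    for k in range(n):
        tile.set(k, k * 1.5)
    return [tile.get(k) for k in range(n)]
```

Both classes expose the same ``set``/``get`` behaviour, but the cascade version grows linearly with the number of registers while the indexed version stays the same size, which is the source of the line-count reduction described in the commit message.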
…ed on N The two factory bodies were structurally identical except for ``_TILE = 16`` vs ``_TILE = 32``. Replace them with a single ``_make_tile_class(N, dtype)`` factory and a single ``_TileProxy(N)`` proxy class, then instantiate ``Tile16x16Proxy = _TileProxy(16)`` and ``Tile32x32Proxy = _TileProxy(32)``. Net diff for this commit: -343 lines. Same generated IR. Updates the few internal consumers (``simt/__init__.py``, ``tile_slicing.py``, ``quadrants/__init__.py``, ``tests/python/test_tile.py``) and a couple of stale ``test_tile16`` references in the docs.
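The shape of this refactor, one factory parameterized on the tile size instead of two near-identical bodies, can be sketched in plain Python. The names ``make_tile_class`` and ``TileProxySketch`` below are hypothetical stand-ins for ``_make_tile_class`` and ``_TileProxy``; the callable proxy interface and the caching are assumptions made for this sketch and are not taken from the PR.

```python
# Plain-Python illustration only; the real factory builds quadrants tile types.
from functools import lru_cache


@lru_cache(maxsize=None)
def make_tile_class(n, dtype):
    """Build (and cache) a tile class whose size n is fixed at creation time."""

    class Tile:
        SIZE = n
        DTYPE = dtype

        def __init__(self):
            # One storage slot per register, all zero-initialised.
            self.r = [dtype()] * n

    Tile.__name__ = f"Tile{n}x{n}_{dtype.__name__}"
    return Tile


class TileProxySketch:
    """Proxy bound to one tile size; the dtype is chosen per call (assumed interface)."""

    def __init__(self, n):
        self.n = n

    def __call__(self, dtype=float):
        return make_tile_class(self.n, dtype)


# The two previously separate factories become two instances of one proxy.
tile16_proxy = TileProxySketch(16)
tile32_proxy = TileProxySketch(32)
```

With this layout, supporting another tile size would only need one more proxy instance rather than a third copy of the factory body.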


Summary

Replace the hand-rolled ``r0..rN-1: dtype`` field declarations and their matching cascades in ``Tile16x16``/``Tile32x32`` with a single ``r: qd.types.vector(_TILE, dtype, unpacked=True)`` field, accessed as ``self.r[k]``. With python-int / ``qd.static``-resolved indices the unpacked vector still maps to one independent register slot per use, so the generated PTX/LLVM IR is unchanged, but the source shrinks dramatically (net -870 lines).

Also drops the now-redundant private helpers (``_get_col``, ``_set_col``, ``_r``) and the ``_REGS`` field-name table. These were all ``_``-prefixed and only used internally by the two tile modules.

Test plan

- ``pre-commit run -a`` (black, ruff, pylint): clean
- ``pyright python/quadrants/lang/simt/_tile16.py python/quadrants/lang/simt/_tile32.py``: 0 errors
- ``python tests/run_tests.py -v -t1 test_tile`` on an RTX PRO 6000 cluster node: 732 passed, 182 skipped, 0 failed (~10 min); covers cuda+vulkan, f32+f64, ndarray+field for both tile sizes, including ``cholesky_``, ``solve_triangular_``, ``qd.outer(...)`` rank-1 updates, slice load/store, and the blocked-Cholesky demo
- ``PURE.VIOLATION`` warnings on ``TILE``/``SIZE`` test globals are pre-existing and unrelated

Made with Cursor