
Conversation

Contributor

@lukamac lukamac commented Oct 22, 2025

This PR adds another tiling-related annotation to ExecutionBlocks: the tile transfer information, i.e., the size of the tiles to transfer in each step of the tiling loop. This was previously (re)generated in the wrapTilingSolution class method of TileConstraint, which was confusing because it had to reconstruct the whole memory path up to the current target on every call.
Instead, I now calculate it once in the TilerExtension and annotate the executionBlock the same way as with the PatternMemoryConstraint.
Note: I'm not in love with this executionBlock annotation approach, but I want to keep this PR short, so I'll leave that for later.

Another tiling change is to computeTileHyperRectangles, a helper function that calculates all the tiles to be processed when a tensor in external memory (think bigger memory) needs to be processed in local memory (think smaller memory) with a given tile shape.
Previously this function accepted a MemoryTransfer, a dataclass containing a source (big) and a destination (small) memory constraint. The constraints were essentially used only for their shapes, and the source constraint's shape had to be adjusted before every call, which confused me even more.
Passing the shapes directly makes it much clearer what is needed: not the actual memory constraints, just the shapes.
One thing I'm slightly unsure about: previously, if any dimension of the external tensor's shape was smaller than the local one, we replaced the local shape with the external one. I haven't copied this behavior exactly; instead, I always take the minimum of the local and external shape per dimension. In the distant future, when more advanced tiling gives different memory levels different tile sizes, this should be revisited, but for now I think it's good enough.
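For intuition, here is a minimal, hypothetical sketch of the clamped tiling described above, using plain (offset, size) tuples instead of Deeploy's HyperRectangle objects; the tile ordering may differ from the actual implementation:

```python
from itertools import product

def compute_tile_hyper_rectangles(external_shape, local_shape):
    # Clamp each local dim to the external dim: a local buffer larger than
    # the external tensor just means the whole dimension fits in one tile
    # (the "minimum of the two shapes" behavior described above).
    local_shape = tuple(min(e, l) for e, l in zip(external_shape, local_shape))
    tiles = []
    # Walk the tile grid; edge tiles are clamped to whatever remains.
    for offset in product(*(range(0, e, l) for e, l in zip(external_shape, local_shape))):
        size = tuple(min(l, e - o) for l, e, o in zip(local_shape, external_shape, offset))
        tiles.append((offset, size))
    return tiles

# A 5x4 external tensor tiled with a 2x4 local buffer yields three tiles,
# the last one clamped to 1x4.
tiles = compute_tile_hyper_rectangles((5, 4), (2, 4))
```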

I also added a cast of the external pointer to uint32_t in the L3Dma template to fix compilation warnings.

Added

  • Added transfer annotation of tiled execution blocks

Changed

  • Refactored computeTilingRectangles
  • wrapTilingSolution now uses the transfer annotation

Fixed

  • Fixed compiler warning by casting the external pointer in L3Dma to uint32_t

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

Instead of calculating the transfer information in the
wrapTilingSolution function every time for each memory level, do it once
in the TilerExtension and annotate the execution block with it like with
the pattern. I'm not fully satisfied with the approach, but it's a step
in the right direction.
Contributor

coderabbitai bot commented Oct 22, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added tile transfer annotation support for tiled execution blocks, enabling better tracking of data transfers across memory levels.
  • Bug Fixes

    • Fixed compiler warning by correcting external pointer type casting.
  • Refactor

    • Simplified tiling solution integration logic for improved maintainability and efficiency.

Walkthrough

Adds transfer annotations for tiled execution blocks, moves transfer computation into the Tiler, updates tiling APIs to accept precomputed transfers, removes the MemoryTransfer abstraction, and casts an external DMA pointer to uint32_t in a hardware template.

Changes (cohort / file(s) — summary)

  • Type System Extensions — Deeploy/DeeployTypes.py
    Added a transfers attribute to ExecutionBlock as Optional[Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]] to store per-tensor transfer maps.
  • Tiler: transfer computation & wiring — Deeploy/TilingExtension/TilerExtension.py
    Added getTransfers and getIoTransfers methods; the Tiler now computes per-tensor/per-target transfers, and TilerDeployerWrapper.tile assigns the result to executionBlock.transfers.
  • Tiling codegen API & internals — Deeploy/TilingExtension/TileConstraint.py, Deeploy/TilingExtension/TilingCodegen.py
    wrapTilingSolution signature updated to accept transfers; internal MemoryTransfer computation and helper functions removed. computeTileHyperRectangles now accepts (externalShape, localShape), and the MemoryTransfer dataclass is removed.
  • Transformation passes — Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py, Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py
    Import/use TileConstraint; extract targetMemoryTransfers from baseExecutionBlock.transfers; guard early-return when transfers are missing; pass transfers into wrapTilingSolution calls.
  • Hardware DMA Template — Deeploy/Targets/PULPOpen/DMA/L3Dma.py
    Cast the external pointer argument to uint32_t in the _transferTemplates[2] pi_cl_ram_copy_2d call: (uint32_t)${ext}.
  • Changelog — CHANGELOG.md
    Added Unreleased entries for the transfer annotation feature, the refactors to tiling and computeTilingRectangles, and the DMA pointer cast fix.
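Schematically, the new annotation nests as tensor name → memory level → per-external-tile lists of local tiles. A toy illustration with a stand-in rectangle class (the real code uses Deeploy's AbsoluteHyperRectangle):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Rect:
    # Stand-in for AbsoluteHyperRectangle: tile dims plus the tile's
    # absolute offset in the external tensor.
    dims: Tuple[int, ...]
    absolute_offset: Tuple[int, ...]

# tensor name -> target memory level -> list (one entry per external tile)
# of the local tiles produced for that external tile.
transfers: Dict[str, Dict[str, List[List[Rect]]]] = {
    "input_0": {
        "L2": [[Rect((16, 16), (0, 0)), Rect((16, 16), (16, 0))]],
    },
}
```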

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Tiler
    participant Deployer as TilerDeployerWrapper
    participant ExecBlock as ExecutionBlock
    participant TileConstraint
    participant CodeGen

    Tiler->>Tiler: computeTileHyperRectangles(externalShape, localShape)
    Tiler->>Tiler: getTransfers(tensorMc) → transfers per-tensor
    Tiler->>Tiler: getIoTransfers(patternMc) → io_transfers (per-output → per-target → list-of-rects)

    Deployer->>Tiler: getIoTransfers(pattern)
    Tiler-->>Deployer: io_transfers
    Deployer->>ExecBlock: executionBlock.transfers = io_transfers

    ExecBlock->>TileConstraint: wrapTilingSolution(..., transfers)
    TileConstraint-->>CodeGen: VariableReplacementScheme + TilingSchedule[]
    Note right of TileConstraint: now consumes precomputed transfers\ninstead of computing memory-path transfers internally
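The wiring in the diagram, sketched as plain Python; class and method names are simplified stand-ins for the real Deeploy types, and the returned structure is illustrative only:

```python
class ExecutionBlock:
    def __init__(self):
        self.transfers = None  # annotated later by the tiler wrapper

class Tiler:
    def get_io_transfers(self, pattern):
        # Placeholder: the real method derives per-tensor, per-level tiles
        # from the pattern's memory constraints.
        return {"output_0": {"L2": [[((0, 0), (16, 16))]]}}

class TilerDeployerWrapper:
    def __init__(self, tiler):
        self.tiler = tiler

    def tile(self, execution_block, pattern):
        # Compute transfers once and annotate the block; codegen passes
        # read the annotation back instead of recomputing it.
        execution_block.transfers = self.tiler.get_io_transfers(pattern)
        return execution_block
```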

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Feature

Suggested reviewers

  • Victor-Jung

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Add tile transfer annotation" directly corresponds to the primary objective of this changeset, which is to add transfer annotation data to ExecutionBlocks to avoid regenerating transfer information during code generation. While the PR includes secondary changes such as refactoring computeTilingRectangles and fixing an L3Dma compiler warning, the title accurately captures the main feature. The title is concise, clear, and specific enough to convey the primary change to someone scanning the repository history.
Description Check ✅ Passed The PR description is clearly related to the changeset and provides substantive information about the changes. It explains the motivation for adding tile transfer annotation (eliminating repeated memory path reconstruction), describes the refactoring of computeTilingRectangles with rationale for the simplification, and documents the L3Dma compiler warning fix. The description is well-organized with an Added/Changed/Fixed section and includes relevant implementation details without being overly verbose.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/TilingExtension/TilingCodegen.py (1)

238-276: Prevent div‑by‑zero and make zips explicit; minor robustness.

Add positive‑dimension asserts; make zip(..., strict=True) explicit; cast np.prod to int.

 def computeTileHyperRectangles(externalShape: Tuple[int, ...], localShape: Tuple[int, ...]) -> List[HyperRectangle]:
     assert len(externalShape) == len(localShape), \
     f"External and local memory shapes don't have the same number of dimensions! External {externalShape} vs. Local {localShape}"
 
-    # LMACAN: The local shape dimensions are of the local buffer so if the external tile is smaller, that's fine
-    localShape = tuple(min(ext, loc) for ext, loc in zip(externalShape, localShape))
+    # LMACAN: The local shape dimensions are of the local buffer so if the external tile is smaller, that's fine
+    localShape = tuple(min(ext, loc) for ext, loc in zip(externalShape, localShape, strict=True))
+    # Sanity: all dims must be > 0 to avoid division by zero below.
+    assert all(d > 0 for d in externalShape), f"externalShape must be > 0, got {externalShape}"
+    assert all(d > 0 for d in localShape), f"localShape must be > 0, got {localShape}"
@@
-    def nextTileIndex(tileIndexEnd: List[int]) -> Generator[List[int]]:
-        tileCount = np.prod(tileIndexEnd)
+    def nextTileIndex(tileIndexEnd: List[int]) -> Generator[List[int]]:
+        tileCount = int(np.prod(tileIndexEnd))
         tileIndex = [0] * len(tileIndexEnd)
         for _ in range(tileCount):
             yield tileIndex
-            for dimIdx, (idx, end) in enumerate(zip(tileIndex, tileIndexEnd)):
+            for dimIdx, (idx, end) in enumerate(zip(tileIndex, tileIndexEnd, strict=True)):
                 if idx + 1 < end:
                     tileIndex[dimIdx] = idx + 1
                     break
                 else:
                     tileIndex[dimIdx] = 0
@@
-    tileIndexEnd = [
-        int(np.ceil(dimSizeLarge / dimSizeSmall)) for dimSizeLarge, dimSizeSmall in zip(externalShape, localShape)
-    ]
+    tileIndexEnd = [int(np.ceil(dimSizeLarge / dimSizeSmall))
+                    for dimSizeLarge, dimSizeSmall in zip(externalShape, localShape, strict=True)]
     for tileIndex in nextTileIndex(tileIndexEnd):
-        tileOffset = tuple(dimIdx * dimSizeSmall for dimIdx, dimSizeSmall in zip(tileIndex, localShape))
-        for dimIdx, (dimOffset, dimSizeLarge) in enumerate(zip(tileOffset, externalShape)):
+        tileOffset = tuple(dimIdx * dimSizeSmall for dimIdx, dimSizeSmall in zip(tileIndex, localShape, strict=True))
+        for dimIdx, (dimOffset, dimSizeLarge) in enumerate(zip(tileOffset, externalShape, strict=True)):
             assert dimOffset >= 0, f"tileOffset[{dimIdx}] shoud not be smaller then zero ({dimOffset} < 0)"
             assert dimOffset < dimSizeLarge, f"tileOffset[{dimIdx}] should not be bigger or equal then largeShape[{dimIdx}] ({dimOffset} >= {dimSizeLarge})"
 
         tileSize = tuple(
             min(dimSizeSmall, dimSizeLarge - dimOffset)
-            for dimSizeSmall, dimSizeLarge, dimOffset in zip(localShape, externalShape, tileOffset))
-        for dimIdx, (dimSize, dimSizeSmall) in enumerate(zip(tileSize, localShape)):
+            for dimSizeSmall, dimSizeLarge, dimOffset in zip(localShape, externalShape, tileOffset, strict=True))
+        for dimIdx, (dimSize, dimSizeSmall) in enumerate(zip(tileSize, localShape, strict=True)):
             assert dimSize > 0, f"tileOffset[{dimIdx}] shoud not be smaller or equal then zero ({dimSize} <= 0)"
             assert dimSize <= dimSizeSmall, f"tileSize[{dimIdx}] should not be bigger then smallShape[{dimIdx}] ({dimSize} > {dimSizeSmall})"
Deeploy/TilingExtension/TileConstraint.py (1)

101-108: Empty transfers handling to avoid IndexError.

If transfers[outVar] is empty, varReplacements[0] will throw. Fail fast with a clear message.

-        for _outputCubes in transfers[outVar]:
+        cubes = transfers[outVar]
+        if not cubes:
+            raise ValueError(f"No transfers provided for output '{outVar}'.")
+        for _outputCubes in cubes:
             varReplacement, tilingSchedule = cls.serializeTilingSolution(tilingSolution, _outputCubes, targetMemLevel,
                                                                          ctxt, operatorRepresentation)
             sanitizedTilingSchedule = cls.sanitizeTilingSchedule(tilingSchedule)
 
             varReplacements.append(varReplacement)
             tilingSchedules.append(sanitizedTilingSchedule)
🧹 Nitpick comments (3)
CHANGELOG.md (1)

78-79: Name the refactored helper correctly.

The function is computeTileHyperRectangles in code; changelog says “computeTilingRectangles”. Align wording.

-- Refactored computeTilingRectangles
+- Refactored computeTileHyperRectangles
Deeploy/TilingExtension/TilerExtension.py (1)

944-965: Make transfers construction safer and clearer (pairwise, strict zip, shape asserts).

  • Use itertools.pairwise for successive memory levels.
  • Add shape assertions for localMc.shape to avoid None usage.
  • Use strict=True in zip of offsets to catch rank mismatches (Python 3.10+).
+from itertools import pairwise
@@
-    def getTransfers(self, tensorMc: TensorMemoryConstraint) -> Dict[str, List[List[AbsoluteHyperRectangle]]]:
+    def getTransfers(self, tensorMc: TensorMemoryConstraint) -> Dict[str, List[List[AbsoluteHyperRectangle]]]:
         transfers: Dict[str, List[List[AbsoluteHyperRectangle]]] = {}
-        mcs = list(tensorMc.memoryConstraints.items())
-        for (externalMemory, externalMc), (localMemory, localMc) in zip(mcs[:-1], mcs[1:]):
+        mcs = list(tensorMc.memoryConstraints.items())
+        for (externalMemory, externalMc), (localMemory, localMc) in pairwise(mcs):
             # TODO: Should we also use externalMemory as a key in the transfers?
             if externalMemory not in transfers:
                 assert externalMc.shape is not None
                 shape = externalMc.shape
                 zeroOffset = (0,) * len(shape)
                 externalAbsoluteRectangles = [AbsoluteHyperRectangle(HyperRectangle(zeroOffset, shape), zeroOffset)]
             else:
                 # Flatten
                 externalAbsoluteRectangles = [rect for _list in transfers[externalMemory] for rect in _list]
 
+            assert localMc.shape is not None, f"Missing local shape for {localMemory}"
             transfers[localMemory] = [[
-                AbsoluteHyperRectangle(rect, tuple(a + b
-                                                   for a, b in zip(extAbsRect.absoluteOffset, rect.offset)))
-                for rect in computeTileHyperRectangles(extAbsRect.rectangle.dims, localMc.shape)
+                AbsoluteHyperRectangle(
+                    rect,
+                    tuple(a + b for a, b in zip(extAbsRect.absoluteOffset, rect.offset, strict=True))
+                )
+                for rect in computeTileHyperRectangles(extAbsRect.rectangle.dims, localMc.shape)
             ]
                                       for extAbsRect in externalAbsoluteRectangles]
         return transfers

If Python < 3.10, drop strict=True or guard it. Also consider a short docstring describing the nested return structure. Based on learnings

Deeploy/TilingExtension/TileConstraint.py (1)

90-95: Optional: simplify API and type to match usage.

Since you assert one output and only iterate transfers[outVar], consider passing a plain List[List[AbsoluteHyperRectangle]] (already level-selected) and drop var-keying here. This reduces coupling to external mapping structure.

-        transfers: Dict[str, List[List[AbsoluteHyperRectangle]]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
+        cubes: List[List[AbsoluteHyperRectangle]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
@@
-        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
-        varReplacements = []
+        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        varReplacements = []
@@
-        for _outputCubes in transfers[outVar]:
+        for _outputCubes in cubes:

Coordinate with callers accordingly. Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15c4a23 and a1973ae.

📒 Files selected for processing (8)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (1 hunks)
  • Deeploy/Targets/PULPOpen/DMA/L3Dma.py (1 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (3 hunks)
  • Deeploy/TilingExtension/TileConstraint.py (2 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (3 hunks)
  • Deeploy/TilingExtension/TilingCodegen.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
Deeploy/TilingExtension/TilerExtension.py (3)
Deeploy/TilingExtension/TilingCodegen.py (3)
  • AbsoluteHyperRectangle (32-42)
  • HyperRectangle (17-28)
  • computeTileHyperRectangles (238-277)
Deeploy/TilingExtension/MemoryConstraints.py (3)
  • TensorMemoryConstraint (47-92)
  • PatternMemoryConstraints (170-200)
  • tensorMemoryConstraints (104-109)
Deeploy/DeeployTypes.py (1)
  • executionBlock (1593-1596)
Deeploy/TilingExtension/TileConstraint.py (3)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TilingCodegen.py (3)
  • AbsoluteHyperRectangle (32-42)
  • TilingSchedule (46-115)
  • VariableReplacementScheme (119-151)
Deeploy/DeeployTypes.py (1)
  • NetworkContext (564-1076)
Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
🪛 Ruff (0.14.1)
Deeploy/TilingExtension/TilingCodegen.py

243-243: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


260-260: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


263-263: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


264-264: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


270-270: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


271-271: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/TilingExtension/TilerExtension.py

947-947: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


947-947: Prefer itertools.pairwise() over zip() when iterating over successive pairs

Replace zip() with itertools.pairwise()

(RUF007)


960-960: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (90)
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: generic-models / test-runner-generic
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
  • GitHub Check: linting
  • GitHub Check: deeploy-tiler-extension
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/DMA/L3Dma.py (1)

25-25: Confirm HAL signature for safety.

Casting ${ext} to uint32_t removes the warning, but please confirm pi_cl_ram_copy_2d indeed expects a 32‑bit external address on all targets; otherwise prefer uintptr_t.

If convenient, verify against your SDK headers.

Deeploy/DeeployTypes.py (1)

1461-1462: Document and guard “transfers” lifecycle.

Nice addition. Consider documenting when it’s set (e.g., by TilerExtension.getIoTransfers) and that callers must handle None. No code change required.

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

137-144: Add robust guards for missing transfers and memory level.

Directly indexing baseExecutionBlock.transfers[self.targetMemLevel] will raise if transfers are not attached or the level is absent (legacy paths, non‑tiled ops, or platform differences).

Apply this minimal defensive fix:

-        tileConstr: TileConstraint = template.tileConstraint
-        transfers = {
-            tensorName: memTransfers[self.targetMemLevel]
-            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
-        }
+        tileConstr: TileConstraint = template.tileConstraint
+        assert baseExecutionBlock.transfers is not None, \
+            "Missing transfers on execution block; ensure Tiler annotated transfers before variable replacement."
+        transfers = {}
+        for tensorName, memTransfers in baseExecutionBlock.transfers.items():
+            if self.targetMemLevel not in memTransfers:
+                raise KeyError(
+                    f"No transfers for memory level '{self.targetMemLevel}' on tensor '{tensorName}'. "
+                    "Verify tiler configuration and annotation."
+                )
+            transfers[tensorName] = memTransfers[self.targetMemLevel]

To ensure coverage, search that all tiling flows attach transfers:

Deeploy/TilingExtension/TilerExtension.py (2)

39-39: Import addition looks good.

The new import aligns with the new transfer computation flow.


1026-1029: Transfers annotation: confirm consumers handle non-layer-wise cases and target level selection.

executionBlock.transfers stores per-tensor, per-memory-level rectangles. Downstream, wrapTilingSolution expects a per-output mapping (by var) already pre-selected for the target level. Please confirm call sites select the correct level and handle patterns with multiple steps.

Deeploy/TilingExtension/TileConstraint.py (1)

11-14: Imports narrowed appropriately.

Only the needed types are imported; consistent with the refactor.

Comment on lines 245 to 252
tileConstraint: TileConstraint = template.tileConstraint
transfers = {
tensorName: memTransfers[self.localMemory]
for tensorName, memTransfers in baseExecutionBlock.transfers.items()
}
variableReplacement, tilingSchedules = tileConstraint.wrapTilingSolution(nodeMemoryConstraint, self.localMemory,
ctxt, unraveledOpRepr, transfers)


⚠️ Potential issue | 🟠 Major

Guard transfers before use; helpful error if level missing.

Avoids hard KeyErrors when transfers aren’t annotated or the level is unavailable.

-        tileConstraint: TileConstraint = template.tileConstraint
-        transfers = {
-            tensorName: memTransfers[self.localMemory]
-            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
-        }
+        tileConstraint: TileConstraint = template.tileConstraint
+        assert baseExecutionBlock.transfers is not None, \
+            "Missing transfers on execution block; ensure Tiler annotated transfers before codegen."
+        transfers = {}
+        for tensorName, memTransfers in baseExecutionBlock.transfers.items():
+            if self.localMemory not in memTransfers:
+                raise KeyError(f"No transfers for memory level '{self.localMemory}' on tensor '{tensorName}'.")
+            transfers[tensorName] = memTransfers[self.localMemory]
🤖 Prompt for AI Agents
In Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py
around lines 245 to 252, the comprehension unconditionally indexes
memTransfers[self.localMemory] which can raise KeyError when transfers aren’t
annotated or the requested memory level is missing; change this to guard access
(e.g., check self.localMemory in memTransfers or use
memTransfers.get(self.localMemory)) and build transfers only for tensors that
actually have that level, and if any expected tensor is missing the level raise
a clear error including the tensorName and the missing level (or optionally
log/collect missing entries and include them in the exception) before calling
tileConstraint.wrapTilingSolution so the failure is descriptive instead of a raw
KeyError.
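A hypothetical helper sketching the guarded selection the prompt asks for (all names here are illustrative, not from the codebase):

```python
def select_level(transfers, level):
    # Collect every tensor missing the requested memory level first, so the
    # error names all offenders instead of failing on the first bare KeyError.
    missing = [name for name, per_level in transfers.items() if level not in per_level]
    if missing:
        raise KeyError(f"No transfers for memory level '{level}' on tensors: {missing}")
    return {name: per_level[level] for name, per_level in transfers.items()}
```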

Comment on lines 90 to 95
def wrapTilingSolution(
        cls, tilingSolution: NodeMemoryConstraint, targetMemLevel: str, ctxt: NetworkContext,
        operatorRepresentation: OperatorRepresentation) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:

    def getMemoryTransfer(tensorConstraint: TensorMemoryConstraint, sourceCube: HyperRectangle,
                          sourceMemoryLevel: str, targetMemoryLevel: str) -> MemoryTransfer:

        size = np.prod(sourceCube.dims)
        sourceConstraint = MemoryConstraint(sourceMemoryLevel, size)
        sourceConstraint.shape = sourceCube.dims

        destConstraint = copy.copy(tensorConstraint.memoryConstraints[targetMemoryLevel])

        if any(dim1 > dim2 for dim1, dim2 in zip(destConstraint.shape, sourceConstraint.shape)):
            destConstraint.shape = sourceConstraint.shape

        return MemoryTransfer(sourceConstraint, destConstraint)

    def _offsetAdd(offsetA: Tuple[int, ...], offsetB: Tuple[int, ...]) -> Tuple[int, ...]:
        return tuple(dimA + dimB for dimA, dimB in zip(offsetA, offsetB))

    def getCubeTransfers(tensorConstraint: TensorMemoryConstraint, sourceCubes: List[AbsoluteHyperRectangle],
                         sourceMemoryLevel: str,
                         targetMemoryLevel: str) -> Tuple[List[AbsoluteHyperRectangle], List[int]]:
        solution = []
        solutionLengths = []

        for sourceCube in sourceCubes:
            memTransfer = getMemoryTransfer(tensorConstraint, sourceCube.rectangle, sourceMemoryLevel,
                                            targetMemoryLevel)
            solutionCubes = computeTileHyperRectangles(memTransfer)
            solutionAbsoluteCubes = [
                AbsoluteHyperRectangle(rectangle = cube,
                                       absoluteOffset = _offsetAdd(sourceCube.absoluteOffset, cube.offset))
                for cube in solutionCubes
            ]
            solution += solutionAbsoluteCubes
            solutionLengths.append(len(solutionAbsoluteCubes))

        return solution, solutionLengths

cls, tilingSolution: NodeMemoryConstraint, targetMemLevel: str, ctxt: NetworkContext,
operatorRepresentation: OperatorRepresentation,
transfers: Dict[str,
List[List[AbsoluteHyperRectangle]]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"

⚠️ Potential issue | 🟠 Major

Validate transfers structure early; clarify expected shape.

wrapTilingSolution indexes transfers[outVar], implying transfers must be var-keyed (not memory-level keyed). Add a guard and a clear error to prevent silent KeyErrors.

-        assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"
-        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"
+        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        if outVar not in transfers:
+            raise KeyError(f"Missing transfers for output '{outVar}'. Expected var-keyed mapping.")

As per coding guidelines

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In Deeploy/TilingExtension/TileConstraint.py around lines 90 to 95,
wrapTilingSolution assumes transfers is keyed by output variable names and
shaped as Dict[str, List[List[AbsoluteHyperRectangle]]], which can cause a
silent KeyError or confusing failures. Add an explicit guard that checks that
transfers is a dict, that the expected output variable key (the single output
variable from tilingSolution.outputTensorMemoryConstraints) exists in
transfers, and that its value is a (optionally non-empty) list of lists of
AbsoluteHyperRectangle. If any check fails, raise a clear ValueError or
AssertionError describing the expected shape ("transfers must be a dict keyed
by output var name mapping to List[List[AbsoluteHyperRectangle]]") and include
the missing key name, so callers get an immediate, informative error.
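
A stand-alone sketch of the structural validation described above; the helper name `validateTransfers` and the plain-list stand-ins for AbsoluteHyperRectangle are illustrative assumptions, not the real Deeploy types.

```python
def validateTransfers(transfers, outVar):
    """Check that transfers is a var-keyed dict of List[List[...]] entries."""
    expected = ("transfers must be a dict keyed by output var name mapping to "
                "List[List[AbsoluteHyperRectangle]]")
    if not isinstance(transfers, dict):
        raise ValueError(expected)
    if outVar not in transfers:
        # Include the missing key so the caller sees an informative error.
        raise KeyError(f"Missing transfers for output '{outVar}'. {expected}")
    value = transfers[outVar]
    if not (isinstance(value, list) and value
            and all(isinstance(step, list) for step in value)):
        raise ValueError(f"transfers['{outVar}'] is malformed. {expected}")
    return value
```

Checking the shape up front turns a raw KeyError deep inside wrapTilingSolution into an immediate, descriptive failure at the call boundary.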

Comment on lines +966 to +971
def getIoTransfers(self,
                   patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
    assert len(patternMc.nodeConstraints) == 1, "Only layerwise supported for now!"
    tMcs = patternMc.nodeConstraints[0].tensorMemoryConstraints
    return {name: self.getTransfers(mc) for name, mc in tMcs.items()}


🛠️ Refactor suggestion | 🟠 Major

Layer-wise assertion will break multi-node patterns at runtime.

tile() calls getIoTransfers() for every pattern, but getIoTransfers() asserts a single step. This will raise on schedules where a pattern has multiple nodes. Either support multi-step or avoid asserting here.

Proposed minimal fix: default to the last step (layer-wise remains unchanged), enabling non-layer-wise schedules.

-    def getIoTransfers(self,
-                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
-        assert len(patternMc.nodeConstraints) == 1, "Only layerwise supported for now!"
-        tMcs = patternMc.nodeConstraints[0].tensorMemoryConstraints
-        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}
+    def getIoTransfers(self,
+                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
+        # Prefer layer-wise; if not, use the last step to represent the pattern's effective IO.
+        step_idx = -1
+        tMcs = patternMc.nodeConstraints[step_idx].tensorMemoryConstraints
+        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}

Based on learnings

🤖 Prompt for AI Agents
In Deeploy/TilingExtension/TilerExtension.py around lines 966 to 971, the code
asserts that patternMc.nodeConstraints has length 1, which will raise for
multi-node patterns. Remove the assert and select the last node constraint when
multiple steps exist (e.g., node_mc = patternMc.nodeConstraints[-1]), then use
node_mc.tensorMemoryConstraints to build the transfers. Layer-wise behavior
remains unchanged for single-step patterns, while multi-step patterns default
to the last step. Ensure variable names and types match the existing code and
return the same dict shape.
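
The behavioral difference can be illustrated with a reduced stand-in, where plain dicts replace the real tensorMemoryConstraints and only the step-selection logic is shown; `lastStepConstraints` is a hypothetical name for this sketch.

```python
def lastStepConstraints(nodeConstraints):
    """Pick the step whose tensor constraints represent the pattern's IO.

    Layer-wise patterns have a single step, so index -1 is identical to
    index 0 there; multi-step patterns fall back to the last step instead
    of raising an assertion error.
    """
    if not nodeConstraints:
        raise ValueError("Pattern has no node constraints")
    return nodeConstraints[-1]
```

For a single-step pattern the result is unchanged; for a multi-step pattern the last step is used rather than failing the assertion.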


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)

247-253: Guard transfers access and provide clear diagnostics when missing.

Line 247 accesses baseExecutionBlock.transfers without checking if it exists, which will raise AttributeError if the TilerExtension hasn't annotated the block. Additionally, the silent early return at lines 252–253 makes debugging difficult—users won't know why tiling was skipped.

Apply this diff to add a None check and descriptive error messages:

 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    raise RuntimeError(
+        f"Missing transfers on execution block '{name}'; "
+        "ensure TilerExtension annotated transfers before TilingCodeGeneration."
+    )
+
 targetMemoryTransfers = {
-    tensorName: memTransfers.get(self.localMemory, None) for tensorName, memTransfers in transfers.items()
+    tensorName: memTransfers.get(self.localMemory) for tensorName, memTransfers in transfers.items()
 }

-if any(v is None for v in targetMemoryTransfers.values()):
-    return ctxt, executionBlock
+missingTensors = [name for name, t in targetMemoryTransfers.items() if t is None]
+if missingTensors:
+    raise RuntimeError(
+        f"No transfers for memory level '{self.localMemory}' on tensors {missingTensors} "
+        f"in execution block '{name}'."
+    )

Note: This addresses the concern raised in the past review comment about providing helpful errors when transfers or memory levels are missing.

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

248-258: Add None‑guard for baseExecutionBlock.transfers (same issue as main pass).

Identical to the main replacement pass, line 249 assigns baseExecutionBlock.transfers without checking if it's None, then line 251 calls .items() on it.

The current .get(self.targetMemLevel, None) approach (line 251) and early return on line 254 handles missing memory level keys gracefully, which addresses part of the past review comment's concern about guarding transfers access. However, the None check on transfers itself is still missing.

Apply the same fix as in the main pass:

 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    return ctxt, executionBlock
+
 targetMemoryTransfers = {
     tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1973ae and 45554e2.

📒 Files selected for processing (2)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (3 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion manually.
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion.
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (3)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
Deeploy/TilingExtension/TilerExtension.py (1)
  • Tiler (47-970)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (32-42)
  • TilingSchedule (46-115)
  • VariableReplacementScheme (119-151)
  • minimizeVariableReplacement (154-167)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • AbsoluteHyperRectangle (32-42)
  • HyperRectangle (17-28)
  • TilingSchedule (46-115)
  • VariableReplacementScheme (119-151)
  • minimizeVariableReplacement (154-167)
🔇 Additional comments (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)

8-8: LGTM! Imports align with the new transfer-based tiling flow.

The added imports support the refactored logic that now consumes precomputed transfers from the ExecutionBlock rather than recomputing them.

Also applies to: 22-25

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

7-7: LGTM—import additions support the transfer annotation integration.

The new imports (Dict, TileConstraint, AbsoluteHyperRectangle) are all necessary for the transfer-based tiling workflow.

Also applies to: 18-18, 20-21

Comment on lines +138 to +149
tileConstr: TileConstraint = template.tileConstraint
transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
targetMemoryTransfers = {
    tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
}

if any(v is None for v in targetMemoryTransfers.values()):
    return ctxt, executionBlock

variableReplacement, tilingSchedules = tileConstr.wrapTilingSolution(nodeMemoryConstraint, self.targetMemLevel,
                                                                     ctxt, unraveledOpRepr,
                                                                     targetMemoryTransfers)

⚠️ Potential issue | 🔴 Critical

Add None‑guard for baseExecutionBlock.transfers before accessing .items().

The type annotation declares transfers as a Dict, but if baseExecutionBlock.transfers is None, line 141 will raise an AttributeError when calling .items().

Apply this diff to add a None check:

 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    return ctxt, executionBlock
+
 targetMemoryTransfers = {
     tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
 }
🤖 Prompt for AI Agents
In Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py
around lines 138 to 149, add a None-guard for baseExecutionBlock.transfers
before calling .items(): if baseExecutionBlock.transfers is None (or falsy),
return ctxt, executionBlock early (the same as the later guard); otherwise
assign transfers = baseExecutionBlock.transfers and proceed to build
targetMemoryTransfers and call wrapTilingSolution.

@lukamac lukamac marked this pull request as draft October 23, 2025 07:12
@Xeratec Xeratec added this to the Release 0.2.1 milestone Oct 23, 2025
@Xeratec Xeratec added this to Deeploy Oct 23, 2025
@Xeratec Xeratec added the Feature Addition of new features label Oct 30, 2025
@Xeratec Xeratec moved this to In progress in Deeploy Oct 30, 2025