
Conversation

Contributor

@lukamac lukamac commented Oct 22, 2025

This PR adds another tiling-related annotation to ExecutionBlocks: the tile transfer information, i.e., the size of the tiles to transfer in each step of the tiling loop. This was previously (re)generated in the wrapTilingSolution class method of TileConstraint, which was confusing because it had to reconstruct the whole memory path up to the current target on every call.
Instead, I now calculate it once in the TilerExtension and annotate the executionBlock the same way as with the PatternMemoryConstraint.
Note: I'm not in love with this executionBlock annotation approach, but I want to keep this PR short, so I'll leave that for later.

Another tiling change is to computeTileHyperRectangles, a helper function that calculates all the tiles to be processed when a tensor in external memory (think bigger memory) needs to be processed in local memory (think smaller memory) with a given tile shape.
Previously this function accepted a MemoryTransfer, a dataclass containing a source (big) and a destination (small) memory constraint. The constraints were essentially used only for their shapes, and the source constraint's shape had to be adjusted before every call, which confused me even more.
Passing the shapes directly makes it much clearer what is needed: not the actual memory constraints, just the shapes.
One thing I'm slightly unsure about: previously, if any dimension of the external tensor's shape was smaller than the local one, we replaced the local shape with the external one. I haven't copied this behavior exactly; instead, I always take the minimum of the local and external shape per dimension. In the distant future, when more advanced tiling gives different memory levels different tile sizes, this should be revisited, but for now I think it's good enough.
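For intuition, here is a minimal, hypothetical sketch of the clamped tiling described above, using plain (offset, size) tuples instead of Deeploy's HyperRectangle objects; the tile ordering may differ from the actual implementation:

```python
from itertools import product

def compute_tile_hyper_rectangles(external_shape, local_shape):
    # Clamp each local dim to the external dim: a local buffer larger than
    # the external tensor just means the whole dimension fits in one tile
    # (the "minimum of the two shapes" behavior described above).
    local_shape = tuple(min(e, l) for e, l in zip(external_shape, local_shape))
    tiles = []
    # Walk the tile grid; edge tiles are clamped to whatever remains.
    for offset in product(*(range(0, e, l) for e, l in zip(external_shape, local_shape))):
        size = tuple(min(l, e - o) for l, e, o in zip(local_shape, external_shape, offset))
        tiles.append((offset, size))
    return tiles

# A 5x4 external tensor tiled with a 2x4 local buffer yields three tiles,
# the last one clamped to 1x4.
tiles = compute_tile_hyper_rectangles((5, 4), (2, 4))
```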

I also added a cast of the external pointer to uint32_t in the L3Dma template to fix compilation warnings.

Added

  • Added transfer annotation of tiled execution blocks

Changed

  • Refactored computeTilingRectangles
  • wrapTilingSolution now uses the transfer annotation

Fixed

  • Fixed compiler warning by casting the external pointer in L3Dma to uint32_t

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

Instead of calculating the transfer information in the
wrapTilingSolution function every time for each memory level, do it once
in the TilerExtension and annotate the execution block with it like with
the pattern. I'm not fully satisfied with the approach, but it's a step
in the right direction.
Contributor

coderabbitai bot commented Oct 22, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added tile transfer annotation support for tiled execution blocks, enabling better tracking of data transfers across memory levels.
  • Bug Fixes

    • Fixed compiler warning by correcting external pointer type casting.
  • Refactor

    • Simplified tiling solution integration logic for improved maintainability and efficiency.

Walkthrough

Adds transfer annotations for tiled execution blocks, moves transfer computation into the Tiler, updates tiling APIs to accept precomputed transfers, removes the MemoryTransfer abstraction, and casts an external DMA pointer to uint32_t in a hardware template.

Changes (cohort / file(s) — summary)

  • Type System Extensions — Deeploy/DeeployTypes.py
    Added a transfers attribute to ExecutionBlock as Optional[Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]] to store per-tensor transfer maps.
  • Tiler: transfer computation & wiring — Deeploy/TilingExtension/TilerExtension.py
    Added getTransfers and getIoTransfers methods; the Tiler now computes per-tensor/per-target transfers, and TilerDeployerWrapper.tile assigns the result to executionBlock.transfers.
  • Tiling codegen API & internals — Deeploy/TilingExtension/TileConstraint.py, Deeploy/TilingExtension/TilingCodegen.py
    wrapTilingSolution signature updated to accept transfers; internal MemoryTransfer computation and helper functions removed. computeTileHyperRectangles now accepts (externalShape, localShape), and the MemoryTransfer dataclass is removed.
  • Transformation passes — Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py, Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py
    Import/use TileConstraint; extract targetMemoryTransfers from baseExecutionBlock.transfers; guard early-return when transfers are missing; pass transfers into wrapTilingSolution calls.
  • Hardware DMA Template — Deeploy/Targets/PULPOpen/DMA/L3Dma.py
    Cast the external pointer argument to uint32_t in the _transferTemplates[2] pi_cl_ram_copy_2d call: (uint32_t)${ext}.
  • Changelog — CHANGELOG.md
    Added Unreleased entries for the transfer annotation feature, the refactors to tiling and computeTilingRectangles, and the DMA pointer cast fix.
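Schematically, the new annotation nests as tensor name → memory level → per-external-tile lists of local tiles. A toy illustration with a stand-in rectangle class (the real code uses Deeploy's AbsoluteHyperRectangle):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Rect:
    # Stand-in for AbsoluteHyperRectangle: tile dims plus the tile's
    # absolute offset in the external tensor.
    dims: Tuple[int, ...]
    absolute_offset: Tuple[int, ...]

# tensor name -> target memory level -> list (one entry per external tile)
# of the local tiles produced for that external tile.
transfers: Dict[str, Dict[str, List[List[Rect]]]] = {
    "input_0": {
        "L2": [[Rect((16, 16), (0, 0)), Rect((16, 16), (16, 0))]],
    },
}
```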

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Tiler
    participant Deployer as TilerDeployerWrapper
    participant ExecBlock as ExecutionBlock
    participant TileConstraint
    participant CodeGen

    Tiler->>Tiler: computeTileHyperRectangles(externalShape, localShape)
    Tiler->>Tiler: getTransfers(tensorMc) → transfers per-tensor
    Tiler->>Tiler: getIoTransfers(patternMc) → io_transfers (per-output → per-target → list-of-rects)

    Deployer->>Tiler: getIoTransfers(pattern)
    Tiler-->>Deployer: io_transfers
    Deployer->>ExecBlock: executionBlock.transfers = io_transfers

    ExecBlock->>TileConstraint: wrapTilingSolution(..., transfers)
    TileConstraint-->>CodeGen: VariableReplacementScheme + TilingSchedule[]
    Note right of TileConstraint: now consumes precomputed transfers\ninstead of computing memory-path transfers internally
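The wiring in the diagram, sketched as plain Python; class and method names are simplified stand-ins for the real Deeploy types, and the returned structure is illustrative only:

```python
class ExecutionBlock:
    def __init__(self):
        self.transfers = None  # annotated later by the tiler wrapper

class Tiler:
    def get_io_transfers(self, pattern):
        # Placeholder: the real method derives per-tensor, per-level tiles
        # from the pattern's memory constraints.
        return {"output_0": {"L2": [[((0, 0), (16, 16))]]}}

class TilerDeployerWrapper:
    def __init__(self, tiler):
        self.tiler = tiler

    def tile(self, execution_block, pattern):
        # Compute transfers once and annotate the block; codegen passes
        # read the annotation back instead of recomputing it.
        execution_block.transfers = self.tiler.get_io_transfers(pattern)
        return execution_block
```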

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Feature

Suggested reviewers

  • Victor-Jung

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Add tile transfer annotation" directly corresponds to the primary objective of this changeset, which is to add transfer annotation data to ExecutionBlocks to avoid regenerating transfer information during code generation. While the PR includes secondary changes such as refactoring computeTilingRectangles and fixing an L3Dma compiler warning, the title accurately captures the main feature. The title is concise, clear, and specific enough to convey the primary change to someone scanning the repository history.
Description Check ✅ Passed The PR description is clearly related to the changeset and provides substantive information about the changes. It explains the motivation for adding tile transfer annotation (eliminating repeated memory path reconstruction), describes the refactoring of computeTilingRectangles with rationale for the simplification, and documents the L3Dma compiler warning fix. The description is well-organized with an Added/Changed/Fixed section and includes relevant implementation details without being overly verbose.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/TilingExtension/TilingCodegen.py (1)

238-276: Prevent div‑by‑zero and make zips explicit; minor robustness.

Add positive‑dimension asserts; make zip(..., strict=True) explicit; cast np.prod to int.

 def computeTileHyperRectangles(externalShape: Tuple[int, ...], localShape: Tuple[int, ...]) -> List[HyperRectangle]:
     assert len(externalShape) == len(localShape), \
     f"External and local memory shapes don't have the same number of dimensions! External {externalShape} vs. Local {localShape}"
 
-    # LMACAN: The local shape dimensions are of the local buffer so if the external tile is smaller, that's fine
-    localShape = tuple(min(ext, loc) for ext, loc in zip(externalShape, localShape))
+    # LMACAN: The local shape dimensions are of the local buffer so if the external tile is smaller, that's fine
+    localShape = tuple(min(ext, loc) for ext, loc in zip(externalShape, localShape, strict=True))
+    # Sanity: all dims must be > 0 to avoid division by zero below.
+    assert all(d > 0 for d in externalShape), f"externalShape must be > 0, got {externalShape}"
+    assert all(d > 0 for d in localShape), f"localShape must be > 0, got {localShape}"
@@
-    def nextTileIndex(tileIndexEnd: List[int]) -> Generator[List[int]]:
-        tileCount = np.prod(tileIndexEnd)
+    def nextTileIndex(tileIndexEnd: List[int]) -> Generator[List[int]]:
+        tileCount = int(np.prod(tileIndexEnd))
         tileIndex = [0] * len(tileIndexEnd)
         for _ in range(tileCount):
             yield tileIndex
-            for dimIdx, (idx, end) in enumerate(zip(tileIndex, tileIndexEnd)):
+            for dimIdx, (idx, end) in enumerate(zip(tileIndex, tileIndexEnd, strict=True)):
                 if idx + 1 < end:
                     tileIndex[dimIdx] = idx + 1
                     break
                 else:
                     tileIndex[dimIdx] = 0
@@
-    tileIndexEnd = [
-        int(np.ceil(dimSizeLarge / dimSizeSmall)) for dimSizeLarge, dimSizeSmall in zip(externalShape, localShape)
-    ]
+    tileIndexEnd = [int(np.ceil(dimSizeLarge / dimSizeSmall))
+                    for dimSizeLarge, dimSizeSmall in zip(externalShape, localShape, strict=True)]
     for tileIndex in nextTileIndex(tileIndexEnd):
-        tileOffset = tuple(dimIdx * dimSizeSmall for dimIdx, dimSizeSmall in zip(tileIndex, localShape))
-        for dimIdx, (dimOffset, dimSizeLarge) in enumerate(zip(tileOffset, externalShape)):
+        tileOffset = tuple(dimIdx * dimSizeSmall for dimIdx, dimSizeSmall in zip(tileIndex, localShape, strict=True))
+        for dimIdx, (dimOffset, dimSizeLarge) in enumerate(zip(tileOffset, externalShape, strict=True)):
             assert dimOffset >= 0, f"tileOffset[{dimIdx}] shoud not be smaller then zero ({dimOffset} < 0)"
             assert dimOffset < dimSizeLarge, f"tileOffset[{dimIdx}] should not be bigger or equal then largeShape[{dimIdx}] ({dimOffset} >= {dimSizeLarge})"
 
         tileSize = tuple(
             min(dimSizeSmall, dimSizeLarge - dimOffset)
-            for dimSizeSmall, dimSizeLarge, dimOffset in zip(localShape, externalShape, tileOffset))
-        for dimIdx, (dimSize, dimSizeSmall) in enumerate(zip(tileSize, localShape)):
+            for dimSizeSmall, dimSizeLarge, dimOffset in zip(localShape, externalShape, tileOffset, strict=True))
+        for dimIdx, (dimSize, dimSizeSmall) in enumerate(zip(tileSize, localShape, strict=True)):
             assert dimSize > 0, f"tileOffset[{dimIdx}] shoud not be smaller or equal then zero ({dimSize} <= 0)"
             assert dimSize <= dimSizeSmall, f"tileSize[{dimIdx}] should not be bigger then smallShape[{dimIdx}] ({dimSize} > {dimSizeSmall})"
Deeploy/TilingExtension/TileConstraint.py (1)

101-108: Empty transfers handling to avoid IndexError.

If transfers[outVar] is empty, varReplacements[0] will throw. Fail fast with a clear message.

-        for _outputCubes in transfers[outVar]:
+        cubes = transfers[outVar]
+        if not cubes:
+            raise ValueError(f"No transfers provided for output '{outVar}'.")
+        for _outputCubes in cubes:
             varReplacement, tilingSchedule = cls.serializeTilingSolution(tilingSolution, _outputCubes, targetMemLevel,
                                                                          ctxt, operatorRepresentation)
             sanitizedTilingSchedule = cls.sanitizeTilingSchedule(tilingSchedule)
 
             varReplacements.append(varReplacement)
             tilingSchedules.append(sanitizedTilingSchedule)
🧹 Nitpick comments (3)
CHANGELOG.md (1)

78-79: Name the refactored helper correctly.

The function is computeTileHyperRectangles in code; changelog says “computeTilingRectangles”. Align wording.

-- Refactored computeTilingRectangles
+- Refactored computeTileHyperRectangles
Deeploy/TilingExtension/TilerExtension.py (1)

944-965: Make transfers construction safer and clearer (pairwise, strict zip, shape asserts).

  • Use itertools.pairwise for successive memory levels.
  • Add shape assertions for localMc.shape to avoid None usage.
  • Use strict=True in zip of offsets to catch rank mismatches (Python 3.10+).
+from itertools import pairwise
@@
-    def getTransfers(self, tensorMc: TensorMemoryConstraint) -> Dict[str, List[List[AbsoluteHyperRectangle]]]:
+    def getTransfers(self, tensorMc: TensorMemoryConstraint) -> Dict[str, List[List[AbsoluteHyperRectangle]]]:
         transfers: Dict[str, List[List[AbsoluteHyperRectangle]]] = {}
-        mcs = list(tensorMc.memoryConstraints.items())
-        for (externalMemory, externalMc), (localMemory, localMc) in zip(mcs[:-1], mcs[1:]):
+        mcs = list(tensorMc.memoryConstraints.items())
+        for (externalMemory, externalMc), (localMemory, localMc) in pairwise(mcs):
             # TODO: Should we also use externalMemory as a key in the transfers?
             if externalMemory not in transfers:
                 assert externalMc.shape is not None
                 shape = externalMc.shape
                 zeroOffset = (0,) * len(shape)
                 externalAbsoluteRectangles = [AbsoluteHyperRectangle(HyperRectangle(zeroOffset, shape), zeroOffset)]
             else:
                 # Flatten
                 externalAbsoluteRectangles = [rect for _list in transfers[externalMemory] for rect in _list]
 
+            assert localMc.shape is not None, f"Missing local shape for {localMemory}"
             transfers[localMemory] = [[
-                AbsoluteHyperRectangle(rect, tuple(a + b
-                                                   for a, b in zip(extAbsRect.absoluteOffset, rect.offset)))
-                for rect in computeTileHyperRectangles(extAbsRect.rectangle.dims, localMc.shape)
+                AbsoluteHyperRectangle(
+                    rect,
+                    tuple(a + b for a, b in zip(extAbsRect.absoluteOffset, rect.offset, strict=True))
+                )
+                for rect in computeTileHyperRectangles(extAbsRect.rectangle.dims, localMc.shape)
             ]
                                       for extAbsRect in externalAbsoluteRectangles]
         return transfers

If Python < 3.10, drop strict=True or guard it. Also consider a short docstring describing the nested return structure. Based on learnings

Deeploy/TilingExtension/TileConstraint.py (1)

90-95: Optional: simplify API and type to match usage.

Since you assert one output and only iterate transfers[outVar], consider passing a plain List[List[AbsoluteHyperRectangle]] (already level-selected) and drop var-keying here. This reduces coupling to external mapping structure.

-        transfers: Dict[str, List[List[AbsoluteHyperRectangle]]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
+        cubes: List[List[AbsoluteHyperRectangle]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
@@
-        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
-        varReplacements = []
+        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        varReplacements = []
@@
-        for _outputCubes in transfers[outVar]:
+        for _outputCubes in cubes:

Coordinate with callers accordingly. Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15c4a23 and a1973ae.

📒 Files selected for processing (8)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (1 hunks)
  • Deeploy/Targets/PULPOpen/DMA/L3Dma.py (1 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (3 hunks)
  • Deeploy/TilingExtension/TileConstraint.py (2 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (3 hunks)
  • Deeploy/TilingExtension/TilingCodegen.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
Deeploy/TilingExtension/TilerExtension.py (3)
Deeploy/TilingExtension/TilingCodegen.py (3)
  • AbsoluteHyperRectangle (32-42)
  • HyperRectangle (17-28)
  • computeTileHyperRectangles (238-277)
Deeploy/TilingExtension/MemoryConstraints.py (3)
  • TensorMemoryConstraint (47-92)
  • PatternMemoryConstraints (170-200)
  • tensorMemoryConstraints (104-109)
Deeploy/DeeployTypes.py (1)
  • executionBlock (1593-1596)
Deeploy/TilingExtension/TileConstraint.py (3)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TilingCodegen.py (3)
  • AbsoluteHyperRectangle (32-42)
  • TilingSchedule (46-115)
  • VariableReplacementScheme (119-151)
Deeploy/DeeployTypes.py (1)
  • NetworkContext (564-1076)
Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
🪛 Ruff (0.14.1)
Deeploy/TilingExtension/TilingCodegen.py

243-243: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


260-260: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


263-263: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


264-264: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


270-270: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


271-271: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/TilingExtension/TilerExtension.py

947-947: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


947-947: Prefer itertools.pairwise() over zip() when iterating over successive pairs

Replace zip() with itertools.pairwise()

(RUF007)


960-960: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (90)
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: generic-models / test-runner-generic
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
  • GitHub Check: linting
  • GitHub Check: deeploy-tiler-extension
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/DMA/L3Dma.py (1)

25-25: Confirm HAL signature for safety.

Casting ${ext} to uint32_t removes the warning, but please confirm pi_cl_ram_copy_2d indeed expects a 32‑bit external address on all targets; otherwise prefer uintptr_t.

If convenient, verify against your SDK headers.

Deeploy/DeeployTypes.py (1)

1461-1462: Document and guard “transfers” lifecycle.

Nice addition. Consider documenting when it’s set (e.g., by TilerExtension.getIoTransfers) and that callers must handle None. No code change required.

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

137-144: Add robust guards for missing transfers and memory level.

Directly indexing baseExecutionBlock.transfers[self.targetMemLevel] will raise if transfers are not attached or the level is absent (legacy paths, non‑tiled ops, or platform differences).

Apply this minimal defensive fix:

-        tileConstr: TileConstraint = template.tileConstraint
-        transfers = {
-            tensorName: memTransfers[self.targetMemLevel]
-            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
-        }
+        tileConstr: TileConstraint = template.tileConstraint
+        assert baseExecutionBlock.transfers is not None, \
+            "Missing transfers on execution block; ensure Tiler annotated transfers before variable replacement."
+        transfers = {}
+        for tensorName, memTransfers in baseExecutionBlock.transfers.items():
+            if self.targetMemLevel not in memTransfers:
+                raise KeyError(
+                    f"No transfers for memory level '{self.targetMemLevel}' on tensor '{tensorName}'. "
+                    "Verify tiler configuration and annotation."
+                )
+            transfers[tensorName] = memTransfers[self.targetMemLevel]

To ensure coverage, search that all tiling flows attach transfers:

Deeploy/TilingExtension/TilerExtension.py (2)

39-39: Import addition looks good.

The new import aligns with the new transfer computation flow.


1026-1029: Transfers annotation: confirm consumers handle non-layer-wise cases and target level selection.

executionBlock.transfers stores per-tensor, per-memory-level rectangles. Downstream, wrapTilingSolution expects a per-output mapping (by var) already pre-selected for the target level. Please confirm call sites select the correct level and handle patterns with multiple steps.

Deeploy/TilingExtension/TileConstraint.py (1)

11-14: Imports narrowed appropriately.

Only the needed types are imported; consistent with the refactor.

Comment on lines 245 to 252
tileConstraint: TileConstraint = template.tileConstraint
transfers = {
tensorName: memTransfers[self.localMemory]
for tensorName, memTransfers in baseExecutionBlock.transfers.items()
}
variableReplacement, tilingSchedules = tileConstraint.wrapTilingSolution(nodeMemoryConstraint, self.localMemory,
ctxt, unraveledOpRepr, transfers)


⚠️ Potential issue | 🟠 Major

Guard transfers before use; helpful error if level missing.

Avoids hard KeyErrors when transfers aren’t annotated or the level is unavailable.

-        tileConstraint: TileConstraint = template.tileConstraint
-        transfers = {
-            tensorName: memTransfers[self.localMemory]
-            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
-        }
+        tileConstraint: TileConstraint = template.tileConstraint
+        assert baseExecutionBlock.transfers is not None, \
+            "Missing transfers on execution block; ensure Tiler annotated transfers before codegen."
+        transfers = {}
+        for tensorName, memTransfers in baseExecutionBlock.transfers.items():
+            if self.localMemory not in memTransfers:
+                raise KeyError(f"No transfers for memory level '{self.localMemory}' on tensor '{tensorName}'.")
+            transfers[tensorName] = memTransfers[self.localMemory]
🤖 Prompt for AI Agents
In Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py
around lines 245 to 252, the comprehension unconditionally indexes
memTransfers[self.localMemory] which can raise KeyError when transfers aren’t
annotated or the requested memory level is missing; change this to guard access
(e.g., check self.localMemory in memTransfers or use
memTransfers.get(self.localMemory)) and build transfers only for tensors that
actually have that level, and if any expected tensor is missing the level raise
a clear error including the tensorName and the missing level (or optionally
log/collect missing entries and include them in the exception) before calling
tileConstraint.wrapTilingSolution so the failure is descriptive instead of a raw
KeyError.
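A hypothetical helper sketching the guarded selection the prompt asks for (all names here are illustrative, not from the codebase):

```python
def select_level(transfers, level):
    # Collect every tensor missing the requested memory level first, so the
    # error names all offenders instead of failing on the first bare KeyError.
    missing = [name for name, per_level in transfers.items() if level not in per_level]
    if missing:
        raise KeyError(f"No transfers for memory level '{level}' on tensors: {missing}")
    return {name: per_level[level] for name, per_level in transfers.items()}
```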

Comment on lines 90 to 95
def wrapTilingSolution(
        cls, tilingSolution: NodeMemoryConstraint, targetMemLevel: str, ctxt: NetworkContext,
        operatorRepresentation: OperatorRepresentation) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:

    def getMemoryTransfer(tensorConstraint: TensorMemoryConstraint, sourceCube: HyperRectangle,
                          sourceMemoryLevel: str, targetMemoryLevel: str) -> MemoryTransfer:

        size = np.prod(sourceCube.dims)
        sourceConstraint = MemoryConstraint(sourceMemoryLevel, size)
        sourceConstraint.shape = sourceCube.dims

        destConstraint = copy.copy(tensorConstraint.memoryConstraints[targetMemoryLevel])

        if any(dim1 > dim2 for dim1, dim2 in zip(destConstraint.shape, sourceConstraint.shape)):
            destConstraint.shape = sourceConstraint.shape

        return MemoryTransfer(sourceConstraint, destConstraint)

    def _offsetAdd(offsetA: Tuple[int, ...], offsetB: Tuple[int, ...]) -> Tuple[int, ...]:
        return tuple(dimA + dimB for dimA, dimB in zip(offsetA, offsetB))

    def getCubeTransfers(tensorConstraint: TensorMemoryConstraint, sourceCubes: List[AbsoluteHyperRectangle],
                         sourceMemoryLevel: str,
                         targetMemoryLevel: str) -> Tuple[List[AbsoluteHyperRectangle], List[int]]:
        solution = []
        solutionLengths = []

        for sourceCube in sourceCubes:
            memTransfer = getMemoryTransfer(tensorConstraint, sourceCube.rectangle, sourceMemoryLevel,
                                            targetMemoryLevel)
            solutionCubes = computeTileHyperRectangles(memTransfer)
            solutionAbsoluteCubes = [
                AbsoluteHyperRectangle(rectangle = cube,
                                       absoluteOffset = _offsetAdd(sourceCube.absoluteOffset, cube.offset))
                for cube in solutionCubes
            ]
            solution += solutionAbsoluteCubes
            solutionLengths.append(len(solutionAbsoluteCubes))

        return solution, solutionLengths

cls, tilingSolution: NodeMemoryConstraint, targetMemLevel: str, ctxt: NetworkContext,
operatorRepresentation: OperatorRepresentation,
transfers: Dict[str,
List[List[AbsoluteHyperRectangle]]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"

⚠️ Potential issue | 🟠 Major

Validate transfers structure early; clarify expected shape.

wrapTilingSolution indexes transfers[outVar], implying transfers must be var-keyed (not memory-level keyed). Add a guard and a clear error to prevent silent KeyErrors.

-        assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"
-        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"
+        outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+        if outVar not in transfers:
+            raise KeyError(f"Missing transfers for output '{outVar}'. Expected var-keyed mapping.")

As per coding guidelines

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In Deeploy/TilingExtension/TileConstraint.py around lines 90 to 95,
wrapTilingSolution assumes transfers is keyed by output variable names and
shaped as Dict[str, List[List[AbsoluteHyperRectangle]]], which can cause a
silent KeyError or confusing failures. Add an explicit guard that checks that
transfers is a dict, that the expected output variable key (the single output
variable from tilingSolution.outputTensorMemoryConstraints) exists in
transfers, and that its value is a (optionally non-empty) list of lists of
AbsoluteHyperRectangle. If any check fails, raise a clear ValueError or
AssertionError describing the expected shape ("transfers must be a dict keyed
by output var name mapping to List[List[AbsoluteHyperRectangle]]") and include
the missing key name, so callers get an immediate, informative error.
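
A stand-alone sketch of the structural validation described above; the helper name `validateTransfers` and the plain-list stand-ins for AbsoluteHyperRectangle are illustrative assumptions, not the real Deeploy types.

```python
def validateTransfers(transfers, outVar):
    """Check that transfers is a var-keyed dict of List[List[...]] entries."""
    expected = ("transfers must be a dict keyed by output var name mapping to "
                "List[List[AbsoluteHyperRectangle]]")
    if not isinstance(transfers, dict):
        raise ValueError(expected)
    if outVar not in transfers:
        # Include the missing key so the caller sees an informative error.
        raise KeyError(f"Missing transfers for output '{outVar}'. {expected}")
    value = transfers[outVar]
    if not (isinstance(value, list) and value
            and all(isinstance(step, list) for step in value)):
        raise ValueError(f"transfers['{outVar}'] is malformed. {expected}")
    return value
```

Checking the shape up front turns a raw KeyError deep inside wrapTilingSolution into an immediate, descriptive failure at the call boundary.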

Comment on lines +966 to +971
def getIoTransfers(self,
                   patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
    assert len(patternMc.nodeConstraints) == 1, "Only layerwise supported for now!"
    tMcs = patternMc.nodeConstraints[0].tensorMemoryConstraints
    return {name: self.getTransfers(mc) for name, mc in tMcs.items()}


🛠️ Refactor suggestion | 🟠 Major

Layer-wise assertion will break multi-node patterns at runtime.

tile() calls getIoTransfers() for every pattern, but getIoTransfers() asserts a single step. This will raise on schedules where a pattern has multiple nodes. Either support multi-step or avoid asserting here.

Proposed minimal fix: default to the last step (layer-wise remains unchanged), enabling non-layer-wise schedules.

-    def getIoTransfers(self,
-                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
-        assert len(patternMc.nodeConstraints) == 1, "Only layerwise supported for now!"
-        tMcs = patternMc.nodeConstraints[0].tensorMemoryConstraints
-        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}
+    def getIoTransfers(self,
+                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
+        # Prefer layer-wise; if not, use the last step to represent the pattern's effective IO.
+        step_idx = -1
+        tMcs = patternMc.nodeConstraints[step_idx].tensorMemoryConstraints
+        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}

Based on learnings

🤖 Prompt for AI Agents
In Deeploy/TilingExtension/TilerExtension.py around lines 966 to 971, the code
asserts that patternMc.nodeConstraints has length 1, which will raise for
multi-node patterns. Remove the assert and select the last node constraint when
multiple steps exist (e.g., node_mc = patternMc.nodeConstraints[-1]), then use
node_mc.tensorMemoryConstraints to build the transfers. Layer-wise behavior
remains unchanged for single-step patterns, while multi-step patterns default
to the last step. Ensure variable names and types match the existing code and
return the same dict shape.
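
The behavioral difference can be illustrated with a reduced stand-in, where plain dicts replace the real tensorMemoryConstraints and only the step-selection logic is shown; `lastStepConstraints` is a hypothetical name for this sketch.

```python
def lastStepConstraints(nodeConstraints):
    """Pick the step whose tensor constraints represent the pattern's IO.

    Layer-wise patterns have a single step, so index -1 is identical to
    index 0 there; multi-step patterns fall back to the last step instead
    of raising an assertion error.
    """
    if not nodeConstraints:
        raise ValueError("Pattern has no node constraints")
    return nodeConstraints[-1]
```

For a single-step pattern the result is unchanged; for a multi-step pattern the last step is used rather than failing the assertion.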


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)

247-253: Guard transfers access and provide clear diagnostics when missing.

Line 247 accesses baseExecutionBlock.transfers without checking if it exists, which will raise AttributeError if the TilerExtension hasn't annotated the block. Additionally, the silent early return at lines 252–253 makes debugging difficult—users won't know why tiling was skipped.

Apply this diff to add a None check and descriptive error messages:

 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    raise RuntimeError(
+        f"Missing transfers on execution block '{name}'; "
+        "ensure TilerExtension annotated transfers before TilingCodeGeneration."
+    )
+
 targetMemoryTransfers = {
-    tensorName: memTransfers.get(self.localMemory, None) for tensorName, memTransfers in transfers.items()
+    tensorName: memTransfers.get(self.localMemory) for tensorName, memTransfers in transfers.items()
 }

-if any(v is None for v in targetMemoryTransfers.values()):
-    return ctxt, executionBlock
+missingTensors = [name for name, t in targetMemoryTransfers.items() if t is None]
+if missingTensors:
+    raise RuntimeError(
+        f"No transfers for memory level '{self.localMemory}' on tensors {missingTensors} "
+        f"in execution block '{name}'."
+    )

Note: This addresses the concern raised in the past review comment about providing helpful errors when transfers or memory levels are missing.

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

248-258: Add None‑guard for baseExecutionBlock.transfers (same issue as main pass).

Identical to the main replacement pass, line 249 assigns baseExecutionBlock.transfers without checking if it's None, then line 251 calls .items() on it.

The current .get(self.targetMemLevel, None) approach (line 251) and early return on line 254 handles missing memory level keys gracefully, which addresses part of the past review comment's concern about guarding transfers access. However, the None check on transfers itself is still missing.

Apply the same fix as in the main pass:

 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    return ctxt, executionBlock
+
 targetMemoryTransfers = {
     tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1973ae and 45554e2.

📒 Files selected for processing (2)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (3 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion manually.
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion.
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (3)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
Deeploy/TilingExtension/TilerExtension.py (1)
  • Tiler (47-970)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (32-42)
  • TilingSchedule (46-115)
  • VariableReplacementScheme (119-151)
  • minimizeVariableReplacement (154-167)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (16-154)
  • wrapTilingSolution (90-113)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • AbsoluteHyperRectangle (32-42)
  • HyperRectangle (17-28)
  • TilingSchedule (46-115)
  • VariableReplacementScheme (119-151)
  • minimizeVariableReplacement (154-167)
🔇 Additional comments (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)

8-8: LGTM! Imports align with the new transfer-based tiling flow.

The added imports support the refactored logic that now consumes precomputed transfers from the ExecutionBlock rather than recomputing them.

Also applies to: 22-25

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

7-7: LGTM—import additions support the transfer annotation integration.

The new imports (Dict, TileConstraint, AbsoluteHyperRectangle) are all necessary for the transfer-based tiling workflow.

Also applies to: 18-18, 20-21

Comment on lines +138 to +149
tileConstr: TileConstraint = template.tileConstraint
transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
targetMemoryTransfers = {
    tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
}

if any(v is None for v in targetMemoryTransfers.values()):
    return ctxt, executionBlock

variableReplacement, tilingSchedules = tileConstr.wrapTilingSolution(nodeMemoryConstraint, self.targetMemLevel,
                                                                     ctxt, unraveledOpRepr,
                                                                     targetMemoryTransfers)

⚠️ Potential issue | 🔴 Critical

Add None‑guard for baseExecutionBlock.transfers before accessing .items().

The type annotation declares transfers as a Dict, but if baseExecutionBlock.transfers is None, line 141 will raise an AttributeError when calling .items().

Apply this diff to add a None check:

 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    return ctxt, executionBlock
+
 targetMemoryTransfers = {
     tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
 }
🤖 Prompt for AI Agents
In Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py
around lines 138 to 149, add a None-guard for baseExecutionBlock.transfers
before calling .items(): if baseExecutionBlock.transfers is None (or falsy),
return ctxt, executionBlock early (the same as the later guard); otherwise
assign transfers = baseExecutionBlock.transfers and proceed to build
targetMemoryTransfers and call wrapTilingSolution.

@lukamac lukamac marked this pull request as draft October 23, 2025 07:12
@Xeratec Xeratec added this to the Release 0.2.1 milestone Oct 23, 2025
@Xeratec Xeratec added this to Deeploy Oct 23, 2025
@Xeratec Xeratec added the Feature Addition of new features label Oct 30, 2025
@Xeratec Xeratec moved this to In progress in Deeploy Oct 30, 2025