Add tile transfer annotation #127

base: devel

Conversation
Instead of calculating the transfer information in the wrapTilingSolution function every time for each memory level, do it once in the TilerExtension and annotate the execution block with it, as is done for the pattern. I'm not fully satisfied with the approach, but it's a step in the right direction.
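As a rough illustration of the idea (all class and function names below are hypothetical stand-ins for illustration, not Deeploy's actual API), the tiler can compute the per-tensor, per-memory-level tile rectangles once and attach them to the execution block, so later passes only look them up:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HyperRectangle:
    # Simplified stand-in for Deeploy's HyperRectangle: an offset and a size per dimension.
    offset: Tuple[int, ...]
    dims: Tuple[int, ...]


@dataclass
class ExecutionBlock:
    # Annotated once by the tiler: tensor name -> memory level -> list of tile rectangles.
    transfers: Optional[Dict[str, Dict[str, List[HyperRectangle]]]] = None


def annotate_transfers(block: ExecutionBlock, tensor_shapes: Dict[str, Tuple[int, ...]],
                       tile_shapes: Dict[str, Dict[str, Tuple[int, ...]]]) -> None:
    """Compute tile rectangles once per tensor and memory level and attach them."""
    transfers: Dict[str, Dict[str, List[HyperRectangle]]] = {}
    for name, shape in tensor_shapes.items():
        transfers[name] = {}
        for level, tile in tile_shapes[name].items():
            # 1D tiling along the first dimension only, for brevity.
            total, step = shape[0], tile[0]
            transfers[name][level] = [
                HyperRectangle((off,), (min(step, total - off),)) for off in range(0, total, step)
            ]
    block.transfers = transfers


block = ExecutionBlock()
annotate_transfers(block, {"out": (10,)}, {"out": {"L1": (4,)}})
print([r.dims for r in block.transfers["out"]["L1"]])  # → [(4,), (4,), (2,)]
```

Consumers then read `block.transfers` instead of recomputing the tiling per memory level.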
📝 Walkthrough

Summary by CodeRabbit: Adds transfer annotations for tiled execution blocks, moves transfer computation into the Tiler, updates tiling APIs to accept precomputed transfers, removes the MemoryTransfer abstraction, and casts an external DMA pointer to uint32_t in a hardware template.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Tiler
    participant Deployer as TilerDeployerWrapper
    participant ExecBlock as ExecutionBlock
    participant TileConstraint
    participant CodeGen
    Tiler->>Tiler: computeTileHyperRectangles(externalShape, localShape)
    Tiler->>Tiler: getTransfers(tensorMc) → transfers per-tensor
    Tiler->>Tiler: getIoTransfers(patternMc) → io_transfers (per-output → per-target → list-of-rects)
    Deployer->>Tiler: getIoTransfers(pattern)
    Tiler-->>Deployer: io_transfers
    Deployer->>ExecBlock: executionBlock.transfers = io_transfers
    ExecBlock->>TileConstraint: wrapTilingSolution(..., transfers)
    TileConstraint-->>CodeGen: VariableReplacementScheme + TilingSchedule[]
    Note right of TileConstraint: now consumes precomputed transfers<br/>instead of computing memory-path transfers internally
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)

Deeploy/TilingExtension/TilingCodegen.py (1)

238-276: Prevent div-by-zero and make zips explicit; minor robustness.

Add positive-dimension asserts; make `zip(..., strict=True)` explicit; cast `np.prod` to `int`.

```diff
 def computeTileHyperRectangles(externalShape: Tuple[int, ...], localShape: Tuple[int, ...]) -> List[HyperRectangle]:
     assert len(externalShape) == len(localShape), \
         f"External and local memory shapes don't have the same number of dimensions! External {externalShape} vs. Local {localShape}"
-    # LMACAN: The local shape dimensions are of the local buffer so if the external tile is smaller, that's fine
-    localShape = tuple(min(ext, loc) for ext, loc in zip(externalShape, localShape))
+    # LMACAN: The local shape dimensions are of the local buffer so if the external tile is smaller, that's fine
+    localShape = tuple(min(ext, loc) for ext, loc in zip(externalShape, localShape, strict=True))
+    # Sanity: all dims must be > 0 to avoid division by zero below.
+    assert all(d > 0 for d in externalShape), f"externalShape must be > 0, got {externalShape}"
+    assert all(d > 0 for d in localShape), f"localShape must be > 0, got {localShape}"
@@
-    def nextTileIndex(tileIndexEnd: List[int]) -> Generator[List[int]]:
-        tileCount = np.prod(tileIndexEnd)
+    def nextTileIndex(tileIndexEnd: List[int]) -> Generator[List[int]]:
+        tileCount = int(np.prod(tileIndexEnd))
         tileIndex = [0] * len(tileIndexEnd)
         for _ in range(tileCount):
             yield tileIndex
-            for dimIdx, (idx, end) in enumerate(zip(tileIndex, tileIndexEnd)):
+            for dimIdx, (idx, end) in enumerate(zip(tileIndex, tileIndexEnd, strict=True)):
                 if idx + 1 < end:
                     tileIndex[dimIdx] = idx + 1
                     break
                 else:
                     tileIndex[dimIdx] = 0
@@
-    tileIndexEnd = [
-        int(np.ceil(dimSizeLarge / dimSizeSmall)) for dimSizeLarge, dimSizeSmall in zip(externalShape, localShape)
-    ]
+    tileIndexEnd = [int(np.ceil(dimSizeLarge / dimSizeSmall))
+                    for dimSizeLarge, dimSizeSmall in zip(externalShape, localShape, strict=True)]
     for tileIndex in nextTileIndex(tileIndexEnd):
-        tileOffset = tuple(dimIdx * dimSizeSmall for dimIdx, dimSizeSmall in zip(tileIndex, localShape))
-        for dimIdx, (dimOffset, dimSizeLarge) in enumerate(zip(tileOffset, externalShape)):
+        tileOffset = tuple(dimIdx * dimSizeSmall for dimIdx, dimSizeSmall in zip(tileIndex, localShape, strict=True))
+        for dimIdx, (dimOffset, dimSizeLarge) in enumerate(zip(tileOffset, externalShape, strict=True)):
             assert dimOffset >= 0, f"tileOffset[{dimIdx}] shoud not be smaller then zero ({dimOffset} < 0)"
             assert dimOffset < dimSizeLarge, f"tileOffset[{dimIdx}] should not be bigger or equal then largeShape[{dimIdx}] ({dimOffset} >= {dimSizeLarge})"
         tileSize = tuple(
             min(dimSizeSmall, dimSizeLarge - dimOffset)
-            for dimSizeSmall, dimSizeLarge, dimOffset in zip(localShape, externalShape, tileOffset))
-        for dimIdx, (dimSize, dimSizeSmall) in enumerate(zip(tileSize, localShape)):
+            for dimSizeSmall, dimSizeLarge, dimOffset in zip(localShape, externalShape, tileOffset, strict=True))
+        for dimIdx, (dimSize, dimSizeSmall) in enumerate(zip(tileSize, localShape, strict=True)):
             assert dimSize > 0, f"tileOffset[{dimIdx}] shoud not be smaller or equal then zero ({dimSize} <= 0)"
             assert dimSize <= dimSizeSmall, f"tileSize[{dimIdx}] should not be bigger then smallShape[{dimIdx}] ({dimSize} > {dimSizeSmall})"
```

Deeploy/TilingExtension/TileConstraint.py (1)

101-108: Empty transfers handling to avoid IndexError.

If `transfers[outVar]` is empty, `varReplacements[0]` will throw. Fail fast with a clear message.

```diff
-    for _outputCubes in transfers[outVar]:
+    cubes = transfers[outVar]
+    if not cubes:
+        raise ValueError(f"No transfers provided for output '{outVar}'.")
+    for _outputCubes in cubes:
         varReplacement, tilingSchedule = cls.serializeTilingSolution(tilingSolution, _outputCubes, targetMemLevel,
                                                                      ctxt, operatorRepresentation)
         sanitizedTilingSchedule = cls.sanitizeTilingSchedule(tilingSchedule)
         varReplacements.append(varReplacement)
         tilingSchedules.append(sanitizedTilingSchedule)
```
🧹 Nitpick comments (3)

CHANGELOG.md (1)

78-79: Name the refactored helper correctly.

The function is `computeTileHyperRectangles` in code; the changelog says "computeTilingRectangles". Align the wording.

```diff
-- Refactored computeTilingRectangles
+- Refactored computeTileHyperRectangles
```

Deeploy/TilingExtension/TilerExtension.py (1)
944-965: Make transfers construction safer and clearer (pairwise, strict zip, shape asserts).
- Use itertools.pairwise for successive memory levels.
- Add shape assertions for localMc.shape to avoid None usage.
- Use strict=True in zip of offsets to catch rank mismatches (Python 3.10+).
```diff
+from itertools import pairwise
@@
     def getTransfers(self, tensorMc: TensorMemoryConstraint) -> Dict[str, List[List[AbsoluteHyperRectangle]]]:
         transfers: Dict[str, List[List[AbsoluteHyperRectangle]]] = {}
-        mcs = list(tensorMc.memoryConstraints.items())
-        for (externalMemory, externalMc), (localMemory, localMc) in zip(mcs[:-1], mcs[1:]):
+        mcs = list(tensorMc.memoryConstraints.items())
+        for (externalMemory, externalMc), (localMemory, localMc) in pairwise(mcs):
             # TODO: Should we also use externalMemory as a key in the transfers?
             if externalMemory not in transfers:
                 assert externalMc.shape is not None
                 shape = externalMc.shape
                 zeroOffset = (0,) * len(shape)
                 externalAbsoluteRectangles = [AbsoluteHyperRectangle(HyperRectangle(zeroOffset, shape), zeroOffset)]
             else:
                 # Flatten
                 externalAbsoluteRectangles = [rect for _list in transfers[externalMemory] for rect in _list]
+            assert localMc.shape is not None, f"Missing local shape for {localMemory}"
             transfers[localMemory] = [[
-                AbsoluteHyperRectangle(rect, tuple(a + b for a, b in zip(extAbsRect.absoluteOffset, rect.offset)))
-                for rect in computeTileHyperRectangles(extAbsRect.rectangle.dims, localMc.shape)
+                AbsoluteHyperRectangle(
+                    rect,
+                    tuple(a + b for a, b in zip(extAbsRect.absoluteOffset, rect.offset, strict=True)))
+                for rect in computeTileHyperRectangles(extAbsRect.rectangle.dims, localMc.shape)
             ] for extAbsRect in externalAbsoluteRectangles]
         return transfers
```

If Python < 3.10, drop `strict=True` or guard it. Also consider a short docstring describing the nested return structure. Based on learnings
Deeploy/TilingExtension/TileConstraint.py (1)
90-95: Optional: simplify API and type to match usage.

Since you assert one output and only iterate transfers[outVar], consider passing a plain List[List[AbsoluteHyperRectangle]] (already level-selected) and drop var-keying here. This reduces coupling to the external mapping structure.
```diff
-        transfers: Dict[str, List[List[AbsoluteHyperRectangle]]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
+        cubes: List[List[AbsoluteHyperRectangle]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
@@
     outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
     varReplacements = []
@@
-    for _outputCubes in transfers[outVar]:
+    for _outputCubes in cubes:
```

Coordinate with callers accordingly. Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- CHANGELOG.md (4 hunks)
- Deeploy/DeeployTypes.py (1 hunks)
- Deeploy/Targets/PULPOpen/DMA/L3Dma.py (1 hunks)
- Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2 hunks)
- Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (3 hunks)
- Deeploy/TilingExtension/TileConstraint.py (2 hunks)
- Deeploy/TilingExtension/TilerExtension.py (3 hunks)
- Deeploy/TilingExtension/TilingCodegen.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
- Deeploy/TilingExtension/TileConstraint.py (2)
  - TileConstraint (16-154)
  - wrapTilingSolution (90-113)

Deeploy/TilingExtension/TilerExtension.py (3)
- Deeploy/TilingExtension/TilingCodegen.py (3)
  - AbsoluteHyperRectangle (32-42)
  - HyperRectangle (17-28)
  - computeTileHyperRectangles (238-277)
- Deeploy/TilingExtension/MemoryConstraints.py (3)
  - TensorMemoryConstraint (47-92)
  - PatternMemoryConstraints (170-200)
  - tensorMemoryConstraints (104-109)
- Deeploy/DeeployTypes.py (1)
  - executionBlock (1593-1596)

Deeploy/TilingExtension/TileConstraint.py (3)
- Deeploy/TilingExtension/MemoryConstraints.py (1)
  - NodeMemoryConstraint (95-167)
- Deeploy/TilingExtension/TilingCodegen.py (3)
  - AbsoluteHyperRectangle (32-42)
  - TilingSchedule (46-115)
  - VariableReplacementScheme (119-151)
- Deeploy/DeeployTypes.py (1)
  - NetworkContext (564-1076)

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)
- Deeploy/TilingExtension/TileConstraint.py (2)
  - TileConstraint (16-154)
  - wrapTilingSolution (90-113)
🪛 Ruff (0.14.1)
Deeploy/TilingExtension/TilingCodegen.py
243-243: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
260-260: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
263-263: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
264-264: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
270-270: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
271-271: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
Deeploy/TilingExtension/TilerExtension.py
947-947: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
947-947: Prefer itertools.pairwise() over zip() when iterating over successive pairs
Replace zip() with itertools.pairwise()
(RUF007)
960-960: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (90)
- GitHub Check: siracusa-kernels / test-runner-siracusa
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: generic-models / test-runner-generic
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: cortexm-models / test-runner-cortexm
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
- GitHub Check: linting
- GitHub Check: deeploy-tiler-extension
🔇 Additional comments (6)

Deeploy/Targets/PULPOpen/DMA/L3Dma.py (1)

25-25: Confirm HAL signature for safety.

Casting `${ext}` to `uint32_t` removes the warning, but please confirm `pi_cl_ram_copy_2d` indeed expects a 32-bit external address on all targets; otherwise prefer `uintptr_t`. If convenient, verify against your SDK headers.

Deeploy/DeeployTypes.py (1)

1461-1462: Document and guard the "transfers" lifecycle.

Nice addition. Consider documenting when it's set (e.g., by `TilerExtension.getIoTransfers`) and that callers must handle `None`. No code change required.

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

137-144: Add robust guards for missing transfers and memory level.

Directly indexing `baseExecutionBlock.transfers[self.targetMemLevel]` will raise if transfers are not attached or the level is absent (legacy paths, non-tiled ops, or platform differences). Apply this minimal defensive fix:

```diff
-        tileConstr: TileConstraint = template.tileConstraint
-        transfers = {
-            tensorName: memTransfers[self.targetMemLevel]
-            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
-        }
+        tileConstr: TileConstraint = template.tileConstraint
+        assert baseExecutionBlock.transfers is not None, \
+            "Missing transfers on execution block; ensure Tiler annotated transfers before variable replacement."
+        transfers = {}
+        for tensorName, memTransfers in baseExecutionBlock.transfers.items():
+            if self.targetMemLevel not in memTransfers:
+                raise KeyError(
+                    f"No transfers for memory level '{self.targetMemLevel}' on tensor '{tensorName}'. "
+                    "Verify tiler configuration and annotation.")
+            transfers[tensorName] = memTransfers[self.targetMemLevel]
```

To ensure coverage, search that all tiling flows attach transfers.

Deeploy/TilingExtension/TilerExtension.py (2)

39-39: Import addition looks good.

The new import aligns with the new transfer computation flow.

1026-1029: Transfers annotation: confirm consumers handle non-layer-wise cases and target level selection.

`executionBlock.transfers` stores per-tensor, per-memory-level rectangles. Downstream, `wrapTilingSolution` expects a per-output mapping (by var) already pre-selected for the target level. Please confirm call sites select the correct level and handle patterns with multiple steps.

Deeploy/TilingExtension/TileConstraint.py (1)

11-14: Imports narrowed appropriately.

Only the needed types are imported; consistent with the refactor.
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py

```python
        tileConstraint: TileConstraint = template.tileConstraint
        transfers = {
            tensorName: memTransfers[self.localMemory]
            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
        }
        variableReplacement, tilingSchedules = tileConstraint.wrapTilingSolution(nodeMemoryConstraint, self.localMemory,
                                                                                 ctxt, unraveledOpRepr, transfers)
```
Guard transfers before use; helpful error if level missing.

Avoids hard KeyErrors when transfers aren't annotated or the level is unavailable.

```diff
-        tileConstraint: TileConstraint = template.tileConstraint
-        transfers = {
-            tensorName: memTransfers[self.localMemory]
-            for tensorName, memTransfers in baseExecutionBlock.transfers.items()
-        }
+        tileConstraint: TileConstraint = template.tileConstraint
+        assert baseExecutionBlock.transfers is not None, \
+            "Missing transfers on execution block; ensure Tiler annotated transfers before codegen."
+        transfers = {}
+        for tensorName, memTransfers in baseExecutionBlock.transfers.items():
+            if self.localMemory not in memTransfers:
+                raise KeyError(f"No transfers for memory level '{self.localMemory}' on tensor '{tensorName}'.")
+            transfers[tensorName] = memTransfers[self.localMemory]
```

🤖 Prompt for AI Agents
In Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py
around lines 245 to 252, the comprehension unconditionally indexes
memTransfers[self.localMemory] which can raise KeyError when transfers aren’t
annotated or the requested memory level is missing; change this to guard access
(e.g., check self.localMemory in memTransfers or use
memTransfers.get(self.localMemory)) and build transfers only for tensors that
actually have that level, and if any expected tensor is missing the level raise
a clear error including the tensorName and the missing level (or optionally
log/collect missing entries and include them in the exception) before calling
tileConstraint.wrapTilingSolution so the failure is descriptive instead of a raw
KeyError.
Deeploy/TilingExtension/TileConstraint.py (diff context)

```python
    def wrapTilingSolution(
            cls, tilingSolution: NodeMemoryConstraint, targetMemLevel: str, ctxt: NetworkContext,
            operatorRepresentation: OperatorRepresentation) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:

        def getMemoryTransfer(tensorConstraint: TensorMemoryConstraint, sourceCube: HyperRectangle,
                              sourceMemoryLevel: str, targetMemoryLevel: str) -> MemoryTransfer:

            size = np.prod(sourceCube.dims)
            sourceConstraint = MemoryConstraint(sourceMemoryLevel, size)
            sourceConstraint.shape = sourceCube.dims

            destConstraint = copy.copy(tensorConstraint.memoryConstraints[targetMemoryLevel])

            if any(dim1 > dim2 for dim1, dim2 in zip(destConstraint.shape, sourceConstraint.shape)):
                destConstraint.shape = sourceConstraint.shape

            return MemoryTransfer(sourceConstraint, destConstraint)

        def _offsetAdd(offsetA: Tuple[int, ...], offsetB: Tuple[int, ...]) -> Tuple[int, ...]:
            return tuple(dimA + dimB for dimA, dimB in zip(offsetA, offsetB))

        def getCubeTransfers(tensorConstraint: TensorMemoryConstraint, sourceCubes: List[AbsoluteHyperRectangle],
                             sourceMemoryLevel: str,
                             targetMemoryLevel: str) -> Tuple[List[AbsoluteHyperRectangle], List[int]]:
            solution = []
            solutionLengths = []

            for sourceCube in sourceCubes:
                memTransfer = getMemoryTransfer(tensorConstraint, sourceCube.rectangle, sourceMemoryLevel,
                                                targetMemoryLevel)
                solutionCubes = computeTileHyperRectangles(memTransfer)
                solutionAbsoluteCubes = [
                    AbsoluteHyperRectangle(rectangle = cube,
                                           absoluteOffset = _offsetAdd(sourceCube.absoluteOffset, cube.offset))
                    for cube in solutionCubes
                ]
                solution += solutionAbsoluteCubes
                solutionLengths.append(len(solutionAbsoluteCubes))

            return solution, solutionLengths
```

New signature in this PR:

```python
            cls, tilingSolution: NodeMemoryConstraint, targetMemLevel: str, ctxt: NetworkContext,
            operatorRepresentation: OperatorRepresentation,
            transfers: Dict[str,
                            List[List[AbsoluteHyperRectangle]]]) -> Tuple[VariableReplacementScheme, List[TilingSchedule]]:
        assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"
```
Validate transfers structure early; clarify expected shape.

wrapTilingSolution indexes transfers[outVar], implying transfers must be var-keyed (not memory-level keyed). Add a guard and a clear error to prevent silent KeyErrors.

```diff
     assert len(tilingSolution.outputTensorMemoryConstraints) == 1, "Expected node to have only one output!"
     outVar, _ = next(iter(tilingSolution.outputTensorMemoryConstraints.items()))
+    if outVar not in transfers:
+        raise KeyError(f"Missing transfers for output '{outVar}'. Expected var-keyed mapping.")
```

As per coding guidelines.

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In Deeploy/TilingExtension/TileConstraint.py around lines 90 to 95,
wrapTilingSolution assumes transfers is keyed by output variable names and
shaped as Dict[str, List[List[AbsoluteHyperRectangle]]] which can cause a silent
KeyError or confusing failures; add an explicit guard that checks transfers is a
dict, that the expected output variable key (e.g., the single output variable
from tilingSolution.outputTensorMemoryConstraints) exists in transfers, and that
its value is a list of lists (and optionally non-empty) of
AbsoluteHyperRectangle; if any check fails, raise a clear ValueError or
AssertionError with a message describing the expected shape ("transfers must be
a dict keyed by output var name mapping to List[List[AbsoluteHyperRectangle]]")
and include the missing key name so callers get an immediate, informative error.
Deeploy/TilingExtension/TilerExtension.py

```python
    def getIoTransfers(self,
                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
        assert len(patternMc.nodeConstraints) == 1, "Only layerwise supported for now!"
        tMcs = patternMc.nodeConstraints[0].tensorMemoryConstraints
        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}
```
🛠️ Refactor suggestion | 🟠 Major

Layer-wise assertion will break multi-node patterns at runtime.

tile() calls getIoTransfers() for every pattern, but getIoTransfers() asserts a single step. This will raise on schedules where a pattern has multiple nodes. Either support multi-step or avoid asserting here.

Proposed minimal fix: default to the last step (layer-wise remains unchanged), enabling non-layer-wise schedules.

```diff
-    def getIoTransfers(self,
-                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
-        assert len(patternMc.nodeConstraints) == 1, "Only layerwise supported for now!"
-        tMcs = patternMc.nodeConstraints[0].tensorMemoryConstraints
-        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}
+    def getIoTransfers(self,
+                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
+        # Prefer layer-wise; if not, use the last step to represent the pattern's effective IO.
+        step_idx = -1
+        tMcs = patternMc.nodeConstraints[step_idx].tensorMemoryConstraints
+        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}
```

Based on learnings
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    def getIoTransfers(self,
                       patternMc: PatternMemoryConstraints) -> Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]]:
        # Prefer layer-wise; if not, use the last step to represent the pattern's effective IO.
        step_idx = -1
        tMcs = patternMc.nodeConstraints[step_idx].tensorMemoryConstraints
        return {name: self.getTransfers(mc) for name, mc in tMcs.items()}
```
🤖 Prompt for AI Agents
In Deeploy/TilingExtension/TilerExtension.py around lines 966 to 971, the code
currently asserts that patternMc.nodeConstraints has length 1 which will raise
for multi-node patterns; remove the assert and instead select the last node
constraint when multiple steps exist (e.g., node_mc =
patternMc.nodeConstraints[-1]) then use node_mc.tensorMemoryConstraints to build
the transfers so layer-wise behavior remains for single-step but multi-step
patterns default to the last step; ensure variable names and types match the
existing code and return the same dict shape.
Actionable comments posted: 1
♻️ Duplicate comments (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
247-253: Guardtransfersaccess and provide clear diagnostics when missing.Line 247 accesses
baseExecutionBlock.transferswithout checking if it exists, which will raiseAttributeErrorif the TilerExtension hasn't annotated the block. Additionally, the silent early return at lines 252–253 makes debugging difficult—users won't know why tiling was skipped.Apply this diff to add a None check and descriptive error messages:
tileConstr: TileConstraint = template.tileConstraint -transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers +transfers = baseExecutionBlock.transfers +if transfers is None: + raise RuntimeError( + f"Missing transfers on execution block '{name}'; " + "ensure TilerExtension annotated transfers before TilingCodeGeneration." + ) + targetMemoryTransfers = { - tensorName: memTransfers.get(self.localMemory, None) for tensorName, memTransfers in transfers.items() + tensorName: memTransfers.get(self.localMemory) for tensorName, memTransfers in transfers.items() } -if any(v is None for v in targetMemoryTransfers.values()): - return ctxt, executionBlock +missingTensors = [name for name, t in targetMemoryTransfers.items() if t is None] +if missingTensors: + raise RuntimeError( + f"No transfers for memory level '{self.localMemory}' on tensors {missingTensors} " + f"in execution block '{name}'." + )Note: This addresses the concern raised in the past review comment about providing helpful errors when transfers or memory levels are missing.
Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)
248-258: Add None-guard for `baseExecutionBlock.transfers` (same issue as main pass).

Identical to the main replacement pass, line 249 assigns `baseExecutionBlock.transfers` without checking if it's `None`, then line 251 calls `.items()` on it. The current `.get(self.targetMemLevel, None)` approach (line 251) and the early return on line 254 handle missing memory-level keys gracefully, which addresses part of the past review comment's concern about guarding transfers access. However, the None check on `transfers` itself is still missing. Apply the same fix as in the main pass:

```diff
 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    return ctxt, executionBlock
+
 targetMemoryTransfers = {
     tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py` (3 hunks)
- `Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py` (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion manually.
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion.
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (3)
- Deeploy/TilingExtension/TileConstraint.py (2): `TileConstraint` (16-154), `wrapTilingSolution` (90-113)
- Deeploy/TilingExtension/TilerExtension.py (1): `Tiler` (47-970)
- Deeploy/TilingExtension/TilingCodegen.py (4): `AbsoluteHyperRectangle` (32-42), `TilingSchedule` (46-115), `VariableReplacementScheme` (119-151), `minimizeVariableReplacement` (154-167)

Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2)
- Deeploy/TilingExtension/TileConstraint.py (2): `TileConstraint` (16-154), `wrapTilingSolution` (90-113)
- Deeploy/TilingExtension/TilingCodegen.py (5): `AbsoluteHyperRectangle` (32-42), `HyperRectangle` (17-28), `TilingSchedule` (46-115), `VariableReplacementScheme` (119-151), `minimizeVariableReplacement` (154-167)
🔇 Additional comments (2)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)

8-8: LGTM! Imports align with the new transfer-based tiling flow.

The added imports support the refactored logic that now consumes precomputed transfers from the ExecutionBlock rather than recomputing them.

Also applies to: 22-25

Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py (1)

7-7: LGTM! Import additions support the transfer annotation integration.

The new imports (`Dict`, `TileConstraint`, `AbsoluteHyperRectangle`) are all necessary for the transfer-based tiling workflow.

Also applies to: 18-18, 20-21
```python
tileConstr: TileConstraint = template.tileConstraint
transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
targetMemoryTransfers = {
    tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
}

if any(v is None for v in targetMemoryTransfers.values()):
    return ctxt, executionBlock

variableReplacement, tilingSchedules = tileConstr.wrapTilingSolution(nodeMemoryConstraint, self.targetMemLevel,
                                                                     ctxt, unraveledOpRepr,
                                                                     targetMemoryTransfers)
```
Add None-guard for `baseExecutionBlock.transfers` before accessing `.items()`.

The type annotation declares `transfers` as a `Dict`, but if `baseExecutionBlock.transfers` is `None`, line 141 will raise an `AttributeError` when calling `.items()`. Apply this diff to add a None check:

```diff
 tileConstr: TileConstraint = template.tileConstraint
-transfers: Dict[str, Dict[str, List[List[AbsoluteHyperRectangle]]]] = baseExecutionBlock.transfers
+transfers = baseExecutionBlock.transfers
+if transfers is None:
+    return ctxt, executionBlock
+
 targetMemoryTransfers = {
     tensorName: memTransfers.get(self.targetMemLevel, None) for tensorName, memTransfers in transfers.items()
 }
```

🤖 Prompt for AI Agents
In Deeploy/TilingExtension/CodeTransformationPasses/TilingVariableReplacement.py
around lines 138 to 149, add a None-guard for baseExecutionBlock.transfers
before calling .items(): check if baseExecutionBlock.transfers is None (or
falsy) and in that case return ctxt, executionBlock early (same as the later
guard), otherwise assign transfers = baseExecutionBlock.transfers and proceed to
build targetMemoryTransfers and call wrapTilingSolution.
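A self-contained sketch of a guard along these lines (the helper name and the annotation's value types are hypothetical stand-ins, not the exact Deeploy API):

```python
def selectTargetTransfers(transfers, targetMemLevel):
    """Pick per-tensor transfers for one memory level; None signals 'skip tiling here'."""
    if transfers is None:  # block was never annotated by the TilerExtension
        return None
    selected = {name: perLevel.get(targetMemLevel) for name, perLevel in transfers.items()}
    if any(v is None for v in selected.values()):  # some tensor lacks this level
        return None
    return selected

# Hypothetical annotation: tensor -> memory level -> per-step transfer rectangles
transfers = {"x": {"L1": [["rect0"], ["rect1"]]}}
assert selectTargetTransfers(transfers, "L1") == {"x": [["rect0"], ["rect1"]]}
assert selectTargetTransfers(transfers, "L2") is None
assert selectTargetTransfers(None, "L1") is None
```

Centralizing the check like this keeps both passes (code generation and variable replacement) behaving identically when an annotation is absent.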
This PR adds another tiling-related annotation to ExecutionBlocks: the tile transfer information, i.e., the size of the tiles transferred in each step of the tiling loop. This was previously (re)generated in the `wrapTilingSolution` class method of `TileConstraint`, which was confusing because it had to regenerate the whole memory path up to the current target each time.

Instead, I calculate it once in the TilerExtension and annotate the executionBlock the same way as with the PatternMemoryConstraint.

Note: I'm not in love with this executionBlock annotation approach, but I want to keep this PR short, so I'll leave that for later.
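As a rough sketch of what the annotation looks like (types simplified: `AbsoluteHyperRectangle` is replaced by plain `(offset, dims)` tuples, and the tensor/level names are illustrative, not the real API):

```python
# tensor name -> target memory level -> one list of transfer rectangles per tiling step
transfers = {
    "input_0": {
        "L1": [
            [((0, 0), (16, 64))],   # step 0: copy rows 0..15
            [((16, 0), (16, 64))],  # step 1: copy rows 16..31
        ],
    },
}
# executionBlock.transfers = transfers  # annotated once by the TilerExtension

# Later passes no longer rebuild the memory path; they only select their level:
l1 = {name: perLevel.get("L1") for name, perLevel in transfers.items()}
assert len(l1["input_0"]) == 2  # two tiling steps
```

The payoff is that every consumer reads the same precomputed structure instead of re-deriving it per memory level.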
Another tiling change was to `computeTilingHyperRectangles`, a helper function that calculates all the tiles to execute on when a tensor in external memory (think bigger memory) needs to be processed in local memory (think smaller memory) with a certain tensor shape.

Previously this function accepted a `MemoryTransfer`, a dataclass containing a source (big) and a destination (small) memory constraint. The constraints were basically only used for their shapes, and the source constraint's shape actually had to be adjusted each time before calling the function, which confused me even more. Passing the shapes directly makes it much clearer what is needed: not the actual memory constraints, just the shapes.
One thing I'm a little unsure about: previously, if any dimension of the external-memory tensor's shape was smaller than the local one, we would replace the local shape with the external one. I haven't copied this behavior exactly; instead I always take the per-dimension minimum of the local and external shapes. In the distant future, when more advanced tiling gives different memory levels different tile sizes, this should probably be revisited, but for now I think it's good enough.
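A minimal toy sketch of that tile enumeration under the per-dimension-minimum rule (not the actual `computeTilingHyperRectangles` implementation):

```python
import itertools

def tile_rectangles(external_shape, local_shape):
    """Enumerate (offset, dims) tiles covering the external tensor.

    Each tile dimension is min(local, external): an oversized local
    dimension is clamped per-dimension rather than the whole local
    shape being replaced by the external one."""
    tile = [min(e, l) for e, l in zip(external_shape, local_shape)]
    rects = []
    for offset in itertools.product(*(range(0, e, t) for e, t in zip(external_shape, tile))):
        # Border tiles shrink so they never overrun the external tensor
        dims = tuple(min(t, e - o) for t, e, o in zip(tile, external_shape, offset))
        rects.append((offset, dims))
    return rects

# A 4x4 tensor tiled with a 2x3 local shape: tiles on the right border shrink to width 1
rects = tile_rectangles((4, 4), (2, 3))
# -> [((0, 0), (2, 3)), ((0, 3), (2, 1)), ((2, 0), (2, 3)), ((2, 3), (2, 1))]
```

With an oversized local dimension, e.g. `tile_rectangles((4,), (8,))`, the single tile is clamped to `((0,), (4,))`, which matches the "choose the minimum" behavior described above.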
I also added casting of the external pointer to `uint32_t` in the L3Dma to fix compilation warnings.

Added
Changed
Fixed
`uint32_t`

PR Merge Checklist
- Based on the latest `devel` commit and pointing to `devel`.
- The `CHANGELOG.md` file has been updated.