Conversation

@diaconuccalin diaconuccalin commented Sep 24, 2025

This PR brings the changes required for a working minimal TinyViT on the Siracusa platform, without tiling.

Added

  • PULP 2D FP DW conv Im2Col template and kernel, with bias support.
  • Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
  • PULP FP DW conv 2D parser.
  • FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
  • FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
  • FP PULP DW conv lowering optimization pass, similar to the existing one for the integer version.
  • RemoveEmptyConvBiasPass to the PULP optimizer.
  • PULPClusterEngine now accepts an n_cores parameter to set the number of cores used.
  • annotateNCores method to PULPDeployer that adds an n_cores key to all PULPClusterEngine templates' operatorRepresentations (see the sketch below).
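
A minimal sketch of the n_cores propagation, assuming the engine/mapping structures described above; annotate_n_cores is a hypothetical stand-in for the new annotateNCores method, not the actual PULPDeployer code:

def annotate_n_cores(graph_nodes, engines, op_reps, n_cores: int) -> None:
    # Collect the ops handled by cluster engines; only those kernels are
    # parallelized over cores. `is_cluster_engine` is a stand-in attribute.
    cluster_ops = set()
    for engine in engines:
        if getattr(engine, "is_cluster_engine", False):
            cluster_ops.update(engine.Mapping.keys())
    # Inject the core count so templates can reference ${n_cores}.
    for node in graph_nodes:
        if node.op in cluster_ops:
            op_reps[node.name]["n_cores"] = n_cores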

Changed

  • Reduced the size of the reshape & skip connection test for non-tiled Siracusa memory compatibility.
  • Renamed _mapNode to _selectEngine, which narrows that function's responsibility to, as the name states, engine selection only (see the sketch below).
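
An illustrative split of responsibilities after the rename; only the engine-lookup-then-mapping flow is taken from this PR, and select_engine is a hypothetical stand-in for the _selectEngine method:

def select_engine(engines, node):
    # Sole responsibility after the refactor: return the DeploymentEngine
    # whose mapping table covers this node's op.
    for engine in engines:
        if node.op in engine.Mapping:
            return engine
    raise RuntimeError(f"No engine supports op {node.op}")

# Call sites then invoke the engine's mapping themselves:
#     engine = select_engine(platform.engines, node)
#     layer = engine.Mapping[node.op](node)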

Fixed

  • Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
  • Added an underscore to the beginning of closure names to avoid invalid identifiers when node names start with unsupported characters (such as digits); see the sketch after this list.
  • Fixed data types in the PULPOpen FP add and mul templates.
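
A minimal sketch of the closure-name fix, assuming closure names are derived from node names (the helper name is hypothetical):

def closure_name(node_name: str) -> str:
    # C identifiers must not start with a digit, so a node named
    # "0_conv" would otherwise yield an invalid closure symbol.
    # Prefixing an underscore keeps the generated name legal.
    return f"_{node_name}_closure"

assert closure_name("0_conv") == "_0_conv_closure"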

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

@diaconuccalin diaconuccalin self-assigned this Sep 24, 2025
@diaconuccalin diaconuccalin added the Feature Addition of new features label Sep 24, 2025

coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added depthwise 2D convolution operations with optional bias support
    • Added slice operation support for PULPOpen target
    • Bias support for 2D convolution kernels
    • Configurable core count for multi-core cluster execution
    • Extended reduce mean operations
  • Bug Fixes

    • Fixed batching issues in floating-point GEMM and MatMul operations
    • Fixed closure naming conflicts for special characters
    • Corrected data types in floating-point arithmetic operations
    • Improved reshape buffer aliasing
  • Tests

    • Expanded Siracusa CI pipeline with additional kernel and model test variants

Walkthrough

Adds PULP FP depthwise 2D convolution (Im2Col) with bias support across templates, kernels, parsers, bindings, and mappings; introduces slice bindings; adds n_cores propagation to the PULP cluster engine and deployer; refactors the engine-selection helper (_mapNode → _selectEngine); updates template/kernel types and Siracusa CI tests.

Changes

  • CI workflow (.github/workflows/ci-platform-siracusa.yml): Added Siracusa test names for FP conv (regular & DW) variants, bias/no-bias cases, reshape-with-skip, and the TinyViT demo.
  • Engine selection refactor (Deeploy/DeeployTypes.py, Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py, Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py): Renamed _mapNode → _selectEngine, changed the return type to DeploymentEngine, and updated call sites to obtain the engine and invoke engine.Mapping[node.op](node) where required.
  • Closure naming (Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py): Prefix closure names with an underscore to avoid names starting with unsupported characters.
  • DW conv templates & kernels (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py, Deeploy/Targets/PULPOpen/Templates/*, TargetLibraries/PULPOpen/inc/kernel/Conv.h, TargetLibraries/PULPOpen/src/Convolution_fp32.c, TargetLibraries/PULPOpen/src/DWConvolution_fp32.c): Added PULP2DFloatDWConvIm2ColTemplate and dynamic transient buffer sizing/hoisting, added bias parameters to the conv APIs, and implemented the PULP_DW_Conv2d_Im2Col_fp32 kernel with bias-aware per-core logic.
  • Parsers, bindings, platform mappings (Deeploy/Targets/PULPOpen/Parsers.py, Deeploy/Targets/PULPOpen/Bindings.py, Deeploy/Targets/PULPOpen/Platform.py): Added PULPFPDWConv2DParser, PULPFloatDWConv2DBindings, and PULPSliceBindings; created FPDWConv2DMapper, DMASliceMapper, and SliceMapper; updated PULPMapping (Conv and Slice entries); added RemoveEmptyConvBiasPass.
  • Slice & ReduceMean extensions (Deeploy/Targets/PULPOpen/Bindings.py, Deeploy/Targets/PULPOpen/Platform.py): Added PULPSliceBindings; switched the DMASlice binding to DMASliceTemplate.referenceTemplate; extended PULPReduceMeanBindings for float+integer combinations.
  • Reshape template changes (Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py): Now inherits from _GenericReshapeTemplate, uses VariableBuffer lookup for bufferIn/bufferOut, and sets bufferOut._alias = bufferIn.name instead of dead-marking indices.
  • Template type & logic fixes (Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py, FloatMulTemplate.py, FloatGemmTemplate.py, FloatMatMulTemplate.py): Switched core/index types to unsigned (uint8_t/uint32_t), expanded the unroll (1x3 → 1x6), added MIN casts; guarded batched-pointer updates in the GEMM/MatMul templates.
  • Data types tuple fix (Deeploy/CommonExtensions/DataTypes.py): IntegerDataTypes is now a flat tuple(sorted(...)) instead of a one-element tuple wrapping a list.
  • Deployer & cluster engine core propagation (Deeploy/Targets/PULPOpen/Deployer.py, DeeployTest/testMVP.py, DeeployTest/generateNetwork.py, DeeployTest/testUtils/testRunner.py): Added annotateNCores() to propagate n_cores into operator representations; bind() now annotates before the base bind; added a --cores CLI flag and wiring; TestRunner appends --cores for Siracusa platforms.
  • Bindings public API updates (Deeploy/Targets/PULPOpen/Bindings.py): Added the public bindings PULPSliceBindings and PULPFloatDWConv2DBindings; updated imports for the new templates/types.
  • CHANGELOG (CHANGELOG.md): Documented DW conv/Im2Col, bias support, bindings/mappers, tests, type/template fixes, and core propagation.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Test Runner / Caller
    participant Deployer as PULPDeployer
    participant Engine as PULPClusterEngine
    participant OpRep as OperatorRepresentation

    CLI->>Deployer: run/generate (--cores)
    Deployer->>Engine: annotate n_cores on engine(s)
    Engine-->>Deployer: n_cores set
    Deployer->>OpRep: inject n_cores into operator contexts
    Deployer->>Deployer: super().bind() (uses annotated contexts)
sequenceDiagram
    participant Caller as Host
    participant DWKernel as PULP_DW_Conv2d_Im2Col_fp32
    participant Core as per-core compute
    participant Ctxt as pContextBuffer/im2col

    Caller->>DWKernel: call(pSrcA,pSrcB,pSrcBias,has_bias,pDstC,pContextBuffer,pads...)
    DWKernel->>Core: partition outputs by core (ch_out_start/ch_out_stop)
    Core->>Ctxt: select per-core im2col buffer
    Core->>Core: fill im2col, MAC over weights
    alt has_bias
        Core->>Core: add bias to outputs
    end
    Core->>Caller: write outputs to pDstC

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas to focus on:

  • DW convolution kernel and per-core partitioning (TargetLibraries/.../DWConvolution_fp32.c, Convolution_fp32.c).
  • Engine-selection refactor consistency and call-site updates (Deeploy/DeeployTypes.py, NetworkDeployerWrapper.py, EngineColoringDeployer.py).
  • Template buffer hoisting and transient buffer sizing propagation (FloatConvTemplate.py).
  • Bindings/parsers alignment with kernel signatures and mapper registration (Bindings.py, Parsers.py, Platform.py).
  • Deployer annotateNCores logic and CLI wiring (Deployer.py, testMVP.py, generateNetwork.py, testRunner.py).

Possibly related PRs

Suggested labels

Refactor

Suggested reviewers

  • Victor-Jung
  • Xeratec

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately describes the main objective: enabling TinyViT on the non-tiled Siracusa platform, which is the primary focus of all changes across templates, kernels, parsers, bindings, and CI tests.
  • Description check (✅ Passed): The description comprehensively covers the changeset, detailing all major additions (DW conv templates/kernels, bias support, parsers, CI tests, bindings, optimizations, n_cores parameter), changes, and fixes that align with the file modifications.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)

344-346: Prevent KeyError when 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1) as elsewhere in the file.

-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)

16-23: Loop index type should match buffer indexing

Use unsigned 32-bit for i to match array indexing and avoid UB on large sizes.

(Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct

  • Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
  • Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
  • (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace double log2() with __builtin_ctz() to avoid libm dependency and ensure compile‑time integer computation. Also assert the power‑of‑two assumption that the chunking math relies on.

Apply this diff:

-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
.github/workflows/ci-platform-siracusa.yml (1)

98-98: TinyViT demo test: validate runtime envelope on Siracusa

Great to see model‑level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo’s memory/latency fit the non‑tiled target on the selected runner.

Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors expected strides (A: MN, B: NO). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.

Deeploy/DeeployTypes.py (3)

320-324: Prefer None over mutable default and let ctor initialize alias_of.

Passing alias_of = [] creates a new list here (safe), but use None consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )

469-474: Avoid sharing alias list between buffers; copy it.

Passing buffer.alias_of by reference couples future mutations between the source and new ConstantBuffer.

-        ret = cls(
+        ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )

1176-1180: Pass alias_of=None and let the ctor initialize.

Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    alias_of = None,
+                )
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)

137-141: Remove unnecessary modulo and precompute local weight pointer inside inner loop

im2col_idx % (P * Q) is redundant. Precompute a per-filter base pointer for tighter inner loops.

Apply this diff in both bias and non-bias paths:

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum +=
-                  im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+            for (uint32_t k = 0; k < P * Q; k++) {
+              sum += im2col_buffer[k] * local_w[k];
+            }

Also applies to: 236-240


47-48: Remove unused variable kernel_size

kernel_size is computed but never used.

-  uint32_t kernel_size = P * Q * F_total;

61-97: Consider simplifying padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (check Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

20-27: Document pContextBuffer size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating required pContextBuffer sizes:

  • Conv2d_Im2Col: NUM_CORES * (C*P*Q) floats
  • DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats
    Also note that has_bias governs whether pSrcBias is read.

Also applies to: 29-36
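
A host-side sketch of the stated sizing rule (the helper itself is hypothetical):

def context_buffer_elems(num_cores: int, c: int, p: int, q: int,
                         depthwise: bool) -> int:
    # Regular Im2Col needs C*P*Q scratch floats per core; the depthwise
    # variant processes one channel per filter, so P*Q suffices.
    per_core = p * q if depthwise else c * p * q
    return num_cores * per_core

assert context_buffer_elems(8, 16, 3, 3, depthwise=False) == 8 * 16 * 9
assert context_buffer_elems(8, 16, 3, 3, depthwise=True) == 8 * 9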

Deeploy/Targets/PULPOpen/Bindings.py (2)

239-245: Constrain FP DWConv binding to supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.

Apply this diff:

-        ForkTransformer) for float_type in FloatDataTypes
+        ForkTransformer) for float_type in [float32_t]

299-302: ReduceMean (float): restrict axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.

Apply this diff:

-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
Deeploy/Targets/PULPOpen/Platform.py (1)

108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.

Apply this diff to prioritize DWConv first:

-    'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+    'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),

Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

42-51: Fix return typing and silence ARG004 (unused ctxt)

  • Return type now returns a str for size; widen the annotation to reflect this.
  • Rename ctxt to _ctxt to address Ruff ARG004.

Apply:

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:

93-93: Typo in comment: “ChannelOout” → “ChannelOut”

Minor doc polish.

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

124-126: Typo in comment: “ChannelOout” → “ChannelOut”

Same typo in DW template header.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa3c31 and de2752b.

📒 Files selected for processing (17)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
  • Deeploy/DeeployTypes.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • add_aliases (326-345)
  • lookup (763-795)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (82-224)
  • computeTransientBuffersSize (137-157)
  • NetworkContext (554-1063)
  • hoistTransientBuffers (160-180)
  • hoistTransientBuffer (864-883)
  • lookup (763-795)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (522-545)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (667-679)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (122-188)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1695-1838)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2492-2509)
  • lookup (763-795)
  • parseNode (1076-1091)
  • parseNodeCtxt (1094-1119)
  • NetworkContext (554-1063)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1182-1248)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

43-43: Unused static method argument: ctxt

(ARG004)

Deeploy/DeeployTypes.py

433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test Type Inference (fail-input2)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: unsigned arithmetic for start/end bounds

Casting ${size} to uint32_t aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently uint32_t.

.github/workflows/ci-platform-siracusa.yml (2)

75-75: Reshape + skip connection test: ensure alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non‑tiled) and fails without the fix.


61-64: CI workflow lists new DW conv FP tests but those tests aren't registered in the repo

.github/workflows/ci-platform-siracusa.yml (lines 61–64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, testFloat2DDWConvolutionZeroBias but a repo search found no matches — add/register these tests or update the workflow. After they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)

27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still MN for A and NO for B even when transA/transB are set).


39-39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:

  • enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
  • gate the out advance with an explicit data_out_batched flag.

Example if such a flag is available:

-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

370-374: Verify FP DW Conv layout: input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm kernel expectations: if weights are stored in channels-first layout, they likely also need DW-specific permutation (as above) to avoid semantic mismatch when setting channels_first to default.


385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.


404-426: New PULPFPDWConvPass wiring looks good

Extends NHWC conversion to FP DW conv as intended.


505-505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

41-43: Confirm context buffer sizing per core

This implementation expects pContextBuffer sized to NUM_CORES * (P*Q) floats. Please confirm templates/allocations provide that.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-16: LGTM on signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct.

Please ensure all call sites updated to pass pSrcBias, has_bias, and pContextBuffer (for the Im2Col variant).

Also applies to: 18-39


139-176: Good reuse of im2col buffer and contiguous weight blocks

The inner product uses a contiguous kernel_size slice, matching the buffer layout. Looks correct.

Also applies to: 179-218

Deeploy/Targets/PULPOpen/Parsers.py (3)

104-116: Bias handling looks correct; maps optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels.

Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.


127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via inputs count and padding symmetry is fine.


185-186: Depthwise condition check aligns with ONNX semantics

group == ch_im_in is the expected DW condition.

Deeploy/Targets/PULPOpen/Bindings.py (2)

143-153: DMA Slice binding switch looks good.

Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.


155-165: Float Slice mapping: validate starts/ends/axes/steps typing.

Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.

Deeploy/Targets/PULPOpen/Platform.py (4)

77-79: FP DWConv mapper wiring: LGTM.

Parser and bindings are correctly introduced and exposed.


92-95: Two Slice mappers ordering: LGTM; confirm fallback behavior.

Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.


8-9: RemoveEmptyConvBiasPass import: LGTM.

Consistent with optimizer usage below.


235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.

Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

60-62: Good: transient buffer typed to input element

Ensures the generated C sees the correct pointee type instead of void. This helps alloc/codegen correctness.


81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.

@Xeratec Xeratec added this to the Release 0.2.1 milestone Sep 29, 2025
@Xeratec Xeratec added the Bug Something isn't working label Sep 29, 2025
@Xeratec Xeratec moved this to Need Reviewer in Deeploy Sep 29, 2025
@Xeratec Xeratec moved this from Need Reviewer to In review in Deeploy Sep 29, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)

241-241: Fix remaining mutable default arguments.

VariableBuffer.__init__ still has mutable defaults (shape = [1], alias_of: Optional[List[str]] = []). These should use immutable defaults (tuples or None) to follow Python best practices and avoid shared mutable state across calls.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):

The defensive copy on line 260 already handles the None case correctly, so no further changes are needed there.

Based on past review feedback and static analysis (Ruff B006).

🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)

454-454: Optional: Avoid redundant defensive copy.

Since VariableBuffer.__init__ already defensively copies alias_of (line 260), passing list(alias_of) here results in double copying. You can simplify by passing the list directly:

-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)

This is a minor optimization and doesn't affect correctness.


490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies alias_of), explicitly copying here would make the intent clearer:

     ret = cls(
         name = buffer.name,
         shape = buffer.shape,
         values = values,
-        alias_of = buffer.alias_of,
+        alias_of = list(buffer.alias_of),  # Explicit copy for clarity
     )

This is a minor clarity improvement and doesn't affect correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce7e4c8 and 4165983.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

325-329: LGTM!

Correctly initializes alias_of with an explicit empty list for each new instance, avoiding shared mutable state.


362-362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.


404-406: LGTM!

The updates to support size as Union[int, str] are consistent across TransientBuffer.__init__, _bufferRepresentation, and hoistTransientBuffer. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431-431, 888-888
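
A sketch of why Union[int, str] helps here, assuming the allocate template pastes a string size verbatim into the generated C; the rendering below is simplified and hypothetical:

from typing import Union

def render_alloc(name: str, size: Union[int, str]) -> str:
    # An int becomes a literal; a str is treated as a C expression,
    # letting sizes such as "NUM_CORES * 9" resolve at compile time.
    return f"float32_t {name}[{size}];"

print(render_alloc("im2col_buf", 1024))             # float32_t im2col_buf[1024];
print(render_alloc("im2col_buf", "NUM_CORES * 9"))  # float32_t im2col_buf[NUM_CORES * 9];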


1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty alias_of list, maintaining consistency with the alias tracking implementation.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)

241-261: Fix mutable defaults in VariableBuffer.init (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.

Apply:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
@@
-        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
+        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []

As per Ruff B006.


452-456: Fix mutable defaults in ConstantBuffer.init (shape/values)

Use immutable defaults and forward alias_of safely.

Apply:

-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)

(values processing via np.asarray below remains correct.)

🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

21-24: Im2Col buffer bytes should derive from input element width (not weight).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but couples unrelated types and can break if they diverge. Prefer data_in_type width.

Apply:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-            'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+                     * ctxt.n_cores \
+                     * operatorRepresentation['ch_im_in'] \
+                     * operatorRepresentation['dim_kernel_x'] \
+                     * operatorRepresentation['dim_kernel_y']

43-51: Fix return type annotation and stale comment; ensure consistent sizing semantics.

  • Return type says List[Tuple[str, str]] but the code returns int; align with base signature (Union[int, IntVar]).
  • Comment says “value is only used as string”, but an int is produced. Update comment and/or value.

Also, consider returning element count (NUM_CORES * Kx * Ky) and relying on the buffer’s referencedType to derive bytes, for consistency with the established pattern.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
-        # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-        # WARNING: This works because value is only used as string, in the allocate template.
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        # Im2Col buffer scales with number of cores and kernel area; size below is in ELEMENTS.
+        # Byte size is derived from the buffer's referenced type at allocation/codegen time.
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.


124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: “ChannelOout” → “ChannelOut”.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
Deeploy/DeeployTypes.py (1)

2708-2714: Preserve context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep mangling consistent.

Apply:

-        self.ctxt = NetworkContext(
+        self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
-            n_cores = self.ctxt.n_cores,
+            n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4165983 and af78d75.

📒 Files selected for processing (9)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3256-3633)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2451-2494)
  • TopologyOptimizer (2249-2278)
  • NetworkDeployer (3256-3633)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (575-1090)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (891-910)
  • lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py

[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?


[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max

🪛 Ruff (0.13.3)
DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: linting
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (pass)
  • GitHub Check: Test Type Inference (fail-input0)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

81-84: Bias/has_bias plumbing for direct conv looks good.

Bias flags are correctly passed to the FP Conv2d HWC call.


100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel.

The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.

If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.

Deeploy/TilingExtension/TilerExtension.py (1)

987-989: LGTM: propagate core count to context

Assigning ctxt.n_cores ensures downstream kernels see the configured core count.

DeeployTest/testRunner_tiled_siracusa.py (1)

20-26: LGTM: cores threaded into TestRunner and CMake

Passing args.cores to TestRunner and exporting NUM_CORES is correct.

Also applies to: 28-28

DeeployTest/testUtils/platformMapping.py (1)

92-104: LGTM: add n_cores and forward to PULPDeployer

Signature and forwarding align with deployer changes.

Also applies to: 214-224

DeeployTest/testMVP.py (1)

57-60: LGTM: introduce and propagate n_cores

CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.

Also applies to: 84-93, 201-206, 261-263

@lukamac lukamac left a comment

This is a first round of review. There are already a few comments that need your answers. I haven't managed to look at the actual templates/kernels; I'll do that tomorrow.
Good work :)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)

31-53: Avoid the shared mutable default for inputOffsets

Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another’s offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4db4978 and 48b2ff5.

📒 Files selected for processing (5)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • add_aliases (331-350)
  • get_aliases_of (352-364)
  • lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2579-2596)
  • lookup (788-820)
  • parseNode (1101-1116)
  • parseNodeCtxt (1119-1144)
  • NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1204-1270)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b2ff5 and 7cba0cf.

📒 Files selected for processing (2)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
  • GitHub Check: linting
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: softhier-kernels / test-runner-softhier
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

250-250: LGTM! Improved group attribute handling.

Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.
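
A one-line sketch of the approved pattern, assuming node exposes a graphsurgeon-style attrs dict:

group = node.attrs.get('group', 1)  # ONNX Conv omits 'group' when it equals 1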


370-373: LGTM! Correctly handles FP DW Conv input transpose.

The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.


380-426: LGTM! Well-structured pass classes with clear separation of concerns.

The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().


505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence.

The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)

105-107: Compute NSize based on transA.

Current code assumes no transpose; wrong K/N leads to bad tiles.

-        NSize = buffA.shape[-1]
+        NSize = buffA.shape[-1] if transA == 0 else buffA.shape[-2]

215-223: Critical undefined identifiers remain in IntGEMMTileConstraint.serializeTilingSolution.

The verification reveals multiple undefined variables that will cause runtime errors:

  • Line 132: BSize used but never defined (should be BatchSize)
  • Lines 151, 155, 167, 171, 370, 374, 386, 390, 405, 407: BatchOffset used but never defined
  • Lines 156, 172, 375, 391, 407: BOffset used but never defined

In contrast, FloatGEMMTileConstraint.serializeTilingSolution correctly uses BatchSize at line 341, suggesting the IntGEMMTileConstraint version requires similar fixes. The undefined BatchOffset and BOffset need to be extracted from cube.offset, just as MOffset and OOffset are at line 119.
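
A hedged sketch of that extraction (the exact layout of cube.offset/cube.dims is an assumption here, mirroring the MOffset/OOffset unpacking):

# Unpack batch offsets/sizes alongside the matrix dimensions.
(BatchOffset, BOffset, MOffset, OOffset) = cube.offset
(BatchSize, BSize, MSize, OSize) = cube.dims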

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

22-28: Avoid 16-bit overflow in channel partitioning.

Use 32-bit for ch_out_* to support large F_total; identical change recommended in both functions.

Apply:

-  uint16_t ch_out_chunk =
+  uint32_t ch_out_chunk =
       (F_total >> log2Core) + ((F_total & (NUM_CORES - 1)) != 0);
-  uint16_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
-  uint16_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
-  uint16_t ch_out_count = ch_out_stop - ch_out_start;
+  uint32_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
+  uint32_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
+  uint32_t ch_out_count = ch_out_stop - ch_out_start;

Repeat the same replacement in PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC.

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)

119-125: Mutable default list for passes.

Use None and initialize to avoid shared state across instances.

Apply:

-                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] = []):
+                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] | None = None):
@@
+        if memoryLevelAnnotationPasses is None:
+            memoryLevelAnnotationPasses = []
         if len(memoryLevelAnnotationPasses) == 0:
             memoryLevelAnnotationPasses.append(AnnotateDefaultMemoryLevel(self.Platform.memoryHierarchy))
         self.memoryLevelAnnotationOptimizer = NetworkOptimizer(memoryLevelAnnotationPasses)

Repeat the same pattern in MemoryLevelAwareSignPropDeployer.__init__ and MemoryDeployerWrapper.__init__.

♻️ Duplicate comments (5)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

25-26: n_cores addition looks reasonable; mutable default still present.

The n_cores parameter addition aligns with the PR's goal of supporting dynamic im2col buffer sizing. However, two observations:

  1. The mutable default inputOffsets: Dict[str, int] = {} remains unfixed from the previous review.
  2. The n_cores parameter lacks documentation and validation (e.g., ensuring it's positive).

Consider adding a docstring or inline comment explaining the n_cores parameter's purpose and optionally validate it:

                 inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 n_cores: int = 8):  # Number of cores for im2col buffer sizing
+        if n_cores <= 0:
+            raise ValueError(f"n_cores must be positive, got {n_cores}")

For the mutable default issue, refer to the previous review comment suggesting inputOffsets: Optional[Dict[str, int]] = None.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

43-51: Fix DW Im2Col buffer sizing: remove typeWidth multiplier and correct return type annotation

In PULP2DFloatDWConvIm2ColTemplate.computeTransientBuffersSize() (lines 43–51), the size computation multiplies by typeWidth // 8 to produce a byte count, but hoistTransientBuffers() (lines 59–61) then assigns the buffer a concrete type. The allocator subsequently multiplies by sizeof(type), causing double-accounting on tight L2 budgets. Additionally, the return type annotation incorrectly declares str instead of Union[int, IntVar].

Align with the existing PULP2DFloatConvIm2ColTemplate pattern (lines 18–24): compute element count (no typeWidth division) and let type-setting handle the byte multiplication.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

61-97: Critical: signed/unsigned mix in im2col padding loops leads to out‑of‑bounds indexing

h_in/w_in are uint32_t while h_in_start/w_in_start are int32_t; expressions like (h_in - h_in_start) and (w_in - w_in_start) perform unsigned arithmetic when mixed, producing incorrect large indices. Use int32_t consistently and cast bounds.

Apply (pattern applies similarly to LEFT/RIGHT blocks):

-        for (uint32_t h_in = MAX(H, h_in_start); h_in < h_in_start + P; h_in++) {
+        for (int32_t h_in = MAX((int32_t)H, h_in_start); h_in < (int32_t)(h_in_start + (int32_t)P); h_in++) {
           for (int32_t w_in = (int32_t)w_in_start;
                w_in < (int32_t)(w_in_start + Q); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }
-        for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+        for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
           for (int32_t w_in = (int32_t)w_in_start;
                w_in < MIN(0, (int32_t)(w_in_start + Q)); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }
-        for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+        for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
-          for (uint32_t w_in = MAX(W, w_in_start); w_in < w_in_start + Q; w_in++) {
+          for (int32_t w_in = MAX((int32_t)W, w_in_start); w_in < (int32_t)(w_in_start + (int32_t)Q); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }

Repeat the same signed fixes in the no‑bias branch (lines 170‑195).

Deeploy/Targets/PULPOpen/Deployer.py (1)

40-53: Avoid mutable default for inputOffsets.

Switch to Optional and init when calling super.

Apply:

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 deeployStateDir: str = "DeeployStateDir",
-                 inputOffsets = {},
+                 deeployStateDir: str = "DeeployStateDir",
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
         super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets if inputOffsets is not None else {},
             n_cores = n_cores,
         )

If SignPropDeployer already accepts Optional[Dict[str, int]], you can pass it through directly and drop the None-fallback expression above.

DeeployTest/testUtils/testRunner.py (1)

349-356: Remove duplicate command construction; keep single source of truth.

The first assignment is redundant; append -n_cores via self.n_cores once.

Apply:

-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
         if self._tiling is True:
             command += f" -n_cores {self.n_cores}"
         command += f" {self.gen_args}"
🧹 Nitpick comments (15)
CHANGELOG.md (1)

26-32: Vary sentence structure in Added section to improve readability.

The static analysis tool flagged repetitive sentence beginnings ("FP PULP DW conv", "PULP FP DW conv", "FP bindings"). While the entries are accurate and comprehensive, consider rephrasing for better variety. For example:

- PULP 2D FP DW conv Im2Col template and kernel, with bias support.
- Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
- PULP FP DW conv 2D parser.
- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+ 2D FP depthwise convolution Im2Col template, kernel, and parser for PULP, with bias support.
+ Bias support for PULP 2D FP regular convolution Im2Col template and kernel.
+ FP convolution 2D (regular and depthwise), reshape, skip connection, and TinyViT demo tests in the non-tiled Siracusa CI pipeline.
+ FP bindings and mappings: PULP slice, depthwise convolution 2D, and reduce mean operations.
+ Depthwise convolution lowering optimization pass (FP variant) for PULP, analogous to the integer version.

This maintains factual accuracy while improving stylistic variety.

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1)

519-526: Tighten schedule pairing with zip(strict=True)

Ensure 1:1 correspondence between base input loads and requant cubes.

-        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1)

515-517: Add zip(strict=True) to guard schedule alignment

Match counts between input and requant loads for WMEM DW RQS.

-        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1)

295-299: Strengthen pairing with zip(strict=True) in PW RQS

Guarantee equal lengths when combining base and requant schedules.

-        } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule)]
+        } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule, strict=True)]
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-14: Avoid FP log2; use bit ops and assert power-of-two NUM_CORES

log2() pulls in libm and risks FP rounding; use CTZ and add a compile-time check. This also removes any need for <math.h>/-lm.

-uint32_t ${nodeName}_core_id = pi_core_id();
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+uint32_t ${nodeName}_core_id = pi_core_id();
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be a power of two");
+const uint32_t ${nodeName}_log2Core = __builtin_ctz(NUM_CORES);
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

93-93: Fix typo in comment: “ChannelOout” → “ChannelOut”

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

125-154: DW Im2Col call: argument order LGTM; minor comment typo

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

Based on learnings.


46-51: Clarify misleading comment about size-as-string

The value is currently computed as int, not string; please update the comment to avoid confusion.
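
One possible replacement wording, derived from the allocation flow described in the learnings below (a suggestion, not the author's text):

# im2col_dim is an element count (int); hoistTransientBuffers() sets the
# buffer's referenced type, and the allocator multiplies by the element
# size at allocation time.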

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

181-185: Use wider unsigned types for tiling sizes for consistency and headroom.

Dims rarely fit in int8; GEMM uses uint16_t/uint8_t. Align MatMul for consistency and to avoid overflow.

Apply:

-from Deeploy.CommonExtensions.DataTypes import int8_t
+from Deeploy.CommonExtensions.DataTypes import uint8_t, uint16_t
...
-        replacementTypes = {
-            "M": PointerClass(int8_t),
-            "N": PointerClass(int8_t),
-            "O": PointerClass(int8_t),
-            "batch": PointerClass(int8_t)
-        }
+        replacementTypes = {
+            "M": PointerClass(uint16_t),
+            "N": PointerClass(uint16_t),
+            "O": PointerClass(uint16_t),
+            "batch": PointerClass(uint8_t),
+        }

If other MatMul code relies on int8_t here, confirm and we can gate this by target or keep uint16_t only for sizes ≥ 256.

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)

83-87: Avoid assigning unused variable.

modulus is unused. Either drop assignment or use _ =.

-            modulus = tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
+            tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
Deeploy/Targets/Generic/Parsers.py (1)

1071-1101: Reshape aliasing looks good; mark constant shape as non-deploy and use cached alias list

  • Alias flow is correct and symmetric; iterate the cached list to avoid relying on a second lookup.
  • If the shape input is a constant, mark it non‑live/non‑deploy to avoid useless allocations (mirrors Unsqueeze).

Apply near this block:

-        new_output_node_aliases = input_node.get_aliases_of()
+        new_output_node_aliases = list(input_node.get_aliases_of())
         new_output_node_aliases.append(input_node.name)
@@
-        for alias in output_node.get_aliases_of():
+        for alias in new_output_node_aliases:
             alias_node = ctxt.lookup(alias)
             alias_node.add_aliases(aliases_to_add = [
                 output_node.name,
             ])
+
+        # If shape is a constant tensor, exclude it from deployment
+        shape_buf = ctxt.lookup(node.inputs[inputs.index("shape")].name)
+        if hasattr(shape_buf, "values"):
+            shape_buf._live = False
+            shape_buf._deploy = False
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

137-145: Minor: remove redundant modulo and consider precomputing weight base

im2col_idx runs [0, P*Q), so % (P*Q) is redundant. Also, precompute (f - ch_out_start) * P * Q outside the inner loop.

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum += im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            uint32_t w_base = (f - ch_out_start) * P * Q;
+            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
+              sum += im2col_buffer[im2col_idx] * weight_ptr[w_base + im2col_idx];
+            }

47-51: Nit: kernel_size is unused

kernel_size is computed but never used; remove to avoid confusion.

Deeploy/DeeployTypes.py (1)

3262-3317: Redundant n_cores field in child; base already stores it.

NetworkDeployer saves n_cores; removing the extra self.n_cores avoids duplication.

Apply:

-        self.prepared = False
-        self.n_cores = n_cores
+        self.prepared = False
DeeployTest/testMVP.py (1)

57-59: Consider removing the unused verbose parameter.

The verbose parameter is declared but never used in the function body. While the n_cores parameter addition looks good and aligns with the PR objectives, consider removing verbose to clean up the signature.

Apply this diff:

 def setupDeployer(graph: gs.Graph, memoryHierarchy: MemoryHierarchy, defaultTargetMemoryLevel: MemoryLevel,
-                  defaultIoMemoryLevel: MemoryLevel, verbose: CodeGenVerbosity, args: argparse.Namespace,
-                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
+                  defaultIoMemoryLevel: MemoryLevel, args: argparse.Namespace,
+                  n_cores: int) -> Tuple[NetworkDeployer, bool]:

Note: if you apply this change, update the call site on line 256 as well.

Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da162b0 and 85edf76.

📒 Files selected for processing (30)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5 hunks)
  • Deeploy/DeeployTypes.py (14 hunks)
  • Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (3 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
  • DeeployTest/testUtils/platformMapping.py
  • DeeployTest/testRunner_tiled_siracusa.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (14)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (587-592)
Deeploy/Targets/PULPOpen/Parsers.py (2)
  • PULPFPConv2DParser (65-120)
  • PULPFPDWConv2DParser (123-189)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1736-1892)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Bindings.py (5)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2366-2400)
  • NodeBinding (1588-1733)
  • NodeTemplate (87-229)
Deeploy/FutureExtension/Bindings/AutoFutureBinding.py (1)
  • AutoFutureBinding (13-62)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2583-2600)
  • lookup (792-824)
  • parseNode (1105-1120)
  • parseNodeCtxt (1123-1148)
  • NetworkContext (578-1092)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1246-1312)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (2)
Deeploy/DeeployTypes.py (1)
  • lookup (792-824)
Deeploy/TilingExtension/TilingCodegen.py (2)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/DeeployTypes.py (3)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
  • sizeInBytes (181-182)
Deeploy/AbstractDataTypes.py (2)
  • typeWidth (399-400)
  • Pointer (303-388)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • lookup (792-824)
  • get_aliases_of (352-364)
  • add_aliases (331-350)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • TilingSchedule (53-122)
  • AbsoluteHyperRectangle (39-49)
  • VariableReplacementScheme (126-158)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (3)
  • TileConstraint (19-221)
  • serializeTilingSolution (184-221)
  • extractBaseAddr (56-74)
Deeploy/DeeployTypes.py (1)
  • lookup (792-824)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
  • uint8_t (36-39)
  • uint32_t (48-51)
  • uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  • Conv2DTileConstraint (225-435)
  • computeInputCube (320-347)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
  • contextagnostic (285-298)
  • SequentialPass (201-210)
  • ReplaceSequentialPatternPass (265-282)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • TilingSchedule (53-122)
  • AbsoluteHyperRectangle (39-49)
  • VariableReplacementScheme (126-158)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-402)
  • addTensorDimToModel (143-157)
  • getTensorDimVar (131-135)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
  • uint8_t (36-39)
  • uint32_t (48-51)
  • uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  • Conv2DTileConstraint (225-435)
  • computeInputCube (320-347)
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
Deeploy/DeeployTypes.py (5)
  • _parseNode (2670-2677)
  • ONNXLayer (1895-2223)
  • NetworkContext (578-1092)
  • optimize (2260-2280)
  • optimize (2310-2329)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (578-1092)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (893-912)
  • lookup (792-824)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3258-3635)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
🪛 LanguageTool
CHANGELOG.md

[style] ~31-~31: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.1)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py

233-233: Undefined name absoluteCube

(F821)

Deeploy/DeeployTypes.py

401-401: Redefinition of unused sizeInBytes from line 390

(F811)


447-447: Redefinition of unused sizeInBytes from line 444

(F811)


456-456: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


456-456: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py

263-263: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


263-263: Undefined name inputRequantCubes

(F821)


281-281: Local variable dilation is assigned to but never used

Remove assignment to unused variable dilation

(F841)


289-289: Local variable inputChannelVar is assigned to but never used

Remove assignment to unused variable inputChannelVar

(F841)


294-294: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


317-317: Local variable weightBuffer is assigned to but never used

Remove assignment to unused variable weightBuffer

(F841)


320-320: Local variable inputBatchVar is assigned to but never used

Remove assignment to unused variable inputBatchVar

(F841)


325-325: Local variable outputBatchVar is assigned to but never used

Remove assignment to unused variable outputBatchVar

(F841)


326-326: Local variable outputHeightVar is assigned to but never used

Remove assignment to unused variable outputHeightVar

(F841)


327-327: Local variable outputWidthVar is assigned to but never used

Remove assignment to unused variable outputWidthVar

(F841)


328-328: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


331-331: Local variable padding is assigned to but never used

Remove assignment to unused variable padding

(F841)


413-413: Unpacked variable BatchOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable HOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable WOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable COffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


414-414: Unpacked variable BatchSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


426-426: Unpacked variable inBSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


426-426: Unpacked variable inHSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


522-522: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py

122-122: Undefined name BSize

(F821)


133-133: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


133-133: Undefined name BatchOffset

(F821)


133-133: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


136-139: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


137-137: Undefined name BatchOffset

(F821)


138-138: Undefined name BOffset

(F821)


140-143: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


142-142: Undefined name BSize

(F821)


149-149: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


149-149: Undefined name BatchOffset

(F821)


149-149: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


152-155: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


153-153: Undefined name BatchOffset

(F821)


154-154: Undefined name BOffset

(F821)


156-159: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


158-158: Undefined name BSize

(F821)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py

305-306: Expected an indented block after if statement

(invalid-syntax)


310-310: Expected an indented block after if statement

(invalid-syntax)

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py

132-132: Undefined name BSize

(F821)


151-151: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


151-151: Undefined name BatchOffset

(F821)


151-151: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


154-157: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


155-155: Undefined name BatchOffset

(F821)


156-156: Undefined name BOffset

(F821)


158-161: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


160-160: Undefined name BSize

(F821)


167-167: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


167-167: Undefined name BatchOffset

(F821)


167-167: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


170-173: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


171-171: Undefined name BatchOffset

(F821)


172-172: Undefined name BOffset

(F821)


174-177: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


176-176: Undefined name BSize

(F821)


370-370: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


370-370: Undefined name BatchOffset

(F821)


370-370: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


373-376: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


374-374: Undefined name BatchOffset

(F821)


375-375: Undefined name BOffset

(F821)


377-380: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


379-379: Undefined name BSize

(F821)


386-386: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


386-386: Undefined name BatchOffset

(F821)


386-386: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


389-392: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


390-390: Undefined name BatchOffset

(F821)


391-391: Undefined name BOffset

(F821)


393-396: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


395-395: Undefined name BSize

(F821)


405-405: Consider (BatchOffset, *CMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *CMatrixOffsets)

(RUF005)


405-405: Undefined name BatchOffset

(F821)


405-405: Consider (BatchSize, *CMatrixShape) instead of concatenation

Replace with (BatchSize, *CMatrixShape)

(RUF005)


407-407: Consider (BatchOffset, BOffset, *CMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *CMatrixOffsets)

(RUF005)


407-407: Undefined name BatchOffset

(F821)


407-407: Undefined name BOffset

(F821)


407-407: Consider (BatchSize, BSize, *CMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *CMatrixShape)

(RUF005)


407-407: Undefined name BSize

(F821)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py

259-259: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


259-259: Undefined name inputRequantCubes

(F821)


277-277: Local variable dilation is assigned to but never used

Remove assignment to unused variable dilation

(F841)


314-314: Local variable weightBuffer is assigned to but never used

Remove assignment to unused variable weightBuffer

(F841)


317-317: Local variable inputBatchVar is assigned to but never used

Remove assignment to unused variable inputBatchVar

(F841)


320-320: Local variable inputChannelVar is assigned to but never used

Remove assignment to unused variable inputChannelVar

(F841)


322-322: Local variable outputBatchVar is assigned to but never used

Remove assignment to unused variable outputBatchVar

(F841)


323-323: Local variable outputHeightVar is assigned to but never used

Remove assignment to unused variable outputHeightVar

(F841)


324-324: Local variable outputWidthVar is assigned to but never used

Remove assignment to unused variable outputWidthVar

(F841)


325-325: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


328-328: Local variable padding is assigned to but never used

Remove assignment to unused variable padding

(F841)


408-408: Unpacked variable BatchOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable HOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable WOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable COffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


409-409: Unpacked variable BatchSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


421-421: Unpacked variable inBSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


421-421: Unpacked variable inHSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


517-517: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py

131-131: Undefined name ONNXLayer

(F821)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

🔇 Additional comments (29)
.github/workflows/ci-platform-siracusa.yml (1)

57-75: Test additions well-integrated and properly organized.

The new kernel tests (lines 57–63 for float 2D convolutions, both regular and DW variants, and line 75 for reshape with a skip connection) and the model test (line 99 for the TinyViT demo) follow established naming conventions, are logically grouped by operation type, and align with the PR's stated objective of adding FP test coverage.

Based on the established pattern (documented in learnings), these test names map to directories in DeeployTest/Tests/, and the prior review confirmed these directories exist and tests pass successfully.

Also applies to: 99-99

CHANGELOG.md (1)

100-103: Approve changelog entries for bug fixes.

The Fixed section comprehensively documents the four key bug fixes (alias_of parameter handling, non-batched FP GEMM/matmul, closure naming, and FP add/mul data types) in alignment with the PR objectives. The descriptions are clear and concise.

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

35-35: LGTM!

The storage of n_cores is straightforward and correct.

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

127-131: Nice performance improvement.

The refactored implementation using iterative construction is O(n) instead of O(n²) with repeated .index() calls, and the logic is clearer.
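
A generic illustration of the complexity point (not the file's actual code):

# O(n^2): one list.index() lookup per element
perm = [order.index(i) for i in range(len(order))]

# O(n): build the inverse permutation in a single pass
perm = [0] * len(order)
for pos, i in enumerate(order):
    perm[i] = pos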


338-383: New DW convolution passes look well-structured.

The NCHWtoNHWCDwConvPass and PULPFPDWConvPass classes follow the established pattern-based pass architecture correctly. However, they depend on _PULPDWNCHWtoNHWC_fun, which has a syntax error that must be fixed first.

Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: MIN casts prevent signed/unsigned mixups

Casting ${size} to uint32_t here is correct and avoids UB on MIN comparisons.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

100-117: Im2Col call argument order looks correct

X/Y, kernel, and stride ordering matches the expected implementation for these templates.

Based on learnings.


21-24: Conv2D Im2Col scratch sizing formula is correct

The formula matches the C kernel's pContextBuffer requirement. The buffer is correctly sized as (typeWidth // 8) * n_cores * ch_im_in * kx * ky bytes, which corresponds to the kernel logic where each core gets ch_im_in * kernel_x * kernel_y float elements, split across n_cores with pointer arithmetic advancing by element count. For fp32 (typeWidth=32), this yields the exact buffer needed: n_cores * ch_im_in * kx * ky * 4 bytes.
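A hedged sketch of that sizing rule (illustrative helper, not Deeploy's API):

def im2col_scratch_bytes(type_width_bits: int, n_cores: int, ch_im_in: int, kx: int, ky: int) -> int:
    # one im2col patch of ch_im_in * kx * ky elements per core
    element_bytes = type_width_bits // 8
    return element_bytes * n_cores * ch_im_in * kx * ky

# fp32, 8 cores, 16 input channels, 3x3 kernel: 8 * 16 * 3 * 3 * 4 = 4608 bytes
assert im2col_scratch_bytes(32, 8, 16, 3, 3) == 4608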

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

164-173: Review comment is incorrect: prior cleanup was not completed.

The first BCube construction block (lines 146-160) still exists as dead code. It is never appended to inputBCubes—only the second BCube constructed at line 172 is appended (line 173). The review comment incorrectly asserts that "earlier deleted branch already tried to build BCube," but this branch was never deleted. While lines 164-173 correctly represent the single active path, the orphaned BCube construction at lines 146-160 contradicts the cleanup claim.

Likely an incorrect or invalid review comment.

Deeploy/Targets/PULPOpen/Parsers.py (3)

80-82: FP Conv2D: bias arity check LGTM

Allowing 2 or 3 inputs at parse time is correct and aligns with bias support.


105-117: FP Conv2D: bias flags wiring is consistent

Setting has_bias ("true"/"false") and mapping bias when present matches existing templates.

Please confirm FloatConvTemplate.reference2DIm2ColTemplate expects:

  • keys: data_in, weight, data_out, has_bias, bias.

123-190: Remove suggestion to validate DW early in parseNode; current design is sound.

The review comment suggests early depthwise validation in parseNode, but this is architecturally infeasible. The parseNode method receives only the node (without context), so ch_im_in—needed to verify group == ch_im_in—is unavailable. Input channel dimensions are only accessible in parseNodeCtxt via ctxt.lookup(), which is where the DW check correctly occurs (line ~167).

The mention of "F_total % C == 1" references an undefined formula not present anywhere in the codebase.

The current two-stage design (attribute validation in parseNode, context validation in parseNodeCtxt) correctly reflects the available information at each stage and is the proper approach.

Likely an incorrect or invalid review comment.

Deeploy/Targets/PULPOpen/Bindings.py (5)

12-19: Import expansion LGTM

Float data types and new templates are correctly imported for the added FP bindings.


151-152: DMA Slice: template swap LGTM

Switch to DMASliceTemplate.referenceTemplate is consistent with DMA path.


163-172: Slice bindings for FP types: OK

SliceChecker + SliceTemplate with ForkTransformer over FloatDataTypes matches the Float path.


299-303: Float ReduceMean bindings: OK

Mixed float + integer axes signature bound to FloatReduceMeanTemplate with ClusterTransformer is consistent with parser behavior for FP opset.


239-245: FP DW Conv2D binding confirmed: template keys match parser

Verified that PULPFPDWConv2DParser.parseNodeCtxt() produces both has_bias and bias keys in operatorRepresentation (Parsers.py:175–176, 179), which are correctly expected by FloatConvTemplate.referenceDW2DIm2ColTemplate (FloatConvTemplate.py:81, 110, 142). The binding is correct.

Deeploy/TilingExtension/TilerExtension.py (1)

988-989: Review comment is incorrect — no bug exists

NetworkContext.copy() uses copy.copy(self), which is a shallow copy that preserves instance attributes like n_cores (they remain in the object's __dict__). _convertCtxtToStaticSchedule returns the same ctxt instance modified in-place, so n_cores naturally survives. Line 988's reassignment is defensive programming but not required for correctness.
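A toy illustration of why the attribute survives (not Deeploy's NetworkContext):

import copy

class Ctxt:
    def __init__(self, n_cores: int = 8):
        self.n_cores = n_cores

original = Ctxt(n_cores = 4)
shallow = copy.copy(original)
assert shallow.n_cores == 4  # instance attributes are carried over by the shallow copy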

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

36-39: Check argument order consistency at call sites.

Kernels accept (H, W, P, Q, SP, SQ); FloatConvTemplate uses X,Y ordering per prior notes. Please confirm bindings pass the correct order to avoid transposed dims.

Based on learnings

Also applies to: 134-138

Deeploy/DeeployTypes.py (1)

1205-1210: Constructor argument mismatch in parseOutputs.

Use aliases=[] instead of alias_of=[] to match VariableBuffer.__init__.

Apply:

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    aliases = [],
+                )

Likely an incorrect or invalid review comment.

DeeployTest/testMVP.py (3)

84-92: LGTM!

The n_cores parameter is correctly propagated to mapDeployer, and the formatting is clean and consistent.


201-205: LGTM!

The CLI argument is well-defined with a sensible default and clear help text. The parameter correctly flows through to the deployment pipeline.


256-262: LGTM!

The n_cores parameter is correctly passed from CLI arguments to setupDeployer, completing the propagation chain through the deployment pipeline.

Deeploy/Targets/PULPOpen/Platform.py (6)

8-9: LGTM!

All new imports are properly structured and used within the file. The additions support FP depthwise convolution, slice operations, and the new optimization pass.

Also applies to: 32-32, 35-36


77-77: LGTM!

The FPDWConv2DMapper definition follows the established pattern and correctly pairs the FP depthwise conv parser with its bindings.


92-94: LGTM!

Both slice mappers correctly use SliceParser with different bindings to support DMA-based and software-based slice implementations. The mapper defined first in the mapping list (line 134) will have precedence during binding selection.


234-235: LGTM!

The RemoveEmptyConvBiasPass is correctly positioned near the end of the optimization pipeline, after merge and requantization passes. This placement allows the pass to clean up any empty bias parameters that may have been introduced or remain after earlier transformations.


134-134: No action needed; original review comment is incorrect.

The mapper order does not affect precedence. PULPDMASliceBindings operates on IntegerDataTypes while PULPSliceBindings operates on FloatDataTypes—these are mutually exclusive sets. The SliceChecker type constraints determine which binding applies, not the array order. The current ordering [SliceMapper, DMASliceMapper] is safe.

Likely an incorrect or invalid review comment.


108-108: Based on my verification of the parser constraint logic:

FPConv2DMapper matches regular convolutions exclusively:

  • PULPFPConv2DParser.parseNode() explicitly requires self.operatorRepresentation['group'] == 1
  • This constraint rejects any depthwise convolution (where group == ch_im_in)

FPDWConv2DMapper matches depthwise convolutions:

  • PULPFPDWConv2DParser.parseNode() does not have a group==1 constraint
  • Depthwise validation occurs in parseNodeCtxt() at line 186: if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']

The mapper ordering [FPConv2DMapper, FPDWConv2DMapper] is correct: the more constrained parser (requiring group==1) is tried first, ensuring regular convolutions are matched before attempting depthwise matching. Any regular convolution caught by FPConv2DMapper will not reach FPDWConv2DMapper.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)

241-245: Replace mutable default (shape=[1]) with immutable and type-annotate.

Avoid shared mutable defaults.

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), aliases: Optional[List[str]] = None):

379-388: Fix visited set creation in has_live_ancestors (bug).

set(self.name) splits into characters; use a singleton set.

-        visited = set(self.name)
+        visited = {self.name}
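A one-line demonstration of the bug (plain Python, independent of Deeploy):

# set(str) iterates the string, yielding characters instead of one name
assert set("conv1") == {"c", "o", "n", "v", "1"}
assert {"conv1"} == set(["conv1"])  # the intended singleton set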
♻️ Duplicate comments (6)
CHANGELOG.md (1)

23-23: Remove duplicate PR entry (#105).

The “Refactor tiling code generation [#105]” line is duplicated. Remove the second occurrence to avoid confusion.

- - Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

25-26: Avoid mutable default for inputOffsets.

Use Optional and initialize inside to prevent shared state.

-                 inputOffsets: Dict[str, int] = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
@@
-        if inputOffsets == {}:
-            for key in inputTypes.keys():
-                inputOffsets[key] = 0
+        if inputOffsets is None:
+            inputOffsets = {key: 0 for key in inputTypes.keys()}
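For reference, the pitfall this fix avoids, as a toy example unrelated to Deeploy's classes:

def bad(offsets = {}):
    # the default dict is created once and shared across all calls
    offsets["x"] = offsets.get("x", 0) + 1
    return offsets

assert bad() == {"x": 1}
assert bad() == {"x": 2}  # state leaked from the first call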
Deeploy/Targets/PULPOpen/Deployer.py (1)

40-53: Avoid mutable default for inputOffsets; forward Optional to base.

Align with SignPropDeployer’s fix.

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets,
             n_cores = n_cores,
Deeploy/DeeployTypes.py (3)

325-329: Constructor arg mismatch: use ‘aliases’, not ‘alias_of’.

This raises a TypeError at runtime.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            aliases = [],
+        )

442-447: Avoid mutable defaults in ConstantBuffer.__init__.

Use immutable defaults and convert.

-    def __init__(self, name: str = '', shape = [1], values = [0]):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), values: Union[Sequence, np.ndarray] = (0,)):
         # Pass a copy of alias_of to avoid shared references
         super().__init__(name, shape)
 
-        values = np.asarray(values)
+        values = np.asarray(values)

331-365: Align alias helpers with self.aliases (set) and remove alias_of usage.

Current code references a non-existent attribute and never updates aliases.

-    def add_aliases(self, aliases_to_add: List[str]):
+    def add_aliases(self, aliases_to_add: List[str]):
@@
-        if not hasattr(self, "alias_of"):
-            return None
-
-        for alias in aliases_to_add:
-            if alias not in self.alias_of:
-                self.alias_of.append(alias)
+        if not hasattr(self, "aliases") or self.aliases is None:
+            self.aliases = set()
+        for alias in aliases_to_add:
+            self.aliases.add(alias)
         return None
@@
-    def get_aliases_of(self):
+    def get_aliases_of(self) -> List[str]:
@@
-        if hasattr(self, "alias_of"):
-            return list(self.alias_of)
-        else:
-            return list()
+        return list(self.aliases) if hasattr(self, "aliases") and self.aliases is not None else []
🧹 Nitpick comments (14)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)

158-159: Consider defensive handling for edge cases in name prefixing.

The underscore prefix addresses names starting with numbers as intended. However, the implementation doesn't guard against:

  • Empty or None name values
  • Names already starting with _ (which would produce __, a reserved identifier)

Consider adding a check:

-        # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
-        self.closureName = "_" + name + self.closureSuffix
+        # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
+        # Avoid creating double-underscore if name already starts with underscore
+        prefix = "" if name and name[0] == "_" else "_"
+        self.closureName = prefix + name + self.closureSuffix
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM! Correct fix for non-batched input handling.

The conditional logic properly addresses the bug by only advancing input pointers when the corresponding input is actually batched. When A_batched or B_batched is false, the base pointer is reused across all batch iterations, which correctly implements broadcasting semantics.

The pointer arithmetic for the batched case is correct:

  • A offset: b * M * N for shape [batch, M, N]
  • B offset: b * N * O for shape [batch, N, O]

Minor: Line 17 has trailing whitespace after the colon that could be cleaned up.

-    % if B_batched:                  
+    % if B_batched:
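A pure-Python model of the offset logic described above (illustrative names, not the Mako template itself):

def input_offsets(b: int, M: int, N: int, O: int, a_batched: bool, b_batched: bool):
    a_off = b * M * N if a_batched else 0  # A is [batch, M, N] when batched
    b_off = b * N * O if b_batched else 0  # B is [batch, N, O] when batched
    return a_off, b_off

# with B not batched, every batch iteration reads B from its base pointer
assert input_offsets(2, 4, 3, 5, True, False) == (24, 0)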
Deeploy/Targets/Generic/Parsers.py (1)

1071-1074: Consider adding strict=True to zip() calls for defensive programming.

The zip() calls iterate over inputs and outputs with expected fixed-length lists. While parseNode already validates these lengths, adding strict=True (Python 3.10+) provides an explicit safety guarantee and catches bugs if the validation logic changes.

Apply this diff if Python 3.10+ is used:

-    for tensor, symName in zip(node.inputs, ['data_in', 'shape']):
+    for tensor, symName in zip(node.inputs, ['data_in', 'shape'], strict=True):
         self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name
-    for tensor, symName in zip(node.outputs, ['data_out']):
+    for tensor, symName in zip(node.outputs, ['data_out'], strict=True):
         self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name

Based on static analysis hints.
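For illustration, what strict=True changes at runtime (plain Python 3.10+):

pairs = list(zip([1, 2], ["a", "b"], strict = True))
assert pairs == [(1, "a"), (2, "b")]
try:
    list(zip([1, 2], ["a"], strict = True))  # mismatched lengths
except ValueError:
    pass  # the mismatch is surfaced instead of silently truncating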

Deeploy/TilingExtension/TilerExtension.py (1)

988-989: Propagate n_cores earlier (and guard against missing attribute).

Set ctxt.n_cores when the context is created (or before tiling) to avoid surprises, and guard for attribute presence to prevent AttributeError if wrappers differ.

Example:

-        self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
-        self.ctxt.n_cores = self.n_cores
+        self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
+        if hasattr(self, "n_cores"):
+            self.ctxt.n_cores = self.n_cores
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

35-35: Drop redundant self.n_cores assignment.

NetworkDeployer already stores n_cores; remove this duplicate to keep a single source of truth.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)

43-51: Fix return type annotation to match base (Union[int, IntVar]).

The function returns an int, but is annotated as str. Align with NodeTemplate’s signature.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(
+        ctxt: NetworkContext,
+        operatorRepresentation: OperatorRepresentation,
+    ) -> List[Tuple[str, Union[int, IntVar]]]:

46-51: Update stale comment about using size as string.

The returned size is numeric and also passed to hoistTransientBuffer; the “used as string” warning is misleading. Remove or clarify it.


59-65: Prefer annotateType over mutating referencedType.

Setting _type.referencedType mutates the class and can have side effects. Use annotateType to set a precise Pointer type and instance.

-        # Manually set the type of the im2col buffer to match the input type, since it defaults to void for transient buffers
-        ctxt.lookup(im2col_name)._type.referencedType = ctxt.lookup(
-            operatorRepresentation['data_in'])._type.referencedType
+        # Set the buffer pointer type to match data_in
+        ctxt.annotateType(im2col_name, ctxt.lookup(operatorRepresentation['data_in'])._type)

93-101: Nit: fix comment typos (“ChannelOout” → “ChannelOut”).

Purely cosmetic; improves readability.

Also applies to: 124-131

Deeploy/DeeployTypes.py (1)

573-576: Avoid mutable dict defaults in NetworkContext.init.

Initialize dicts inside to prevent shared state.

-                 globalObjects = {},
-                 localObjects = {},
+                 globalObjects: Optional[OrderedDict] = None,
+                 localObjects: Optional[OrderedDict] = None,
@@
-        self.globalObjects = OrderedDict()
-        self.localObjects = OrderedDict()
+        self.globalObjects = OrderedDict() if globalObjects is None else globalObjects
+        self.localObjects = OrderedDict() if localObjects is None else localObjects
DeeployTest/testUtils/platformMapping.py (2)

92-104: Use int for n_cores (not Optional) since default is numeric.

-    n_cores: Optional[int] = 8,
+    n_cores: int = 8,

114-188: Propagate n_cores to all SignProp-based deployers for consistency.

CMSIS, MemPool, SoftHier, Generic deployers inherit SignPropDeployer and accept n_cores. Pass it through so CLI behavior is uniform.

Example for each block:

-        deployer = CMSISDeployer(...,
-                                 deeployStateDir = deeployStateDir,
-                                 inputOffsets = inputOffsets)
+        deployer = CMSISDeployer(...,
+                                 deeployStateDir = deeployStateDir,
+                                 inputOffsets = inputOffsets,
+                                 n_cores = n_cores)

Apply similarly to MemPoolDeployer, SoftHierDeployer, GenericDeployer.

Also applies to: 140-188

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

50-148: Code duplication acceptable for performance.

The has_bias and no-bias paths are duplicated (lines 50-148 vs 149-248), which increases maintainability burden but avoids branching inside tight loops. This trade-off is reasonable for performance-critical embedded kernels.

Consider adding a comment noting this is intentional for performance if future maintainers question the duplication.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

41-101: Code duplication acceptable for performance.

The has_bias and no-bias code paths are duplicated to avoid conditional branching inside the innermost loops (lines 47-64). This is a common optimization in performance-critical embedded code where avoiding branches can significantly improve execution speed.

Similar duplication exists in the im2col path (lines 140-218). Consider adding a brief comment explaining this trade-off for future maintainers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85edf76 and 945b610.

📒 Files selected for processing (24)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
  • Deeploy/DeeployTypes.py (12 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
  • DeeployTest/testRunner_tiled_siracusa.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (9)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • inputs (2569-2586)
  • lookup (778-810)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2569-2586)
  • lookup (778-810)
  • parseNode (1091-1106)
  • parseNodeCtxt (1109-1134)
  • NetworkContext (564-1078)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1220-1286)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3244-3629)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (564-1078)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (879-898)
  • lookup (778-810)
Deeploy/Targets/PULPOpen/Bindings.py (4)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2352-2386)
  • NodeBinding (1574-1719)
  • NodeTemplate (87-229)
Deeploy/Targets/Generic/TypeCheckers.py (2)
  • SliceChecker (37-51)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2439-2482)
  • TopologyOptimizer (2237-2266)
  • NetworkDeployer (3244-3629)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-127)
Deeploy/DeeployTypes.py (2)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/AbstractDataTypes.py (1)
  • Pointer (303-388)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (519-524)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-190)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1722-1878)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.1)
Deeploy/Targets/Generic/Parsers.py

1071-1071: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


1073-1073: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (18)
.github/workflows/ci-platform-siracusa.yml (1)

57-64: CI test coverage additions align well with PR features.

The new tests for FP 2D convolution (standard and depthwise variants with/without bias), reshape with skip connection, and TinyViT demo align cleanly with the PR's feature additions. Test naming is consistent with existing patterns, and per the earlier review resolution, these tests exist as directories in DeeployTest/Tests/. The additions integrate smoothly into the existing kernel and model test lists without logic changes or structural issues.

Also applies to: 75-75, 99-99

Deeploy/Targets/Generic/Parsers.py (1)

1070-1076: LGTM! Clear and Pythonic mapping implementation.

The explicit mapping of inputs to ['data_in', 'shape'] and outputs to ['data_out'] improves clarity and follows the pattern established by other parsers in the codebase (e.g., iRMSNormParser lines 65-71). The use of zip() is more Pythonic than index-based access.

Deeploy/DeeployTypes.py (2)

879-899: Ensure hoistTransientBuffer size is always int.

If any templates pass symbolic strings, they’ll break here. Confirm all computeTransientBuffersSize implementations return int/IntVar as per signature.


2696-2702: Good: preserve n_cores on context reinit.

Reconstructing NetworkContext with n_cores=self.ctxt.n_cores avoids losing state on parse.

Deeploy/Targets/PULPOpen/Parsers.py (3)

80-81: LGTM! Bias support correctly integrated.

The relaxed input count check now allows optional bias (2 inputs without bias, 3 inputs with bias), which aligns with the bias handling added in parseNodeCtxt.


106-118: LGTM! Clean bias handling implementation.

The conditional bias handling correctly maps inputs based on the node's input count, setting appropriate flags for downstream code generation.


124-191: LGTM! Well-structured depthwise convolution parser.

The parser correctly validates depthwise convolution constraints:

  • Square padding requirement
  • Optional bias support (2 or 3 inputs)
  • Depthwise verification via group == ch_im_in check (line 187)

The structure mirrors the regular FP Conv2D parser while adding depthwise-specific validation.

DeeployTest/testMVP.py (1)

57-59: LGTM! Clean n_cores parameter propagation.

The n_cores parameter is consistently threaded through the deployment setup:

  1. Added to setupDeployer signature
  2. Passed to mapDeployer
  3. Exposed via CLI argument (default: 8)
  4. Propagated from parsed arguments

This enables proper sizing of per-core buffers (e.g., im2col buffers) in multi-core PULP deployments.

Also applies to: 84-92, 201-205, 261-262

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

10-36: LGTM! Proper per-core work distribution.

The function correctly partitions output channels across cores:

  • Computes per-core channel chunks with remainder handling
  • Early-exits when no channels assigned to a core
  • Adjusts weight and im2col buffer pointers per core

This enables efficient parallel execution on multi-core PULP clusters.
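A hedged re-derivation of that partitioning in Python (assuming the usual chunk-plus-remainder scheme; the C kernel is the source of truth):

def core_channel_range(core_id: int, n_cores: int, n_channels: int):
    chunk, rem = divmod(n_channels, n_cores)
    # the first `rem` cores take one extra channel each
    start = core_id * chunk + min(core_id, rem)
    count = chunk + (1 if core_id < rem else 0)
    return start, count  # count == 0 means the core exits early

# 10 channels over 4 cores: sizes 3, 3, 2, 2 covering [0, 10)
assert [core_channel_range(c, 4, 10) for c in range(4)] == [(0, 3), (3, 3), (6, 2), (8, 2)]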


103-145: LGTM! Correct depthwise convolution logic.

The input channel iteration (lines 103-104) correctly computes the range based on output channels assigned to the core, accounting for the depthwise constraint where each input channel produces F_total/C output channels (the channel multiplier; ONNX Conv's group equals C here).

The output channel range computation (lines 118-130) properly clamps to the core's assigned range, ensuring correct work distribution.

Deeploy/Targets/PULPOpen/Bindings.py (3)

151-151: LGTM! Clear separation of DMA and non-DMA slice operations.

The update clarifies the distinction:

  • PULPDMASliceBindings (line 151): Uses DMASliceTemplate for asynchronous DMA-based slicing returning a Future
  • PULPSliceBindings (lines 163-172): Uses SliceTemplate for synchronous float slicing with ForkTransformer

This separation enables appropriate code transformation for each use case.

Also applies to: 163-172


239-245: LGTM! Depthwise convolution bindings correctly configured.

The new PULPFloatDWConv2DBindings:

  • Support all float types via FloatDataTypes iteration
  • Use FloatConvTemplate.referenceDW2DIm2ColTemplate (the new DW im2col template)
  • Apply ForkTransformer for multi-core execution

This aligns with the new PULPFPDWConv2DParser and enables FP depthwise convolution support.


298-302: LGTM! Extended ReduceMean support for mixed float/integer inputs.

The addition supports ReduceMean operations with float data and integer axes/keepdims parameters, using FloatReduceMeanTemplate. This is a natural extension of the existing integer-only ReduceMean bindings.

Deeploy/Targets/PULPOpen/Platform.py (2)

8-9: LGTM! Proper wiring of new depthwise and slice capabilities.

The platform changes correctly integrate the new functionality:

  1. FPDWConv2DMapper (line 77) combines the new parser and bindings for FP depthwise convolution
  2. Conv layer (line 108) now supports both FPConv2DMapper and FPDWConv2DMapper
  3. Slice operation (line 134) supports both regular and DMA-based mappers
  4. All necessary imports added for new parsers, bindings, and optimization passes

This completes the integration stack for FP depthwise convolution on PULP.

Also applies to: 32-32, 35-36, 77-77, 92-94, 108-108, 134-134


234-235: LGTM! RemoveEmptyConvBiasPass addition is appropriate.

Adding RemoveEmptyConvBiasPass to the PULPOptimizer makes sense when introducing bias support. This pass likely removes unnecessary bias parameters when the bias tensor is empty or zero, optimizing the generated code.

The placement after PULPAddRequantMergePass is reasonable in the optimization pass sequence.

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

12-27: LGTM! Header declarations consistent with implementations.

The function signatures correctly extend the existing FP32 convolution API:

  1. PULP_Conv2d_fp32_fp32_fp32_HWC (lines 12-18): Added pSrcBias and has_bias parameters
  2. PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 20-27): Added bias parameters and pContextBuffer
  3. PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 29-36): New entry point for depthwise im2col convolution

All declarations match the implementations in Convolution_fp32.c and DWConvolution_fp32.c. Based on learnings.

Also applies to: 29-36

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-34: LGTM! Proper multi-core parallelization added.

The updated function now includes:

  1. Bias parameter support (pSrcBias, has_bias)
  2. Per-core output channel distribution (lines 18-31)
  3. Core-specific weight pointer adjustment (line 34)
  4. Early exit for cores with no assigned channels (lines 29-31)

This enables efficient parallel execution while maintaining correctness. Based on learnings.


66-67: LGTM! Correct per-core output indexing and bias application.

The output indexing (lines 66-67, 96-97, 173, 212-214) correctly accounts for:

  1. Global output index calculation: (h_out * W_out + w_out) * F_total + ...
  2. Core-specific channel offset: ch_out_start + f or just f (depending on loop variable)
  3. Bias application using the global channel index (line 67: pSrcBias[f + ch_out_start], line 175: pSrcBias[f])

This ensures correct results when work is distributed across multiple cores.

Also applies to: 175-175
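Restated in Python for clarity (an illustrative model of the indexing, not the kernel itself):

def out_index(h_out: int, w_out: int, f: int, W_out: int, F_total: int, ch_out_start: int) -> int:
    # HWC layout: spatial position first, then the core's channel offset
    return (h_out * W_out + w_out) * F_total + ch_out_start + f

# position (2, 3) in a 5-wide, 8-channel output, core starting at channel 4, f = 1
assert out_index(2, 3, 1, 5, 8, 4) == 109
# the matching bias term would use the global channel index: bias[ch_out_start + f]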

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)

241-241: Complete the mutable default fix for shape parameter.

The aliases parameter correctly uses None, but shape = [1] is still a mutable default. This was flagged in past reviews and by static analysis.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, aliases: Optional[List[str]] = None):
         self.name: str = name
-        self.shape: Sequence[int] = shape
+        self.shape: Sequence[int] = shape if shape is not None else (1,)

As per past review feedback and static analysis (Ruff B006).


437-439: Fix mutable default arguments in ConstantBuffer.__init__.

Both shape = [1] and values = [0] are mutable defaults, flagged by past reviews and static analysis (Ruff B006).

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], values = [0]):
-        super().__init__(name, shape)
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, values: Union[Sequence, np.ndarray, None] = None):
+        super().__init__(name, shape if shape is not None else (1,))
-        values = np.asarray(values)
+        values = np.asarray(values if values is not None else [0])

As per past review feedback and static analysis.

♻️ Duplicate comments (2)
CHANGELOG.md (1)

23-23: Remove duplicate PR reference.

Line 23 duplicates the PR #105 entry already present at line 18. Remove the redundant entry to maintain changelog clarity.

- Use Docker digests instead of arch-specific tags [#106](https://github.com/pulp-platform/Deeploy/pull/106)
- Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
- Fix `Unsqueeze` Op. when using ONNX opset 13 or higher (from attribute to input) [#119](https://github.com/pulp-platform/Deeploy/pull/119)
Deeploy/DeeployTypes.py (1)

331-350: Critical bug: cannot append to a set.

Line 260 initializes self.aliases as a set, but line 348 calls .append() which doesn't exist on sets. This will raise an AttributeError at runtime.

Apply this diff:

     def add_aliases(self, aliases_to_add: List[str]):
         """
         Adds list of aliases to the aliases attribute.
         Parameters
         ----------
         alias_to_add : List[str]
             List of names of aliases to add to the aliases attribute.
         Returns
         -------
         None
         """
 
         if not hasattr(self, "aliases"):
             return None
 
         for alias in aliases_to_add:
-            if alias not in self.aliases:
-                self.aliases.append(alias)
+            self.aliases.add(alias)
 
         return None
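A two-line reminder of the set semantics the diff relies on (plain Python):

aliases = set()
aliases.add("buf0")
aliases.add("buf0")  # idempotent, unlike list.append
assert aliases == {"buf0"}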
🧹 Nitpick comments (1)
CHANGELOG.md (1)

30-32: Reduce successive FP prefix repetition.

Three consecutive entries (lines 30–32) begin with "FP", which reduces readability. Consider rewording one or more entries to vary the opening phrase.

Example refactors:

  • Line 30: "Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests..."
  • Line 31: "Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations."
  • Line 32: "DW conv lowering optimization pass (FP variant) similar to the existent one for integer version."
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba1e15a and ffe7a27.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (114)
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: generic-models / test-runner-generic
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels / test-runner-siracusa
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

558-567: LGTM: n_cores parameter added correctly.

The n_cores parameter is properly added to NetworkContext.__init__ with a default value of 8 and stored as an instance variable. No duplicate assignment.


2479-2519: LGTM: n_cores propagated correctly to NetworkContext.

The n_cores parameter is properly added to NetworkContainer.__init__ and correctly passed through to the NetworkContext constructor at lines 2518-2519.


2679-2685: LGTM: NetworkContext recreation preserves n_cores.

The new NetworkContext is correctly created with n_cores = self.ctxt.n_cores, preserving the core count from the existing context.


3231-3285: LGTM: n_cores propagated through NetworkDeployer.

The n_cores parameter is properly added to NetworkDeployer.__init__, passed to the parent NetworkContainer via super().__init__(), and stored as an instance variable at line 3285.

@diaconuccalin diaconuccalin requested a review from lukamac October 27, 2025 18:58
Collaborator

@lukamac lukamac left a comment


Hi, I reviewed the whole PR. There are still some changes I would like to see specifically around the n_cores and some remnant changes around reshape aliasing. I left some comments also in the older threads so check those too. Thanks!

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)

42-48: Consider removing redundant buffer retrieval.

The parent class _GenericReshapeTemplate.alignToContext() already retrieves and validates these buffers (lines 29-33 in Generic template). Since you call super().alignToContext() on line 40, you can retrieve the buffers directly from the returned ctxt without re-fetching them.

Apply this diff to streamline the code:

-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
+        # Get buffers (already retrieved and validated by parent)
+        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
+        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
+

Or even more concisely, inline them:

-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
         # Linking required for tiling
-        bufferOut._alias = bufferIn.name
+        ctxt.lookup(operatorRepresentation['data_out'])._alias = ctxt.lookup(operatorRepresentation['data_in']).name
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ea12db and 71b7e79.

📒 Files selected for processing (2)
  • Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
Deeploy/DeeployTypes.py (3)
  • NodeTemplate (87-229)
  • lookup (722-754)
  • VariableBuffer (232-360)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
Deeploy/DeeployTypes.py (4)
  • NetworkContext (508-1022)
  • VariableBuffer (232-360)
  • alignToContext (119-139)
  • lookup (722-754)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (2)
  • _GenericReshapeTemplate (10-39)
  • alignToContext (15-39)
🔇 Additional comments (3)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)

10-10: LGTM! Clean refactoring to clarify template hierarchy.

The rename from _ReshapeTemplate to _GenericReshapeTemplate improves clarity and aligns well with the inheritance pattern where the PULP-specific template now extends this base class.

Also applies to: 42-42

Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)

28-32: LGTM! Inheritance approach correctly implemented.

The addition of the VariableBuffer import and inheritance from _GenericReshapeTemplate properly implements the agreed-upon design from the past review discussion.


49-50: LGTM! Tiler-specific aliasing correctly preserved.

The _alias attribute assignment is correctly preserved for tiler compatibility, as discussed in the past review comments. This works alongside the parent's bidirectional aliases set mechanism.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71b7e79 and e9e6395.

📒 Files selected for processing (8)
  • Deeploy/DeeployTypes.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • DeeployTest/generateNetwork.py (2 hunks)
  • DeeployTest/testMVP.py (5 hunks)
  • DeeployTest/testRunner_siracusa.py (1 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Parsers.py
  • Deeploy/Targets/PULPOpen/Deployer.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (3)
DeeployTest/testRunner_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-448)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (508-1020)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (821-840)
  • lookup (720-752)
DeeployTest/testMVP.py (1)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPPlatform (251-259)
🪛 Ruff (0.14.2)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

44-44: Unused static method argument: ctxt

(ARG004)

DeeployTest/testMVP.py

59-59: Unused function argument: verbose

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (84)
  • GitHub Check: deeploy-typing
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (16000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: generic-models / test-runner-generic
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
🔇 Additional comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)

21-23: LGTM: Buffer sizing now accounts for runtime core count.

The dynamic n_cores factor correctly sizes the Im2Col buffer for parallel execution across cores.


47-50: Correct: DW conv buffer omits ch_im_in factor.

Unlike regular convolution, depthwise convolution processes each channel independently, so the Im2Col buffer is correctly sized per-core with kernel dimensions only, without the ch_im_in factor.
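
To make the two sizing rules concrete, here is a minimal Python sketch of how a template's computeTransientBuffersSize could express them. The keys mirror the operatorRepresentation fields discussed in this review, and the string-expression convention (element counts, scaled to bytes later via the buffer type) follows the recorded learnings; treat the code as illustrative, not the source.

def regular_conv_im2col_elems(op: dict) -> str:
    # One input patch per core: all input channels times the kernel window.
    return (f"{op['n_cores']} * {op['ch_im_in']} * "
            f"{op['dim_kernel_x']} * {op['dim_kernel_y']}")

def dw_conv_im2col_elems(op: dict) -> str:
    # Depthwise: one channel at a time, so ch_im_in drops out per core.
    return f"{op['n_cores']} * {op['dim_kernel_x']} * {op['dim_kernel_y']}"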


61-62: Clarify: Why set buffer type if allocation size is already in bytes?

The code explicitly sets the buffer's referencedType, but according to earlier comments, "Buffer variables don't support type assignment and require specifying the total size." Since im2col_dim already includes the byte-size calculation (lines 48–50: typeWidth // 8), what purpose does this type assignment serve?

If it's for C codegen or downstream type checking, consider adding a comment explaining the intent.
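
For context, the pattern being questioned here, as captured in a learning recorded later in this thread, looks roughly as follows; bufferName and input_type are stand-ins for the template's actual values, so read this as a sketch rather than the implementation.

# Size is hoisted as an element count; the buffer's element type is then
# patched so the allocator scales the count by the element size itself.
bufferName = operatorRepresentation['ctxtBuffer']
ctxt.lookup(bufferName)._type.referencedType = input_type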


82-82: LGTM: Templates correctly updated with bias support and proper argument ordering.

The additions are consistent:

  • Bias parameters (${bias}, ${has_bias}) added to all convolution calls
  • Im2Col templates correctly pass ${ctxtBuffer} for scratch space
  • Argument ordering (X,Y for dimensions/kernels/strides) matches the kernel prototypes

Based on learnings.

Also applies to: 101-123, 125-155

@diaconuccalin diaconuccalin requested a review from lukamac October 30, 2025 15:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
CHANGELOG.md (1)

27-33: Consider rewording consecutive "FP" entries for stylistic improvement.

Lines 30–32 in the Added section begin with "FP," which creates successive repetition. While this is minor in a technical changelog, consider rewording one or two entries for better readability without losing clarity.

Example rewording (optional):

  • Line 30: Keep as-is or rephrase to emphasize the TinyViT contribution (e.g., "Conv 2D (simple & DW) tests, reshape & skip connection tests, and TinyViT demo tests for...")
  • Line 31: Rephrase to reduce "FP" lead (e.g., "Slice, DW conv 2D, and reduce mean bindings and mappings in floating-point...")
  • Line 32: Keep as-is since it emphasizes the parallel with the integer version
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdfd09c and 5e30cdf.

📒 Files selected for processing (1)
  • CHANGELOG.md (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • CHANGELOG.md
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🔇 Additional comments (1)
CHANGELOG.md (1)

7-7: PR entry added correctly.

The addition of PR #117 to the List of Pull Requests is properly formatted and positioned at the top, matching the project's convention.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
Deeploy/Targets/PULPOpen/Parsers.py (1)

124-153: Depthwise FP parser: enforce multiplier == 1

Currently DW detection checks group == ch_im_in, but doesn’t validate ch_im_out == ch_im_in. If multiplier ≠ 1 is provided, the FP DW kernel will likely miscompute.

Add a fast-fail in parseNodeCtxt after shapes are known.

             # Check if DW
-            if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']:
-                return newCtxt, True
+            if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']:
+                if self.operatorRepresentation['ch_im_out'] != self.operatorRepresentation['ch_im_in']:
+                    return ctxt, False  # depthwise multiplier != 1 not supported
+                return newCtxt, True

Also applies to: 155-182

🧹 Nitpick comments (6)
DeeployTest/generateNetwork.py (1)

87-90: Validate --cores as power-of-two to match kernel assumptions

Many kernels compute chunks with (NUM_CORES-1) masks and log2(NUM_CORES). Non–power-of-two values will break these assumptions.

  • Enforce cores > 0 and power-of-two at CLI parse time.
  • Clarify help text accordingly.
 parser.add_argument(
     "--cores",
     type = int,
-    default = 1,
-    help =
-    "Number of cores on which the network is run. Currently, required for im2col buffer sizing on Siracusa. Default: 1.",
+    default = 1,
+    help = "Number of cluster cores (power-of-two). Used for im2col buffer sizing on Siracusa. Default: 1.",
 )

Add validation after parsing:

+    if args.cores <= 0 or (args.cores & (args.cores - 1)) != 0:
+        raise ValueError(f"--cores must be a positive power-of-two, got {args.cores}")

Also applies to: 191-196

Deeploy/Targets/PULPOpen/Deployer.py (1)

67-74: Remove unused variable to fix F841 and reduce noise

autoTransposePass is created but never used.

-    autoTransposePass = AutoTransposeMergePass()
-    #self.ctxt, self.layerBinding = autoTransposePass.apply(self.ctxt, self.graph, self.layerBinding)
+    # autoTransposePass intentionally kept disabled; delete if not needed

Alternatively, delete both lines if the pass is not planned.

Also applies to: 81-83

Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)

9-16: Prefer LOG2(NUM_CORES) over log2(NUM_CORES) for integer core counts

Using the macro keeps the computation in integer arithmetic, avoiding double-precision math and extra header requirements, and matches other kernels.

-uint8_t ${nodeName}_log2Core = (uint8_t) log2(NUM_CORES);
+uint8_t ${nodeName}_log2Core = (uint8_t) LOG2(NUM_CORES);
CHANGELOG.md (1)

33-34: Micro copy edit

“existent” → “existing”.

- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+ FP PULP DW conv lowering optimization pass similar to the existing one for the integer version.
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

137-141: Redundant modulo in weight indexing.

im2col_idx already iterates [0, P*Q) (line 137), so im2col_idx % (P * Q) at line 140 is redundant and always equals im2col_idx.

Apply this diff to simplify:

             sum +=
                 im2col_buffer[im2col_idx] *
-                weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
+                weight_ptr[(f - ch_out_start) * P * Q + im2col_idx];

Also applies to line 239 in the no-bias path.


50-248: Significant code duplication between bias and no-bias paths.

Lines 50–148 and 149–248 duplicate roughly 99 lines of padding and convolution logic; the only difference is the bias addition at line 144 versus line 243. While this pattern is common and the tests pass, consider extracting the shared im2col/convolution logic into a helper function or macro to reduce the maintenance burden.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd40418 and fe40743.

📒 Files selected for processing (23)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (6 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/DataTypes.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2 hunks)
  • Deeploy/DeeployTypes.py (3 hunks)
  • Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (8 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
  • DeeployTest/generateNetwork.py (3 hunks)
  • DeeployTest/testMVP.py (3 hunks)
  • DeeployTest/testUtils/testRunner.py (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py
  • Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
  • DeeployTest/testUtils/testRunner.py
  • DeeployTest/testMVP.py
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
  • Deeploy/Targets/PULPOpen/Parsers.py
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
  • CHANGELOG.md
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (10)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Deployer.py (3)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPClusterEngine (245-254)
Deeploy/DeeployTypes.py (6)
  • _selectEngine (2565-2569)
  • bind (1600-1625)
  • bind (1767-1791)
  • bind (2064-2108)
  • bind (2729-2765)
  • hoistGlobalDefinition (842-855)
Deeploy/TilingExtension/TilerExtension.py (1)
  • bind (1002-1008)
Deeploy/DeeployTypes.py (2)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)
  • _selectEngine (69-70)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
  • _selectEngine (51-58)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2503-2520)
  • lookup (720-752)
  • parseNode (1033-1048)
  • parseNodeCtxt (1051-1076)
  • NetworkContext (508-1020)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1218-1284)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (508-1020)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (821-840)
  • lookup (720-752)
Deeploy/CommonExtensions/DataTypes.py (1)
Deeploy/AbstractDataTypes.py (1)
  • typeWidth (399-400)
Deeploy/Targets/PULPOpen/Bindings.py (4)
Deeploy/CommonExtensions/DataTypes.py (5)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • int64_t (30-33)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2290-2324)
  • NodeBinding (1512-1657)
  • NodeTemplate (87-229)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ReshapeChecker (122-136)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
DeeployTest/generateNetwork.py (1)
Deeploy/Targets/PULPOpen/Platform.py (2)
  • PULPClusterEngine (245-254)
  • PULPPlatform (257-265)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentEngine (2327-2374)
  • NetworkDeployer (3176-3549)
  • _selectEngine (2565-2569)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
  • _selectEngine (51-58)
Deeploy/Targets/PULPOpen/Platform.py (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (518-523)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-182)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
🪛 LanguageTool
CHANGELOG.md

[style] ~33-~33: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.3)
Deeploy/Targets/PULPOpen/Deployer.py

81-81: Local variable autoTransposePass is assigned to but never used

Remove assignment to unused variable autoTransposePass

(F841)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

48-48: Unused static method argument: ctxt

(ARG004)

🔇 Additional comments (24)
.github/workflows/ci-platform-siracusa.yml (2)

38-99: Approved: Test additions follow existing conventions and align with PR objectives.

The new tests are well-organized, follow the camelCase naming convention consistent with existing tests, and logically group related functionality (standard conv → DW conv → special ops). The additions map to the newly implemented FP depthwise convolution, bias support, reshape/skip-connection, and TinyViT features mentioned in the PR summary.


57-63: All new test directories are present and properly configured.

Verification confirms all eight newly added tests have corresponding directories under DeeployTest/Tests/. The additions to the CI pipeline are valid and align with the stated PR objectives.

Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-14: Consistent unsigned core math

Thanks for harmonizing the core ID/start/end calculations to uint32_t with explicit casts. This keeps the MIN/shift arithmetic in the unsigned domain and lines up with the FloatAdd template changes, so we avoid sign-mismatch warnings and latent UB. Looks good.

Deeploy/Targets/PULPOpen/Deployer.py (1)

84-92: n_cores propagation looks good

Injecting n_cores before binding ensures operator representations can size buffers accordingly. LGTM.
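
A hypothetical two-liner illustrates the ordering constraint (the helper name and the node_reps iterable are invented for this sketch; only the operatorRepresentation['n_cores'] key is confirmed by this review):

def _annotate_n_cores(node_reps, n_cores: int) -> None:
    for rep in node_reps:          # one operatorRepresentation per cluster node
        rep["n_cores"] = n_cores   # read later by the Im2Col buffer sizing

The annotation has to run before bind() so the value is already present when transient buffers are hoisted.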

Deeploy/DeeployTypes.py (2)

2565-2569: Engine selection helper is clear and minimal

Returning DeploymentEngine and deferring to Mapping is a good split. LGTM.


2585-2587: Binding via selected engine Mapping is correct

This aligns with the new _selectEngine contract. LGTM.
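
The resulting call-site shape can be summarized in two lines; this is a sketch of the contract implied here, assuming Mapping is keyed by the node's op type, not the exact code:

engine = self._selectEngine(node)       # now returns a DeploymentEngine
layer = engine.Mapping[node.op](node)   # mapping/binding is the engine's job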

Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)

7-7: Wrapper delegate for _selectEngine is consistent

Imports and delegation updated as expected. LGTM.

Also applies to: 69-71

Deeploy/Targets/PULPOpen/Parsers.py (1)

70-96: FP Conv2D parser bias handling is clean

Square padding + optional bias inputs with clear has_bias/bias fields. LGTM.

Also applies to: 102-121

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1)

27-39: Output batching asymmetry warrants verification of design intent.

The unconditional output pointer bump (line 39) contrasts with the conditional A/B/C bumps (lines 27–37). This pattern is consistent across multiple templates (FloatMatMulTemplate, FloatConvTemplate), suggesting intentional design rather than oversight. However, no data_out_batched flag exists in the codebase, and it is not evident where or how the input batching flags are actually computed.

Verify:

  • Whether output must always differ per batch iteration (correct design), or
  • Whether non-batched output accumulation is a valid use case that should be supported (missing guard).

If non-batched output accumulation should be supported, guard the output increment as proposed in the original comment.
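
A pure-Python rendition of the loop shape under discussion may help; all names are illustrative, and a data_out_batched flag in particular does not exist in the codebase, which is exactly the open question:

def batch_offsets(batch, stride_a, stride_c, stride_out, a_batched, c_batched):
    for b in range(batch):
        a_off = b * stride_a if a_batched else 0   # inputs advance only when batched
        c_off = b * stride_c if c_batched else 0
        out_off = b * stride_out                   # output advances every iteration
        yield a_off, c_off, out_off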

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

10-251: LGTM—depthwise Im2Col implementation is well-structured.

The per-core channel distribution, depthwise iteration pattern (mapping input channels to output channel ranges), and bias handling are correctly implemented. Past review concerns regarding log2 usage and padding-loop type mixing have been resolved.

Based on learnings: log2 is defined in pulp_nn_utils and safe to use in PULP code.
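
The chunking arithmetic behind that distribution is the usual PULP idiom (shift by log2(NUM_CORES), mask with NUM_CORES - 1, clamp with MIN); here is a pure-Python rendition with invented names, assuming a power-of-two core count as discussed earlier in this review:

import math

def chunk_bounds(core_id: int, n_channels: int, n_cores: int) -> tuple:
    # Per-core [start, end) channel range; trailing cores may get empty ranges.
    chunk = (n_channels >> int(math.log2(n_cores))) \
            + (1 if n_channels & (n_cores - 1) else 0)
    start = min(chunk * core_id, n_channels)
    return start, min(start + chunk, n_channels)

# e.g. chunk_bounds(0, 10, 8) == (0, 2) and chunk_bounds(7, 10, 8) == (10, 10)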

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

12-36: LGTM—header declarations align with implementations.

The updated signatures for bias-aware convolution and the new depthwise Im2Col function correctly match the implementations in Convolution_fp32.c and DWConvolution_fp32.c. Parameter order and types are consistent across all three function declarations.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

21-38: LGTM—dynamic buffer sizing correctly includes all necessary factors.

The updated computeTransientBuffersSize now accounts for weight_type.typeWidth // 8, n_cores, ch_im_in, and kernel dimensions, providing accurate per-core buffer allocation. The manual type assignment in hoistTransientBuffers (lines 36-37) ensures the allocation system correctly interprets the element count as bytes.

Based on learnings: Transient buffer sizing returns element counts, and manual type assignment ensures correct byte allocation.


41-71: LGTM—depthwise Im2Col template correctly omits channel-in factor.

The PULP2DFloatDWConvIm2ColTemplate appropriately computes buffer size as NUM_CORES * P * Q (lines 52-53), omitting ch_im_in since depthwise convolution processes one channel at a time. Manual type assignment (lines 66-67) follows the established pattern from the regular conv template.

Note: The unused ctxt parameter at line 48 (flagged by static analysis) is intentional for interface consistency with the parent NodeTemplate.computeTransientBuffersSize signature.


130-160: LGTM—depthwise Im2Col template correctly invokes DW kernel.

The referenceDW2DIm2ColTemplate correctly calls PULP_DW_Conv2d_Im2Col_fp*_HWC with bias and context buffer parameters. Parameter order matches the kernel signature in Conv.h and DWConvolution_fp32.c.

Based on learnings: X,Y parameter order in PULP Im2Col calls is correct for the implementation.

Deeploy/Targets/PULPOpen/Platform.py (4)

77-77: LGTM—FP depthwise conv mapper correctly wired.

FPDWConv2DMapper correctly combines PULPFPDWConv2DParser with PULPFloatDWConv2DBindings and is properly registered in the Conv layer mapping at line 108.


92-94: LGTM—slice mappers provide both DMA and non-DMA paths.

The new SliceMapper (using PULPSliceBindings) provides a non-DMA slice path for FloatDataTypes, complementing the existing DMASliceMapper for DMA-based slicing. Both are correctly registered in the Slice layer mapping at line 134.


234-235: LGTM—empty bias removal pass integrated into optimizer.

RemoveEmptyConvBiasPass is appropriately placed after the requant merge passes, ensuring empty bias tensors are cleaned up during topology optimization.


247-254: LGTM—cluster engine now configurable with core count.

The n_cores parameter (default 8) allows per-engine core configuration, which propagates to the operatorRepresentation['n_cores'] used in template buffer sizing. This enables flexible core allocation for multi-cluster scenarios.

Deeploy/Targets/PULPOpen/Bindings.py (4)

143-164: LGTM—slice bindings correctly distinguish DMA and non-DMA paths.

PULPDMASliceBindings (line 151) uses DMASliceTemplate for DMA-based slicing of IntegerDataTypes, while the new PULPSliceBindings (lines 155-164) uses SliceTemplate for FloatDataTypes. Both use appropriate transformers (MemoryAwareForkTransformer for DMA futures, ForkTransformer for direct slicing).


166-169: LGTM—reshape bindings extended for float types and int64 shapes.

The updated PULPReshapeBindings now covers both IntegerDataTypes and FloatDataTypes, and the second pointer correctly uses int64_t to represent shape parameters (matching ONNX's int64 shape convention).


236-242: LGTM—FP depthwise conv bindings correctly reference DW Im2Col template.

PULPFloatDWConv2DBindings correctly binds ConvChecker with three PointerClass(float_type) inputs (data, weights, bias) to FloatConvTemplate.referenceDW2DIm2ColTemplate, covering all FloatDataTypes with ForkTransformer.
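
From that description, the construction plausibly follows Deeploy's usual list-comprehension pattern for bindings; the checker and transformer constructor signatures below are assumptions for illustration, not the exact source:

PULPFloatDWConv2DBindings = [
    NodeBinding(
        ConvChecker(
            [PointerClass(t), PointerClass(t), PointerClass(t)],  # data, weights, bias
            [PointerClass(t)]),                                   # output
        FloatConvTemplate.referenceDW2DIm2ColTemplate,
        ForkTransformer,
    ) for t in FloatDataTypes
]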


292-300: LGTM—ReduceMean bindings extended for cross-type support.

The cross-type bindings (lines 295-300) enable ReduceMean with float input/output and integer axes, complementing the homogeneous integer bindings (lines 293-294). Both use ClusterTransformer appropriately.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

41-101: LGTM—bias-aware direct convolution correctly implemented.

The has_bias and no-bias paths are correctly separated, with bias applied at line 67 (sum + pSrcBias[f + ch_out_start]) and omitted at line 97. Per-core channel distribution and output indexing are consistent.

Note: Lines 41–70 and 71–101 duplicate ~30 lines of convolution logic, differing only in bias application. This pattern matches DWConvolution_fp32.c and is acceptable for clarity, though a helper function could reduce duplication.


140-218: LGTM—bias-aware Im2Col convolution correctly implemented.

The Im2Col function correctly fills per-core buffers from pContextBuffer (line 132) and applies bias at line 175 in the has_bias path, omitting it at line 214 in the no-bias path. Per-core weight pointer computation (line 129) and output indexing (lines 173, 212) are correct.

Note: Lines 140–178 and 179–218 duplicate ~39 lines of im2col logic, similar to the pattern in DWConvolution_fp32.c.

Copy link
Collaborator

@lukamac lukamac left a comment

lgtm, thanks for all the hard work :)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
CHANGELOG.md (1)

114-114: Restore missing "Fixed" verb (unresolved from prior review).

This entry is still missing the "Fixed" verb, despite being marked as addressed in past commits. The current text is a sentence fragment.

Apply this diff:

- Data types in the PULPOpen FP add and mul templates.
+ Fixed data types in the PULPOpen FP add and mul templates.
🧹 Nitpick comments (2)
CHANGELOG.md (2)

31-33: Reduce repetitive "FP" prefix across successive entries.

Three consecutive entries (lines 31, 32, 33) begin with "FP", creating a repetitive structure that could be streamlined for clarity.

Consider rewording one or more entries to vary the phrasing without losing specificity:

- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+ Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline (FP).
+ Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations (floating-point).
+ FP PULP DW conv lowering optimization pass analogous to the integer version.

Alternatively, you could group these entries under a subsection to indicate they are related floating-point additions.


109-109: Use "Renamed" instead of "changed" for consistency with changelog conventions.

Changelog entries typically lead with a capitalized action verb such as "Renamed," "Refactored," or "Updated" rather than a lowercase description. This improves consistency with entries like line 81 ("rename of permutation functions").

Apply this diff:

- changed `_mapNode` to `_selectEngine` which reduces the responsibility of that function to, as the name states, just engine selection
+ Renamed `_mapNode` to `_selectEngine` to reduce the responsibility of that function to just engine selection.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe40743 and e84ab2e.

📒 Files selected for processing (1)
  • CHANGELOG.md (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • CHANGELOG.md
🪛 LanguageTool
CHANGELOG.md

[style] ~33-~33: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
