TinyViT on non-tiled Siracusa #117
base: devel
Conversation
📝 Walkthrough (Summary by CodeRabbit / Release Notes)
Adds a PULP FP depthwise 2D convolution (Im2Col) with bias support across templates, kernels, parsers, bindings, and mappings; introduces slice bindings; adds n_cores propagation to the PULP cluster engines and deployer; refactors the engine-selection helpers (_mapNode → _selectEngine); and updates template/kernel types and the Siracusa CI tests.
Changes
Sequence Diagram(s)
sequenceDiagram
participant CLI as Test Runner / Caller
participant Deployer as PULPDeployer
participant Engine as PULPClusterEngine
participant OpRep as OperatorRepresentation
CLI->>Deployer: run/generate (--cores)
Deployer->>Engine: annotate n_cores on engine(s)
Engine-->>Deployer: n_cores set
Deployer->>OpRep: inject n_cores into operator contexts
Deployer->>Deployer: super().bind() (uses annotated contexts)
sequenceDiagram
participant Caller as Host
participant DWKernel as PULP_DW_Conv2d_Im2Col_fp32
participant Core as per-core compute
participant Ctxt as pContextBuffer/im2col
Caller->>DWKernel: call(pSrcA,pSrcB,pSrcBias,has_bias,pDstC,pContextBuffer,pads...)
DWKernel->>Core: partition outputs by core (ch_out_start/ch_out_stop)
Core->>Ctxt: select per-core im2col buffer
Core->>Core: fill im2col, MAC over weights
alt has_bias
Core->>Core: add bias to outputs
end
Core->>Caller: write outputs to pDstC
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Areas to focus on:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
344-346: Prevent KeyError when the 'group' attr is absent
Depthwise detection assumes 'group' exists. If it is missing, this pass will crash. Use a safe default (1), as elsewhere in the file.
- if opNode.attrs['group'] == 1:
-     return graph
+ group = opNode.attrs.get('group', 1)
+ if group == 1:
+     return graph
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)
16-23: Loop index type should match buffer indexing
Use unsigned 32-bit for `i` to match array indexing and avoid UB on large sizes. (Handled in the diff above.)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct
- Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
- Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
- (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
10-11: Avoid libm at runtime: use integer log2 for NUM_CORES
Replace the double-precision `log2()` with `__builtin_ctz()` to avoid the libm dependency and ensure compile-time integer computation. Also assert the power-of-two assumption that the chunking math relies on.
Apply this diff:
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
.github/workflows/ci-platform-siracusa.yml (1)
98-98: TinyViT demo test: validate the runtime envelope on Siracusa
Great to see model-level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo's memory/latency fit the non-tiled target on the selected runner.
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)
11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.
This fixes the non-batched handling cleanly and mirrors the expected per-batch strides (A: M*N, B: N*O). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.
Deeploy/DeeployTypes.py (3)
320-324: Prefer None over a mutable default and let the ctor initialize `alias_of`.
Passing `alias_of = []` creates a new list here (safe), but use `None` consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.
- return (cls(
-     name = node.name,
-     shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-     alias_of = [],
- ))
+ return cls(
+     name = node.name,
+     shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+     alias_of = None,
+ )
469-474: Avoid sharing the alias list between buffers; copy it.
Passing `buffer.alias_of` by reference couples future mutations between the source and the new ConstantBuffer.
- ret = cls(
+ ret = cls(
      name = buffer.name,
      shape = buffer.shape,
      values = values,
-     alias_of = buffer.alias_of,
+     alias_of = list(buffer.alias_of),
  )
1176-1180: Pass `alias_of=None` and let the ctor initialize.
Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.
- nb = ctxt.VariableBuffer(
-     name = name,
-     shape = node.shape,
-     alias_of = [],
- )
+ nb = ctxt.VariableBuffer(
+     name = name,
+     shape = node.shape,
+     alias_of = None,
+ )
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)
137-141: Remove the unnecessary modulo and precompute a local weight pointer for the inner loop
`im2col_idx % (P * Q)` is redundant since `im2col_idx` never reaches `P * Q`. Precompute a per-filter base pointer for tighter inner loops.
Apply this diff in both the bias and non-bias paths:
- for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-   sum += im2col_buffer[im2col_idx] *
-          weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
- }
+ const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+ for (uint32_t k = 0; k < P * Q; k++) {
+   sum += im2col_buffer[k] * local_w[k];
+ }
Also applies to: 236-240
47-48: Remove the unused variable `kernel_size`
`kernel_size` is computed but never used.
- uint32_t kernel_size = P * Q * F_total;
61-97: Consider simplifying the padding handling with a single p/q nested loop
You can mirror the im2col fill used in the regular Im2Col kernel (see Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.
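The per-core im2col fill and MAC discussed for this kernel can be sketched in NumPy; the function below is an illustrative single-threaded model (HWC layout, one P×Q filter per channel), not the PULP implementation:

```python
import numpy as np

def dw_conv2d_im2col(x, w, bias=None, stride=(1, 1), pads=(0, 0, 0, 0)):
    """Depthwise 2D conv via a per-channel im2col buffer of P*Q elements.
    x: (H, W, C) input in HWC; w: (C, P, Q), one filter per channel."""
    H, W, C = x.shape
    _, P, Q = w.shape
    sp, sq = stride
    pt, pb, pl, pr = pads  # top, bottom, left, right
    xp = np.pad(x, ((pt, pb), (pl, pr), (0, 0)))
    Ho = (H + pt + pb - P) // sp + 1
    Wo = (W + pl + pr - Q) // sq + 1
    out = np.zeros((Ho, Wo, C), dtype=x.dtype)
    im2col = np.empty(P * Q, dtype=x.dtype)  # scratch: P*Q elements per "core"
    for c in range(C):                       # each channel convolves independently
        for ho in range(Ho):
            for wo in range(Wo):
                # fill the im2col buffer with channel c's receptive field
                im2col[:] = xp[ho * sp:ho * sp + P, wo * sq:wo * sq + Q, c].reshape(-1)
                acc = im2col @ w[c].reshape(-1)
                if bias is not None:         # mirrors the has_bias branch
                    acc += bias[c]
                out[ho, wo, c] = acc
    return out
```

A center-tap 3×3 kernel with same-padding reproduces the input, which makes a convenient sanity check.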
TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)
20-27: Document `pContextBuffer` size expectations and bias semantics
The new signatures look correct. Please add brief comments indicating the required `pContextBuffer` sizes:
- Conv2d_Im2Col: NUM_CORES * (C*P*Q) floats
- DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats
Also note that `has_bias` governs whether `pSrcBias` is read.
Also applies to: 29-36
Deeploy/Targets/PULPOpen/Bindings.py (2)
239-245: Constrain the FP DWConv binding to the supported float type(s).
FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.
Apply this diff:
- ForkTransformer) for float_type in FloatDataTypes
+ ForkTransformer) for float_type in [float32_t]
299-302: ReduceMean (float): restrict the axes integer type to int32.
ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.
Apply this diff:
- for integer_type in SignedIntegerDataTypes
- for float_type in FloatDataTypes
+ for integer_type in [int32_t]
+ for float_type in FloatDataTypes
Deeploy/Targets/PULPOpen/Platform.py (1)
108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.
If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.
Apply this diff to prioritize DWConv first:
- 'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+ 'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),
Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
42-51: Fix the return typing and silence ARG004 (unused `ctxt`)
- The method now returns a `str` for the size; widen the annotation to reflect this.
- Rename `ctxt` to `_ctxt` to address Ruff ARG004.
Apply:
- def computeTransientBuffersSize(ctxt: NetworkContext,
-                                 operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+ def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                 operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:
93-93: Typo in comment: “ChannelOout” → “ChannelOut”
Minor doc polish.
-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
124-126: Typo in comment: “ChannelOout” → “ChannelOut”
Same typo in the DW template header.
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- .github/workflows/ci-platform-siracusa.yml (3 hunks)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
- Deeploy/DeeployTypes.py (4 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
- Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (1 hunks)
- Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
- TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunks)
- TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/DeeployTypes.py (2): add_aliases (326-345), lookup (763-795)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
- Deeploy/DeeployTypes.py (6): NodeTemplate (82-224), computeTransientBuffersSize (137-157), NetworkContext (554-1063), hoistTransientBuffers (160-180), hoistTransientBuffer (864-883), lookup (763-795)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (2): PULP_Conv2d_fp32_fp32_fp32_HWC (10-102), PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1): PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Bindings.py (3)
- Deeploy/CommonExtensions/DataTypes.py (4): float32_t (74-78), int8_t (12-15), int32_t (24-27), uint8_t (36-39)
- Deeploy/Targets/Generic/TypeCheckers.py (3): SliceChecker (37-51), ConvChecker (456-474), ReduceMeanChecker (310-324)
- Deeploy/AbstractDataTypes.py (1): PointerClass (522-545)
Deeploy/Targets/PULPOpen/Platform.py (4)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1): RemoveEmptyConvBiasPass (667-679)
- Deeploy/Targets/PULPOpen/Parsers.py (1): PULPFPDWConv2DParser (122-188)
- Deeploy/DeeployTypes.py (1): NodeMapper (1695-1838)
- Deeploy/Targets/Generic/Parsers.py (1): SliceParser (97-155)
Deeploy/Targets/PULPOpen/Parsers.py (2)
- Deeploy/DeeployTypes.py (5): inputs (2492-2509), lookup (763-795), parseNode (1076-1091), parseNodeCtxt (1094-1119), NetworkContext (554-1063)
- Deeploy/Targets/Generic/Parsers.py (31): Conv2DParser (1182-1248), parseNode (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-251, 291-304, 326-338, 375-387, 422-426, 449-460, 484-495, 531-535, 551-554), parseNodeCtxt (29-41, 60-76, 114-155, 171-185, 209-221, 253-283, 306-318, 340-367, 389-414, 428-441, 462-476, 497-523, 537-543, 556-573, 586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
43-43: Unused static method argument: ctxt
(ARG004)
Deeploy/DeeployTypes.py
433-433: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
433-433: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
433-433: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Test Type Inference (fail-input2)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
13-14: LGTM: unsigned arithmetic for the start/end bounds
Casting `${size}` to `uint32_t` aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently `uint32_t`.
.github/workflows/ci-platform-siracusa.yml (2)
75-75: Reshape + skip connection test: ensure the alias_of fix is covered
Good addition to exercise the alias/buffer-lifetime fix. Please confirm this test indeed hits the alias_of scenario (non-tiled) and fails without the fix.
61-64: The CI workflow lists new DW conv FP tests, but those tests aren't registered in the repo.
.github/workflows/ci-platform-siracusa.yml (lines 61-64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, and testFloat2DDWConvolutionZeroBias, but a repo search found no matches; add/register these tests or update the workflow. Once they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.
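For reference, a pass like RemoveEmptyConvBias reduces to a check of this shape; the function name and input layout below are illustrative, not the actual pass API:

```python
import numpy as np

def strip_zero_bias(conv_inputs, initializers):
    """Drop a Conv's bias input when it is a constant, all-zero tensor.
    conv_inputs: list of input names; initializers: name -> np.ndarray."""
    if len(conv_inputs) < 3:
        return conv_inputs                     # no bias input present
    bias = initializers.get(conv_inputs[2])
    if bias is not None and not np.any(bias):  # constant and exactly zero
        return conv_inputs[:2]                 # keep only data and weight
    return conv_inputs
```

A zero-bias test case is exactly the input that makes the first branch fire; a non-constant or non-zero bias must pass through untouched.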
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)
27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.
Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still M*N for A and N*O for B even when transA/transB are set).
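The batch-gated stride logic can be modeled in NumPy on flat arrays: an unbatched operand's offset stays pinned at 0 instead of advancing by its per-batch stride. Names and layout are illustrative, not the template's generated C:

```python
import numpy as np

def batched_gemm(A, B, C, batch, M, N, O, A_batched, B_batched, C_batched):
    """GEMM over `batch` iterations on flat row-major arrays.
    Batched operands advance by their per-batch stride; others are reused."""
    out = np.zeros(batch * M * O)
    for b in range(batch):
        a_off = b * M * N if A_batched else 0   # stride M*N only when batched
        b_off = b * N * O if B_batched else 0   # stride N*O only when batched
        c_off = b * M * O if C_batched else 0
        a = A[a_off:a_off + M * N].reshape(M, N)
        bm = B[b_off:b_off + N * O].reshape(N, O)
        c = C[c_off:c_off + M * O].reshape(M, O)
        out[b * M * O:(b + 1) * M * O] = (a @ bm + c).reshape(-1)
    return out
```

Advancing an unbatched operand unconditionally would read past its single M*N (or N*O, M*O) block, which is the OOB the gating prevents.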
39-39: An unconditional out stride can desync when data_out aliases C but C is not batched.
If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:
- enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
- gate the out advance with an explicit data_out_batched flag.
Example, if such a flag is available:
- ref_${data_out}_${data_out} += ${M} * ${O};
+ % if data_out_batched:
+ ref_${data_out}_${data_out} += ${M} * ${O};
+ % endif
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
370-374: Verify the FP DW Conv layout: an input-only transpose may be insufficient
For FP DW Conv, you transpose only the input and not the weights. Confirm the kernel expectations: if weights are stored in a channels-first layout, they likely also need a DW-specific permutation (as above) to avoid a semantic mismatch when setting channels_first to the default.
385-403: New PULPDWConvPass wiring looks good
Pattern graph + partial replacement is clear and consistent with the pipeline.
404-426: New PULPFPDWConvPass wiring looks good
Extends the NHWC conversion to FP DW conv as intended.
505-505: Pass ordering LGTM
Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
41-43: Confirm the context buffer sizing per core
This implementation expects `pContextBuffer` sized to NUM_CORES * (P*Q) floats. Please confirm the templates/allocations provide that.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
10-16: LGTM on the signature extension and core partitioning
Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct.
Please ensure all call sites are updated to pass `pSrcBias`, `has_bias`, and `pContextBuffer` (for the Im2Col variant).
Also applies to: 18-39
139-176: Good reuse of the im2col buffer and contiguous weight blocks
The inner product uses a contiguous `kernel_size` slice, matching the buffer layout. Looks correct.
Also applies to: 179-218
Deeploy/Targets/PULPOpen/Parsers.py (3)
104-116: Bias handling looks correct; maps the optional third input and flags has_bias
Parsing and context mapping are consistent with the new kernels.
Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.
127-159: DW FP2D parser shape/padding checks look reasonable
Depthwise gating via the inputs count and padding symmetry is fine.
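The gating described here reduces to a couple of attribute checks; a hedged sketch (the attribute names are illustrative, ONNX 2D pads order assumed):

```python
def accepts_fp_dw_conv2d(attrs):
    """Accept only depthwise 2D convs with symmetric padding (illustrative)."""
    group = attrs.get("group", 1)          # safe default, as elsewhere
    if group == 1 or group != attrs["ch_im_in"]:
        return False                       # not depthwise
    pt, pl, pb, pr = attrs["pads"]         # ONNX 2D order: top, left, bottom, right
    return pt == pb and pl == pr           # symmetric padding only
```

A parser returning False here lets the node fall through to the next mapper instead of binding an unsupported shape.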
185-186: Depthwise condition check aligns with ONNX semantics
`group == ch_im_in` is the expected DW condition.
Deeploy/Targets/PULPOpen/Bindings.py (2)
143-153: DMA Slice binding switch looks good.
Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; this aligns with the DMA path introduced in Platform.
155-165: Float Slice mapping: validate the starts/ends/axes/steps typing.
Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.
Deeploy/Targets/PULPOpen/Platform.py (4)
77-79: FP DWConv mapper wiring: LGTM.
Parser and bindings are correctly introduced and exposed.
92-95: Ordering of the two Slice mappers: LGTM; confirm the fallback behavior.
The order ensures the float Slice binds first; integers should fall back to DMA. Please confirm CI covers both the float and int slice paths.
8-9: RemoveEmptyConvBiasPass import: LGTM.
Consistent with the optimizer usage below.
235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.
Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before the conv-related merges; otherwise the current placement is acceptable.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
60-62: Good: transient buffer typed to the input element
Ensures the generated C sees the correct pointee type instead of `void`. This helps alloc/codegen correctness.
81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC
TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias and has_bias in exactly that position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)
241-241: Fix the remaining mutable default arguments.
`VariableBuffer.__init__` still has mutable defaults (`shape = [1]`, `alias_of: Optional[List[str]] = []`). These should use immutable defaults (tuples or `None`) to follow Python best practices and avoid shared mutable state across calls.
Apply this diff:
- def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+ def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
The defensive copy on line 260 already handles the `None` case correctly, so no further changes are needed there.
Based on past review feedback and static analysis (Ruff B006).
🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)
454-454: Optional: Avoid the redundant defensive copy.
Since `VariableBuffer.__init__` already defensively copies `alias_of` (line 260), passing `list(alias_of)` here results in double copying. You can simplify by passing the list directly:
- super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+ super().__init__(name, shape, alias_of)
This is a minor optimization and doesn't affect correctness.
490-495: Optional: Make the alias propagation more explicit.
While correct (the constructor defensively copies `alias_of`), explicitly copying here would make the intent clearer:
  ret = cls(
      name = buffer.name,
      shape = buffer.shape,
      values = values,
-     alias_of = buffer.alias_of,
+     alias_of = list(buffer.alias_of),  # Explicit copy for clarity
  )
This is a minor clarity improvement and doesn't affect correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.md (4 hunks)
- Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
- DeeployTest/testUtils/dmaUtils.py (1): size (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)
325-329: LGTM!
Correctly initializes `alias_of` with an explicit empty list for each new instance, avoiding shared mutable state.
362-362: LGTM!
Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.
404-406: LGTM!
The updates to support `size` as `Union[int, str]` are consistent across `TransientBuffer.__init__`, `_bufferRepresentation`, and `hoistTransientBuffer`. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.
Also applies to: 431-431, 888-888
1200-1204: LGTM!
Correctly initializes non-global outputs with an explicit empty `alias_of` list, maintaining consistency with the alias tracking implementation.
Actionable comments posted: 5
♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)
241-261: Fix the mutable defaults in VariableBuffer.__init__ (shape/alias_of)
Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.
Apply:
- def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+ def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
      self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-     self.shape: Sequence[
-         int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+     self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
  @@
      self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
As per Ruff B006.
452-456: Fix the mutable defaults in ConstantBuffer.__init__ (shape/values)
Use immutable defaults and forward alias_of safely.
Apply:
- def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-     # Pass a copy of alias_of to avoid shared references
-     super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+ def __init__(self,
+              name: str = '',
+              shape: Sequence[int] = (1,),
+              values: Union[Sequence, np.ndarray] = (0,),
+              alias_of: Optional[List[str]] = None):
+     super().__init__(name, shape, alias_of)
(values processing via np.asarray below remains correct.)
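A minimal demonstration of the Ruff B006 hazard these fixes address:

```python
class Buffer:
    # BAD: the default list is created once at definition time
    # and silently shared by every instance constructed without an argument
    def __init__(self, alias_of=[]):
        self.alias_of = alias_of

a, b = Buffer(), Buffer()
a.alias_of.append("x")
shared = b.alias_of          # b observes a's mutation

class SafeBuffer:
    # GOOD: immutable default, fresh list per instance, defensive copy
    def __init__(self, alias_of=None):
        self.alias_of = list(alias_of) if alias_of is not None else []

c, d = SafeBuffer(), SafeBuffer()
c.alias_of.append("x")
isolated = d.alias_of        # instances no longer share state
```

The defensive `list(...)` copy also protects against callers later mutating a list they passed in.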
🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
21-24: Im2Col buffer bytes should derive from the input element width (not the weight).
The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but it couples unrelated types and can break if they diverge. Prefer the data_in_type width.
Apply:
- im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-     'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+ im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+     * ctxt.n_cores \
+     * operatorRepresentation['ch_im_in'] \
+     * operatorRepresentation['dim_kernel_x'] \
+     * operatorRepresentation['dim_kernel_y']
43-51: Fix the return type annotation and the stale comment; ensure consistent sizing semantics.
- The return type says List[Tuple[str, str]] but the code returns an int; align with the base signature (Union[int, IntVar]).
- The comment says “value is only used as string”, but an int is produced. Update the comment and/or the value.
Also, consider returning an element count (NUM_CORES * Kx * Ky) and relying on the buffer's referencedType to derive bytes, for consistency with the established pattern.
- def computeTransientBuffersSize(ctxt: NetworkContext,
-                                 operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+ def computeTransientBuffersSize(ctxt: NetworkContext,
+                                 operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
- # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
- # WARNING: This works because value is only used as string, in the allocate template.
- im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-               ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+ # Im2Col buffer scales with the number of cores and the kernel area; the size below is in ELEMENTS.
+ # The byte size is derived from the buffer's referenced type at allocation/codegen time.
+ im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.
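The element-count convention can be illustrated with a small calculation; the helper names and numbers are illustrative (fp32 assumed):

```python
def im2col_transient_elements(n_cores, kx, ky, ch_in=1):
    """Transient im2col size in ELEMENTS: one kernel window per core.
    Depthwise uses ch_in=1 (P*Q per core); a regular conv needs C*P*Q per core."""
    return n_cores * ch_in * kx * ky

def to_bytes(elements, type_width_bits):
    # the allocator derives bytes from the buffer's referenced element type,
    # so the template only needs to report the element count
    return elements * (type_width_bits // 8)

dw_elems = im2col_transient_elements(n_cores=8, kx=3, ky=3)    # depthwise: 8*9
conv_elems = im2col_transient_elements(8, 3, 3, ch_in=16)      # regular: 8*16*9
dw_bytes = to_bytes(dw_elems, 32)                              # fp32 width
```

Reporting elements in one place and bytes in another is exactly the double-accounting risk the comment above warns about.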
124-154: The DW Im2Col call wiring is consistent; minor docstring typo.
The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: “ChannelOout” → “ChannelOut”.
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
Deeploy/DeeployTypes.py (1)
2708-2714: Preserve the context name when recreating the NetworkContext during parse
Recreating self.ctxt drops custom names; pass name=self.name to keep the mangling consistent.
Apply:
- self.ctxt = NetworkContext(
+ self.ctxt = NetworkContext(
      variableBuffer = self.Platform.VariableBuffer,
      constantBuffer = self.Platform.ConstantBuffer,
      structBuffer = self.Platform.StructBuffer,
      transientBuffer = self.Platform.TransientBuffer,
-     n_cores = self.ctxt.n_cores,
+     n_cores = self.ctxt.n_cores,
+     name = self.name,
  )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
- Deeploy/DeeployTypes.py (15 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
- Deeploy/TilingExtension/TilerExtension.py (1 hunks)
- DeeployTest/testMVP.py (4 hunks)
- DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
- DeeployTest/testUtils/platformMapping.py (2 hunks)
- DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
- Deeploy/DeeployTypes.py (1): NetworkDeployer (3256-3633)
- DeeployTest/testUtils/platformMapping.py (1): mapDeployer (92-261)
- DeeployTest/testSchedulingExtension.py (1): _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
- DeeployTest/testUtils/testRunner.py (1): TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
- Deeploy/DeeployTypes.py (3): DeploymentPlatform (2451-2494), TopologyOptimizer (2249-2278), NetworkDeployer (3256-3633)
- Deeploy/Targets/PULPOpen/Deployer.py (1): PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
- Deeploy/DeeployTypes.py (6): NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (575-1090), hoistTransientBuffers (165-185), hoistTransientBuffer (891-910), lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py
[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?
[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max
🪛 Ruff (0.13.3)
DeeployTest/testMVP.py
58-58: Unused function argument: verbose
(ARG001)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/DeeployTypes.py
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
452-452: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: linting
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: Test Type Inference (fail-input2)
- GitHub Check: Test Type Inference (pass)
- GitHub Check: Test Type Inference (fail-input0)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: mempool-models / test-runner-mempool
- GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
81-84: Bias/has_bias plumbing for direct conv looks good. Bias flags are correctly passed to the FP Conv2d HWC call.
100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel. The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.
If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.
Deeploy/TilingExtension/TilerExtension.py (1)
987-989: LGTM: propagate core count to context. Assigning ctxt.n_cores ensures downstream kernels see the configured core count.
DeeployTest/testRunner_tiled_siracusa.py (1)
20-26: LGTM: cores threaded into TestRunner and CMake. Passing args.cores to TestRunner and exporting NUM_CORES is correct.
Also applies to: 28-28
DeeployTest/testUtils/platformMapping.py (1)
92-104: LGTM: add n_cores and forward to PULPDeployer. Signature and forwarding align with deployer changes.
Also applies to: 214-224
DeeployTest/testMVP.py (1)
57-60: LGTM: introduce and propagate n_cores. CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.
Also applies to: 84-93, 201-206, 261-263
lukamac
left a comment
This is a first round of review. There is already a few comments that need your answers. I haven't managed to look at the actual templates/kernels. I'll do that tomorrow.
Good work :)
...CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3 outdated review threads, resolved)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)
31-53: Avoid the shared mutable default for `inputOffsets`

Keeping `inputOffsets = {}` means every deployer instance shares the same dict, so one run can mutate another's offsets. Since this signature just changed, please switch to a safe `None` default and materialize a new dict per instance before calling `super()`.

```diff
-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(graph = graph,
                          deploymentPlatform = deploymentPlatform,
                          inputTypes = inputTypes,
                          loweringOptimizer = loweringOptimizer,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
Deeploy/DeeployTypes.py (15 hunks)
Deeploy/Targets/Generic/Parsers.py (1 hunks)
Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
add_aliases (331-350)
get_aliases_of (352-364)
lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
inputs (2579-2596), lookup (788-820), parseNode (1101-1116), parseNodeCtxt (1119-1144), NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
Conv2DParser (1204-1270), parseNode (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-251, 291-304, 326-338, 375-387, 422-426, 449-460, 484-495, 531-535, 551-554), parseNodeCtxt (29-41, 60-76, 114-155, 171-185, 209-221, 253-283, 306-318, 340-367, 389-414, 428-441, 462-476, 497-523, 537-543, 556-573, 586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
contextagnostic (285-298)
ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
- GitHub Check: linting
- GitHub Check: mempool-models / test-runner-mempool
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: softhier-kernels / test-runner-softhier
- GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
- GitHub Check: cortexm-models / test-runner-cortexm
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
250-250: LGTM! Improved group attribute handling. Using `.get('group', 1)` properly handles the ONNX Conv default and is more robust than explicit attribute checking.
370-373: LGTM! Correctly handles FP DW Conv input transpose. The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.
380-426: LGTM! Well-structured pass classes with clear separation of concerns. The refactoring of `PULPDWConvPass` improves readability with explicit comments and sections. The new `PULPFPDWConvPass` class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via `partial()`.
505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence. The placement after `PULPDWConvPass` and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.
...CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (outdated review thread, resolved)
Force-pushed: bf5a286 to 85edf76 (Compare)
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)
105-107: Compute NSize based on transA. Current code assumes no transpose; wrong K/N leads to bad tiles.

```diff
-        NSize = buffA.shape[-1]
+        NSize = buffA.shape[-1] if transA == 0 else buffA.shape[-2]
```
215-223: Critical undefined identifiers remain in `IntGEMMTileConstraint.serializeTilingSolution`. The verification reveals multiple undefined variables that will cause runtime errors:
- Line 132: `BSize` used but never defined (should be `BatchSize`)
- Lines 151, 155, 167, 171, 370, 374, 386, 390, 405, 407: `BatchOffset` used but never defined
- Lines 156, 172, 375, 391, 407: `BOffset` used but never defined
In contrast, `FloatGEMMTileConstraint.serializeTilingSolution` correctly uses `BatchSize` at line 341, suggesting the `IntGEMMTileConstraint` version requires similar fixes. The undefined `BatchOffset` and `BOffset` offsets need to be extracted from `cube.offset` similar to how `MOffset` and `OOffset` are extracted at line 119.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
22-28: Avoid 16-bit overflow in channel partitioning. Use 32-bit for ch_out_* to support large F_total; identical change recommended in both functions.
Apply:

```diff
-    uint16_t ch_out_chunk =
+    uint32_t ch_out_chunk =
         (F_total >> log2Core) + ((F_total & (NUM_CORES - 1)) != 0);
-    uint16_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
-    uint16_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
-    uint16_t ch_out_count = ch_out_stop - ch_out_start;
+    uint32_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
+    uint32_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
+    uint32_t ch_out_count = ch_out_stop - ch_out_start;
```

Repeat the same replacement in PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC.
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
119-125: Mutable default list for passes.Use None and initialize to avoid shared state across instances.
Apply:
```diff
-                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] = []):
+                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] | None = None):
@@
+        if memoryLevelAnnotationPasses is None:
+            memoryLevelAnnotationPasses = []
-        if len(memoryLevelAnnotationPasses) == 0:
+        if not memoryLevelAnnotationPasses:
             memoryLevelAnnotationPasses.append(AnnotateDefaultMemoryLevel(self.Platform.memoryHierarchy))
         self.memoryLevelAnnotationOptimizer = NetworkOptimizer(memoryLevelAnnotationPasses)
```

Repeat the same pattern in `MemoryLevelAwareSignPropDeployer.__init__` and `MemoryDeployerWrapper.__init__`.
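The underlying pitfall is easy to reproduce in isolation; the classes below are hypothetical stand-ins, not Deeploy's actual deployers:

```python
# Minimal reproduction of the mutable-default-argument pitfall.
class Deployer:

    def __init__(self, passes=[]):  # BUG: one list shared by every instance
        if len(passes) == 0:
            passes.append("AnnotateDefaultMemoryLevel")
        self.passes = passes


class SafeDeployer:

    def __init__(self, passes=None):  # fix: None sentinel, fresh list per call
        if passes is None:
            passes = []
        if not passes:
            passes.append("AnnotateDefaultMemoryLevel")
        self.passes = passes


a, b = Deployer(), Deployer()
assert a.passes is b.passes          # both share the mutated default list

c, d = SafeDeployer(), SafeDeployer()
assert c.passes is not d.passes      # independent state per instance
assert c.passes == ["AnnotateDefaultMemoryLevel"]
```

The default list is evaluated once at function definition time, so every call that omits the argument mutates the same object.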
♻️ Duplicate comments (5)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
25-26: n_cores addition looks reasonable; mutable default still present.

The `n_cores` parameter addition aligns with the PR's goal of supporting dynamic im2col buffer sizing. However, two observations:

- The mutable default `inputOffsets: Dict[str, int] = {}` remains unfixed from the previous review.
- The `n_cores` parameter lacks documentation and validation (e.g., ensuring it's positive).

Consider adding a docstring or inline comment explaining the `n_cores` parameter's purpose, and optionally validate it:

```diff
                  inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 n_cores: int = 8):  # Number of cores for im2col buffer sizing
+        if n_cores <= 0:
+            raise ValueError(f"n_cores must be positive, got {n_cores}")
```

For the mutable default issue, refer to the previous review comment suggesting `inputOffsets: Optional[Dict[str, int]] = None`.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
43-51: Fix DW Im2Col buffer sizing: remove the typeWidth multiplier and correct the return type annotation.

In `PULP2DFloatDWConvIm2ColTemplate.computeTransientBuffersSize()` (lines 43–51), the size computation multiplies by `typeWidth // 8` to produce a byte count, but `hoistTransientBuffers()` (lines 59–61) then assigns the buffer a concrete type. The allocator subsequently multiplies by `sizeof(type)`, causing double-accounting on tight L2 budgets. Additionally, the return type annotation incorrectly declares `str` instead of `Union[int, IntVar]`.

Align with the existing `PULP2DFloatConvIm2ColTemplate` pattern (lines 18–24): compute the element count (no typeWidth division) and let type-setting handle the byte multiplication.

```diff
-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
```

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
61-97: Critical: signed/unsigned mix in im2col padding loops leads to out-of-bounds indexing. `h_in`/`w_in` are uint32_t while `h_in_start`/`w_in_start` are int32_t; expressions like `(h_in - h_in_start)` and `(w_in - w_in_start)` perform unsigned arithmetic when mixed, producing incorrect large indices. Use int32_t consistently and cast bounds.
Apply (pattern applies similarly to LEFT/RIGHT blocks):
```diff
-      for (uint32_t h_in = MAX(H, h_in_start); h_in < h_in_start + P; h_in++) {
+      for (int32_t h_in = MAX((int32_t)H, h_in_start); h_in < (int32_t)(h_in_start + (int32_t)P); h_in++) {
         for (int32_t w_in = (int32_t)w_in_start; w_in < (int32_t)(w_in_start + Q); w_in++) {
           im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
         }
       }
-      for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+      for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
         for (int32_t w_in = (int32_t)w_in_start; w_in < MIN(0, (int32_t)(w_in_start + Q)); w_in++) {
           im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
         }
       }
-      for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+      for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
-        for (uint32_t w_in = MAX(W, w_in_start); w_in < w_in_start + Q; w_in++) {
+        for (int32_t w_in = MAX((int32_t)W, w_in_start); w_in < (int32_t)(w_in_start + (int32_t)Q); w_in++) {
           im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
         }
       }
```
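The conversion rule behind this bug can be sketched in Python by masking the signed value to 32 bits (illustrative values, not the kernel's actual dimensions):

```python
# Sketch of C's usual arithmetic conversions: when int32_t meets uint32_t,
# the signed operand is reinterpreted as unsigned. A negative h_in_start
# (top-padding row offset) therefore becomes a huge bound.
M32 = 0xFFFFFFFF
H = 4                             # illustrative input height
h_in_start = -1                   # negative because of padding

as_unsigned = h_in_start & M32    # the value C actually compares with
assert as_unsigned == 4_294_967_295

# MAX(H, h_in_start) evaluated with unsigned semantics picks the wrapped
# value instead of H, so the loop bounds go wrong.
assert max(H, as_unsigned) == 4_294_967_295
```

Keeping every operand int32_t, as the diff above does, makes the comparisons behave as the arithmetic intends.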
Deeploy/Targets/PULPOpen/Deployer.py (1)
40-53: Avoid mutable default for inputOffsets. Switch to Optional and initialize when calling super.
Apply:
```diff
-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 deeployStateDir: str = "DeeployStateDir",
-                 inputOffsets = {},
+                 deeployStateDir: str = "DeeployStateDir",
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
         super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets if inputOffsets is not None else {},
             n_cores = n_cores,
         )
```

If SignPropDeployer already accepts Optional[Dict[str, int]], you can pass it through directly and drop the fallback.
DeeployTest/testUtils/testRunner.py (1)
349-356: Remove duplicate command construction; keep a single source of truth. The first assignment is redundant; append `-n_cores` via `self.n_cores` once.
Apply:
```diff
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
         command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
         if self._tiling is True:
             command += f" -n_cores {self.n_cores}"
         command += f" {self.gen_args}"
```
🧹 Nitpick comments (15)
CHANGELOG.md (1)
26-32: Vary sentence structure in the Added section to improve readability. The static analysis tool flagged repetitive sentence beginnings ("FP PULP DW conv", "PULP FP DW conv", "FP bindings"). While the entries are accurate and comprehensive, consider rephrasing for better variety. For example:
```diff
-- PULP 2D FP DW conv Im2Col template and kernel, with bias support.
-- Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
-- PULP FP DW conv 2D parser.
-- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
-- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
-- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+- 2D FP depthwise convolution Im2Col template, kernel, and parser for PULP, with bias support.
+- Bias support for PULP 2D FP regular convolution Im2Col template and kernel.
+- FP convolution 2D (regular and depthwise), reshape, skip connection, and TinyViT demo tests in the non-tiled Siracusa CI pipeline.
+- FP bindings and mappings: PULP slice, depthwise convolution 2D, and reduce mean operations.
+- Depthwise convolution lowering optimization pass (FP variant) for PULP, analogous to the integer version.
```

This maintains factual accuracy while improving stylistic variety.

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1)
519-526: Tighten schedule pairing with zip(strict=True). Ensure a 1:1 correspondence between base input loads and requant cubes.
```diff
-    } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+    } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
```

Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1)
515-517: Add zip(strict=True) to guard schedule alignment. Match counts between input and requant loads for WMEM DW RQS.
```diff
-    } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+    } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
```

Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (2)
295-299: Strengthen pairing with zip(strict=True) in PW RQS. Guarantee equal lengths when combining base and requant schedules.
```diff
-    } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule)]
+    } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule, strict=True)]
```

Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
10-14: Avoid FP log2; use bit ops and assert power-of-two NUM_CORES. log2() pulls in libm and risks FP rounding; use CTZ and add a compile-time check. This also removes any need for <math.h>/-lm.
```diff
-uint32_t ${nodeName}_core_id = pi_core_id();
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+uint32_t ${nodeName}_core_id = pi_core_id();
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be a power of two");
+const uint32_t ${nodeName}_log2Core = __builtin_ctz(NUM_CORES);
```

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
93-93: Fix typo in comment: "ChannelOout" → "ChannelOut"

```diff
-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```
125-154: DW Im2Col call: argument order LGTM; minor comment typo.

```diff
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```

Based on learnings.
46-51: Clarify the misleading comment about size-as-string. The value is currently computed as an int, not a string; please update the comment to avoid confusion.
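The sizing convention at play here can be sketched numerically (illustrative sizes, assuming an fp32 element; this is not Deeploy's API, just the arithmetic):

```python
# computeTransientBuffersSize() reports an element count (an int), and the
# allocator multiplies by the element size once the buffer type is set.
n_cores, kx, ky = 8, 3, 3
elem_count = n_cores * kx * ky          # what the template should return
sizeof_fp32 = 4

allocated = elem_count * sizeof_fp32    # allocator applies sizeof once

# If the template pre-multiplied by sizeof, the allocator's own
# multiplication would double-count the element width:
overallocated = (elem_count * sizeof_fp32) * sizeof_fp32

assert allocated == 288
assert overallocated == 4 * allocated   # 4x too large for fp32
```

This is why the template should hand over a plain element count and let the type annotation drive the byte conversion.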
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
181-185: Use wider unsigned types for tiling sizes for consistency and headroom. Dims rarely fit in int8; GEMM uses uint16_t/uint8_t. Align MatMul for consistency and to avoid overflow.
Apply:
```diff
-from Deeploy.CommonExtensions.DataTypes import int8_t
+from Deeploy.CommonExtensions.DataTypes import uint8_t, uint16_t
...
-        replacementTypes = {
-            "M": PointerClass(int8_t),
-            "N": PointerClass(int8_t),
-            "O": PointerClass(int8_t),
-            "batch": PointerClass(int8_t)
-        }
+        replacementTypes = {
+            "M": PointerClass(uint16_t),
+            "N": PointerClass(uint16_t),
+            "O": PointerClass(uint16_t),
+            "batch": PointerClass(uint8_t),
+        }
```

If other MatMul code relies on int8_t here, confirm and we can gate this by target or keep uint16_t only for sizes ≥ 256.
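A quick sketch of why int8 is too narrow for tile sizes, emulating two's-complement truncation in Python (illustrative dimension values):

```python
# Truncating a dimension to 8 signed bits silently wraps; 16 unsigned bits
# hold typical tile sizes exactly.
def to_int8(x):
    x &= 0xFF
    return x - 256 if x >= 128 else x   # two's-complement reinterpretation

def to_uint16(x):
    return x & 0xFFFF

N = 300                                 # a plausible matrix dimension
assert to_int8(N) == 44                 # 300 & 0xFF = 44, silently wrong
assert to_uint16(N) == 300              # exact
assert to_int8(127) == 127              # only dims < 128 survive int8
```

Anything at or above 128 is already misrepresented by a signed 8-bit size, which is why the wider unsigned types are the safer default.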
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
83-87: Avoid assigning unused variable.
`modulus` is unused. Either drop the assignment or use `_ =`.

```diff
-        modulus = tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
+        tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
```

Deeploy/Targets/Generic/Parsers.py (1)
1071-1101: Reshape aliasing looks good; mark constant shape as non-deploy and use cached alias list
- Alias flow is correct and symmetric; iterate the cached list to avoid relying on a second lookup.
- If the shape input is a constant, mark it non‑live/non‑deploy to avoid useless allocations (mirrors Unsqueeze).
Apply near this block:
```diff
-        new_output_node_aliases = input_node.get_aliases_of()
+        new_output_node_aliases = list(input_node.get_aliases_of())
         new_output_node_aliases.append(input_node.name)
@@
-        for alias in output_node.get_aliases_of():
+        for alias in new_output_node_aliases:
             alias_node = ctxt.lookup(alias)
             alias_node.add_aliases(aliases_to_add = [
                 output_node.name,
             ])
+
+        # If shape is a constant tensor, exclude it from deployment
+        shape_buf = ctxt.lookup(node.inputs[inputs.index("shape")].name)
+        if hasattr(shape_buf, "values"):
+            shape_buf._live = False
+            shape_buf._deploy = False
```
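The symmetric alias update above can be sketched with plain sets (hypothetical buffer names, not Deeploy's NetworkContext API):

```python
# Reshape output "y" aliases input "x": y inherits x's aliases plus x
# itself, and every buffer in that set learns about y in return.
aliases = {"x": set(), "y": set()}        # buffer name -> set of aliases

new_output_node_aliases = set(aliases["x"]) | {"x"}  # cached once, reused
aliases["y"] |= new_output_node_aliases              # forward direction
for name in new_output_node_aliases:                 # backward direction
    aliases[name].add("y")

assert aliases["y"] == {"x"}
assert aliases["x"] == {"y"}
```

Iterating the cached set rather than re-querying the output keeps both directions consistent with a single snapshot of the alias state.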
137-145: Minor: remove the redundant modulo and consider precomputing the weight base. `im2col_idx` runs over [0, P*Q), so `% (P * Q)` is redundant. Also, precompute `(f - ch_out_start) * P * Q` outside the inner loop.
```diff
-      for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-        sum += im2col_buffer[im2col_idx] *
-               weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-      }
+      uint32_t w_base = (f - ch_out_start) * P * Q;
+      for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
+        sum += im2col_buffer[im2col_idx] * weight_ptr[w_base + im2col_idx];
+      }
```
47-51: Nit: kernel_size is unused. `kernel_size` is computed but never used; remove it to avoid confusion.
Deeploy/DeeployTypes.py (1)
3262-3317: Redundant n_cores field in child; the base already stores it. NetworkDeployer saves n_cores; removing the extra `self.n_cores` avoids duplication.
Apply:
```diff
-        self.prepared = False
-        self.n_cores = n_cores
+        self.prepared = False
```

DeeployTest/testMVP.py (1)
57-59: Consider removing the unused `verbose` parameter.

The `verbose` parameter is declared but never used in the function body. While the `n_cores` parameter addition looks good and aligns with the PR objectives, consider removing `verbose` to clean up the signature.

Apply this diff:

```diff
 def setupDeployer(graph: gs.Graph, memoryHierarchy: MemoryHierarchy, defaultTargetMemoryLevel: MemoryLevel,
-                  defaultIoMemoryLevel: MemoryLevel, verbose: CodeGenVerbosity, args: argparse.Namespace,
-                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
+                  defaultIoMemoryLevel: MemoryLevel, args: argparse.Namespace,
+                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
```

Note: Ensure to update the call site on line 256 as well if you apply this change.
Based on learnings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
- .github/workflows/ci-platform-siracusa.yml (3 hunks)
- CHANGELOG.md (4 hunks)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5 hunks)
- Deeploy/DeeployTypes.py (14 hunks)
- Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (4 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
- Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1 hunks)
- Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1 hunks)
- Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1 hunks)
- Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
- Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (3 hunks)
- Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1 hunks)
- Deeploy/TilingExtension/TilerExtension.py (1 hunks)
- DeeployTest/testMVP.py (4 hunks)
- DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
- DeeployTest/testUtils/platformMapping.py (2 hunks)
- DeeployTest/testUtils/testRunner.py (3 hunks)
- TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
- DeeployTest/testUtils/platformMapping.py
- DeeployTest/testRunner_tiled_siracusa.py
- TargetLibraries/PULPOpen/inc/kernel/Conv.h
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
- TargetLibraries/PULPOpen/src/Convolution_fp32.c
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (14)
Deeploy/Targets/PULPOpen/Platform.py (4)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  - RemoveEmptyConvBiasPass (587-592)
- Deeploy/Targets/PULPOpen/Parsers.py (2)
  - PULPFPConv2DParser (65-120), PULPFPDWConv2DParser (123-189)
- Deeploy/DeeployTypes.py (1)
  - NodeMapper (1736-1892)
- Deeploy/Targets/Generic/Parsers.py (1)
  - SliceParser (97-155)

Deeploy/Targets/PULPOpen/Bindings.py (5)
- Deeploy/CommonExtensions/DataTypes.py (4)
  - float32_t (74-78), int8_t (12-15), int32_t (24-27), uint8_t (36-39)
- Deeploy/DeeployTypes.py (3)
  - CodeTransformation (2366-2400), NodeBinding (1588-1733), NodeTemplate (87-229)
- Deeploy/FutureExtension/Bindings/AutoFutureBinding.py (1)
  - AutoFutureBinding (13-62)
- Deeploy/Targets/Generic/TypeCheckers.py (3)
  - SliceChecker (37-51), ConvChecker (456-474), ReduceMeanChecker (310-324)
- Deeploy/AbstractDataTypes.py (1)
  - PointerClass (536-559)

Deeploy/Targets/PULPOpen/Parsers.py (2)
- Deeploy/DeeployTypes.py (5)
  - inputs (2583-2600), lookup (792-824), parseNode (1105-1120), parseNodeCtxt (1123-1148), NetworkContext (578-1092)
- Deeploy/Targets/Generic/Parsers.py (31)
  - Conv2DParser (1246-1312); parseNode overloads (19-27, 49-58, 81-94, 102-112, 163-169, 193-207, 229-243, 271-293, 333-351, 373-385, 422-434, 469-473, 496-507, 531-542, 578-582); parseNodeCtxt overloads (29-41, 60-76, 114-155, 171-185, 209-221, 245-263, 295-325, 353-365, 387-414, 436-461, 475-488, 509-523, 544-570, 584-590, 603-620)

Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (2)
- Deeploy/DeeployTypes.py (1)
  - lookup (792-824)
- Deeploy/TilingExtension/TilingCodegen.py (2)
  - HyperRectangle (24-35), calculateFlatOffsetInBytes (239-242)

Deeploy/DeeployTypes.py (3)
- Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
  - sizeInBytes (181-182)
- Deeploy/AbstractDataTypes.py (2)
  - typeWidth (399-400), Pointer (303-388)
- DeeployTest/testUtils/dmaUtils.py (1)
  - size (72-73)

Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/DeeployTypes.py (3)
  - lookup (792-824), get_aliases_of (352-364), add_aliases (331-350)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (6)
- Deeploy/TilingExtension/TilingCodegen.py (5)
  - TilingSchedule (53-122), AbsoluteHyperRectangle (39-49), VariableReplacementScheme (126-158), HyperRectangle (24-35), calculateFlatOffsetInBytes (239-242)
- Deeploy/TilingExtension/TileConstraint.py (3)
  - TileConstraint (19-221), serializeTilingSolution (184-221), extractBaseAddr (56-74)
- Deeploy/DeeployTypes.py (1)
  - lookup (792-824)
- Deeploy/AbstractDataTypes.py (1)
  - PointerClass (536-559)
- Deeploy/CommonExtensions/DataTypes.py (3)
  - uint8_t (36-39), uint32_t (48-51), uint16_t (42-45)
- Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  - Conv2DTileConstraint (225-435), computeInputCube (320-347)

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
- Deeploy/TilingExtension/TilingCodegen.py (1)
  - HyperRectangle (24-35)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
- Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
  - contextagnostic (285-298), SequentialPass (201-210), ReplaceSequentialPatternPass (265-282)

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
- Deeploy/TilingExtension/TilingCodegen.py (1)
  - HyperRectangle (24-35)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (6)
- Deeploy/TilingExtension/TilingCodegen.py (5)
  - TilingSchedule (53-122), AbsoluteHyperRectangle (39-49), VariableReplacementScheme (126-158), HyperRectangle (24-35), calculateFlatOffsetInBytes (239-242)
- Deeploy/TilingExtension/TileConstraint.py (2)
  - TileConstraint (19-221), extractBaseAddr (56-74)
- Deeploy/TilingExtension/TilerModel.py (3)
  - TilerModel (34-402), addTensorDimToModel (143-157), getTensorDimVar (131-135)
- Deeploy/AbstractDataTypes.py (1)
  - PointerClass (536-559)
- Deeploy/CommonExtensions/DataTypes.py (3)
  - uint8_t (36-39), uint32_t (48-51), uint16_t (42-45)
- Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  - Conv2DTileConstraint (225-435), computeInputCube (320-347)

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
- Deeploy/DeeployTypes.py (5)
  - _parseNode (2670-2677), ONNXLayer (1895-2223), NetworkContext (578-1092), optimize (2260-2280), optimize (2310-2329)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
- Deeploy/DeeployTypes.py (6)
  - NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (578-1092), hoistTransientBuffers (165-185), hoistTransientBuffer (893-912), lookup (792-824)

DeeployTest/testMVP.py (3)
- Deeploy/DeeployTypes.py (1)
  - NetworkDeployer (3258-3635)
- DeeployTest/testUtils/platformMapping.py (1)
  - mapDeployer (92-261)
- DeeployTest/testSchedulingExtension.py (1)
  - _mockScheduler (29-33)
🪛 LanguageTool
CHANGELOG.md
[style] ~31-~31: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.1)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py
233-233: Undefined name absoluteCube
(F821)
Deeploy/DeeployTypes.py
401-401: Redefinition of unused sizeInBytes from line 390
(F811)
447-447: Redefinition of unused sizeInBytes from line 444
(F811)
456-456: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
456-456: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py
263-263: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
263-263: Undefined name inputRequantCubes
(F821)
281-281: Local variable dilation is assigned to but never used
Remove assignment to unused variable dilation
(F841)
289-289: Local variable inputChannelVar is assigned to but never used
Remove assignment to unused variable inputChannelVar
(F841)
294-294: Local variable outputChannelVar is assigned to but never used
Remove assignment to unused variable outputChannelVar
(F841)
317-317: Local variable weightBuffer is assigned to but never used
Remove assignment to unused variable weightBuffer
(F841)
320-320: Local variable inputBatchVar is assigned to but never used
Remove assignment to unused variable inputBatchVar
(F841)
325-325: Local variable outputBatchVar is assigned to but never used
Remove assignment to unused variable outputBatchVar
(F841)
326-326: Local variable outputHeightVar is assigned to but never used
Remove assignment to unused variable outputHeightVar
(F841)
327-327: Local variable outputWidthVar is assigned to but never used
Remove assignment to unused variable outputWidthVar
(F841)
328-328: Local variable outputChannelVar is assigned to but never used
Remove assignment to unused variable outputChannelVar
(F841)
331-331: Local variable padding is assigned to but never used
Remove assignment to unused variable padding
(F841)
413-413: Unpacked variable BatchOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
413-413: Unpacked variable HOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
413-413: Unpacked variable WOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
413-413: Unpacked variable COffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
414-414: Unpacked variable BatchSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
426-426: Unpacked variable inBSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
426-426: Unpacked variable inHSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
522-522: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
122-122: Undefined name BSize
(F821)
133-133: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *AMatrixOffsets)
(RUF005)
133-133: Undefined name BatchOffset
(F821)
133-133: Consider (BatchSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, *AMatrixShape)
(RUF005)
136-139: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *AMatrixOffsets)
(RUF005)
137-137: Undefined name BatchOffset
(F821)
138-138: Undefined name BOffset
(F821)
140-143: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *AMatrixShape)
(RUF005)
142-142: Undefined name BSize
(F821)
149-149: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *BMatrixOffsets)
(RUF005)
149-149: Undefined name BatchOffset
(F821)
149-149: Consider (BatchSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, *BMatrixShape)
(RUF005)
152-155: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *BMatrixOffsets)
(RUF005)
153-153: Undefined name BatchOffset
(F821)
154-154: Undefined name BOffset
(F821)
156-159: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *BMatrixShape)
(RUF005)
158-158: Undefined name BSize
(F821)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
305-306: Expected an indented block after if statement
(invalid-syntax)
310-310: Expected an indented block after if statement
(invalid-syntax)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py
132-132: Undefined name BSize
(F821)
151-151: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *AMatrixOffsets)
(RUF005)
151-151: Undefined name BatchOffset
(F821)
151-151: Consider (BatchSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, *AMatrixShape)
(RUF005)
154-157: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *AMatrixOffsets)
(RUF005)
155-155: Undefined name BatchOffset
(F821)
156-156: Undefined name BOffset
(F821)
158-161: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *AMatrixShape)
(RUF005)
160-160: Undefined name BSize
(F821)
167-167: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *BMatrixOffsets)
(RUF005)
167-167: Undefined name BatchOffset
(F821)
167-167: Consider (BatchSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, *BMatrixShape)
(RUF005)
170-173: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *BMatrixOffsets)
(RUF005)
171-171: Undefined name BatchOffset
(F821)
172-172: Undefined name BOffset
(F821)
174-177: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *BMatrixShape)
(RUF005)
176-176: Undefined name BSize
(F821)
370-370: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *AMatrixOffsets)
(RUF005)
370-370: Undefined name BatchOffset
(F821)
370-370: Consider (BatchSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, *AMatrixShape)
(RUF005)
373-376: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *AMatrixOffsets)
(RUF005)
374-374: Undefined name BatchOffset
(F821)
375-375: Undefined name BOffset
(F821)
377-380: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *AMatrixShape)
(RUF005)
379-379: Undefined name BSize
(F821)
386-386: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *BMatrixOffsets)
(RUF005)
386-386: Undefined name BatchOffset
(F821)
386-386: Consider (BatchSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, *BMatrixShape)
(RUF005)
389-392: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *BMatrixOffsets)
(RUF005)
390-390: Undefined name BatchOffset
(F821)
391-391: Undefined name BOffset
(F821)
393-396: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *BMatrixShape)
(RUF005)
395-395: Undefined name BSize
(F821)
405-405: Consider (BatchOffset, *CMatrixOffsets) instead of concatenation
Replace with (BatchOffset, *CMatrixOffsets)
(RUF005)
405-405: Undefined name BatchOffset
(F821)
405-405: Consider (BatchSize, *CMatrixShape) instead of concatenation
Replace with (BatchSize, *CMatrixShape)
(RUF005)
407-407: Consider (BatchOffset, BOffset, *CMatrixOffsets) instead of concatenation
Replace with (BatchOffset, BOffset, *CMatrixOffsets)
(RUF005)
407-407: Undefined name BatchOffset
(F821)
407-407: Undefined name BOffset
(F821)
407-407: Consider (BatchSize, BSize, *CMatrixShape) instead of concatenation
Replace with (BatchSize, BSize, *CMatrixShape)
(RUF005)
407-407: Undefined name BSize
(F821)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py
259-259: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
259-259: Undefined name inputRequantCubes
(F821)
277-277: Local variable dilation is assigned to but never used
Remove assignment to unused variable dilation
(F841)
314-314: Local variable weightBuffer is assigned to but never used
Remove assignment to unused variable weightBuffer
(F841)
317-317: Local variable inputBatchVar is assigned to but never used
Remove assignment to unused variable inputBatchVar
(F841)
320-320: Local variable inputChannelVar is assigned to but never used
Remove assignment to unused variable inputChannelVar
(F841)
322-322: Local variable outputBatchVar is assigned to but never used
Remove assignment to unused variable outputBatchVar
(F841)
323-323: Local variable outputHeightVar is assigned to but never used
Remove assignment to unused variable outputHeightVar
(F841)
324-324: Local variable outputWidthVar is assigned to but never used
Remove assignment to unused variable outputWidthVar
(F841)
325-325: Local variable outputChannelVar is assigned to but never used
Remove assignment to unused variable outputChannelVar
(F841)
328-328: Local variable padding is assigned to but never used
Remove assignment to unused variable padding
(F841)
408-408: Unpacked variable BatchOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
408-408: Unpacked variable HOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
408-408: Unpacked variable WOffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
408-408: Unpacked variable COffset is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
409-409: Unpacked variable BatchSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
421-421: Unpacked variable inBSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
421-421: Unpacked variable inHSize is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
517-517: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py
131-131: Undefined name ONNXLayer
(F821)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
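A self-contained sketch of the B006 hazard and the recommended fix; `register_bad`/`register_good` are hypothetical examples, not functions from the codebase:

```python
# The mutable default is created once at definition time and shared
# across every call, so state leaks between calls.
def register_bad(name, seen = []):  # flagged by B006
    seen.append(name)
    return seen

assert register_bad("a") == ["a"]
assert register_bad("b") == ["a", "b"]  # state leaked from the first call

# Recommended pattern: default to None and initialize inside the body.
def register_good(name, seen = None):
    if seen is None:
        seen = []
    seen.append(name)
    return seen

assert register_good("a") == ["a"]
assert register_good("b") == ["b"]  # fresh list on every call
```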
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
DeeployTest/testMVP.py
58-58: Unused function argument: verbose
(ARG001)
🔇 Additional comments (29)
.github/workflows/ci-platform-siracusa.yml (1)
57-75: Test additions well-integrated and properly organized. The new kernel tests (lines 57–63 for Float2D convolutions with and without DW variants, line 75 for reshape with skip connection) and the model test (line 99 for TinyViT demo) follow established naming conventions, are logically grouped by operation type, and align with the PR's stated objectives to add FP test coverage.
Based on the established pattern (documented in learnings), these test names map to directories in
DeeployTest/Tests/, and the prior review confirmed these directories exist and tests pass successfully.

Also applies to: 99-99
CHANGELOG.md (1)
100-103: Approve changelog entries for bug fixes. The fixed section comprehensively documents the four key bug fixes (alias_of parameter handling, non-batched FP GEMM/matmul, closure naming, and FP add/mul data types) in alignment with the PR objectives. The descriptions are clear and concise.
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
35-35: LGTM! The storage of n_cores is straightforward and correct.

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
127-131: Nice performance improvement. The refactored implementation using iterative construction is O(n) instead of O(n²) with repeated .index() calls, and the logic is clearer.
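A hypothetical sketch of the complexity difference described above (the real pass operates on graph nodes; plain strings are used here for illustration):

```python
# O(n^2): each .index() call rescans the list from the beginning.
names = [f"node_{i}" for i in range(1000)]
perm_quadratic = [names.index(n) for n in names]

# O(n): build a name-to-position map once, then O(1) lookups per element.
position = {name: i for i, name in enumerate(names)}
perm_linear = [position[n] for n in names]

assert perm_quadratic == perm_linear == list(range(1000))
```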
338-383: New DW convolution passes look well-structured. The NCHWtoNHWCDwConvPass and PULPFPDWConvPass classes follow the established pattern-based pass architecture correctly. However, they depend on _PULPDWNCHWtoNHWC_fun, which has a syntax error that must be fixed first.
NCHWtoNHWCDwConvPassandPULPFPDWConvPassclasses follow the established pattern-based pass architecture correctly. However, they depend on_PULPDWNCHWtoNHWC_fun, which has a syntax error that must be fixed first.Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
13-14: LGTM: MIN casts prevent signed/unsigned mixups. Casting ${size} to uint32_t here is correct and avoids UB on MIN comparisons.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
100-117: Im2Col call argument order looks correct. X/Y, kernel, and stride ordering matches the expected implementation for these templates.
Based on learnings.
21-24: Conv2D Im2Col scratch sizing formula is correct. The formula matches the C kernel's pContextBuffer requirement. The buffer is correctly sized as (typeWidth // 8) * n_cores * ch_im_in * kx * ky bytes, which corresponds to the kernel logic where each core gets ch_im_in * kernel_x * kernel_y float elements, split across n_cores with pointer arithmetic advancing by element count. For fp32 (typeWidth = 32), this yields the exact buffer needed: n_cores * ch_im_in * kx * ky * 4 bytes.
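A worked check of the scratch-buffer formula, with variable names mirroring the operator representation (the helper function itself is illustrative, not part of the template):

```python
def im2col_scratch_bytes(type_width_bits, n_cores, ch_im_in, kx, ky):
    # (typeWidth // 8) bytes per element; each core holds one
    # ch_im_in * kx * ky patch in its slice of the context buffer.
    return (type_width_bits // 8) * n_cores * ch_im_in * kx * ky

# fp32 example: 8 cores, 16 input channels, 3x3 kernel
assert im2col_scratch_bytes(32, 8, 16, 3, 3) == 8 * 16 * 3 * 3 * 4
```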
164-173: Review comment is incorrect: prior cleanup was not completed. The first BCube construction block (lines 146-160) still exists as dead code. It is never appended to inputBCubes; only the second BCube constructed at line 172 is appended (line 173). The review comment incorrectly asserts that "earlier deleted branch already tried to build BCube," but this branch was never deleted. While lines 164-173 correctly represent the single active path, the orphaned BCube construction at lines 146-160 contradicts the cleanup claim.

Likely an incorrect or invalid review comment.
Deeploy/Targets/PULPOpen/Parsers.py (3)
80-82: FP Conv2D: bias arity check LGTM. Allowing 2 or 3 inputs at parse time is correct and aligns with bias support.
105-117: FP Conv2D: bias flags wiring is consistent. Setting has_bias ("true"/"false") and mapping bias when present matches existing templates.
Please confirm FloatConvTemplate.reference2DIm2ColTemplate expects:
- keys: data_in, weight, data_out, has_bias, bias.
123-190: Remove suggestion to validate DW early in parseNode; current design is sound. The review comment suggests early depthwise validation in parseNode, but this is architecturally infeasible. The parseNode method receives only the node (without context), so ch_im_in, which is needed to verify group == ch_im_in, is unavailable. Input channel dimensions are only accessible in parseNodeCtxt via ctxt.lookup(), which is where the DW check correctly occurs (line ~167). The mention of "F_total % C == 1" references an undefined formula not present anywhere in the codebase.

The current two-stage design (attribute validation in parseNode, context validation in parseNodeCtxt) correctly reflects the available information at each stage and is the proper approach.

Likely an incorrect or invalid review comment.
Deeploy/Targets/PULPOpen/Bindings.py (5)
12-19: Import expansion LGTM. Float data types and new templates are correctly imported for the added FP bindings.
151-152: DMA Slice: template swap LGTM. Switch to DMASliceTemplate.referenceTemplate is consistent with DMA path.
163-172: Slice bindings for FP types: OK. SliceChecker + SliceTemplate with ForkTransformer over FloatDataTypes matches the Float path.
299-303: Float ReduceMean bindings: OK. Mixed float + integer axes signature bound to FloatReduceMeanTemplate with ClusterTransformer is consistent with parser behavior for FP opset.
239-245: FP DW Conv2D binding confirmed: template keys match parser. Verified that PULPFPDWConv2DParser.parseNodeCtxt() produces both has_bias and bias keys in operatorRepresentation (Parsers.py:175–176, 179), which are correctly expected by FloatConvTemplate.referenceDW2DIm2ColTemplate (FloatConvTemplate.py:81, 110, 142). The binding is correct.
988-989: Review comment is incorrect; no bug exists. NetworkContext.copy() uses copy.copy(self), which is a shallow copy that preserves instance attributes like n_cores (they remain in the object's __dict__). _convertCtxtToStaticSchedule returns the same ctxt instance modified in-place, so n_cores naturally survives. Line 988's reassignment is defensive programming but not required for correctness.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
36-39: Check argument order consistency at call sites. Kernels accept (H, W, P, Q, SP, SQ); FloatConvTemplate uses X,Y ordering per prior notes. Please confirm bindings pass the correct order to avoid transposed dims.
Based on learnings
Also applies to: 134-138
Deeploy/DeeployTypes.py (1)
1205-1210: Constructor argument mismatch in parseOutputs. Use aliases=[] instead of alias_of=[] to match VariableBuffer.__init__.
Apply:
- nb = ctxt.VariableBuffer(
-     name = name,
-     shape = node.shape,
-     alias_of = [],
- )
+ nb = ctxt.VariableBuffer(
+     name = name,
+     shape = node.shape,
+     aliases = [],
+ )

Likely an incorrect or invalid review comment.
DeeployTest/testMVP.py (3)
84-92: LGTM! The n_cores parameter is correctly propagated to mapDeployer, and the formatting is clean and consistent.
201-205: LGTM! The CLI argument is well-defined with a sensible default and clear help text. The parameter correctly flows through to the deployment pipeline.
256-262: LGTM! The n_cores parameter is correctly passed from CLI arguments to setupDeployer, completing the propagation chain through the deployment pipeline.

Deeploy/Targets/PULPOpen/Platform.py (6)
8-9: LGTM! All new imports are properly structured and used within the file. The additions support FP depthwise convolution, slice operations, and the new optimization pass.

Also applies to: 32-32, 35-36

77-77: LGTM! The FPDWConv2DMapper definition follows the established pattern and correctly pairs the FP depthwise conv parser with its bindings.
92-94: LGTM! Both slice mappers correctly use SliceParser with different bindings to support DMA-based and software-based slice implementations. The mapper defined first in the mapping list (line 134) will have precedence during binding selection.
234-235: LGTM! The RemoveEmptyConvBiasPass is correctly positioned near the end of the optimization pipeline, after merge and requantization passes. This placement allows the pass to clean up any empty bias parameters that may have been introduced or remain after earlier transformations.
134-134: No action needed; original review comment is incorrect. The mapper order does not affect precedence. PULPDMASliceBindings operates on IntegerDataTypes while PULPSliceBindings operates on FloatDataTypes; these are mutually exclusive sets. The SliceChecker type constraints determine which binding applies, not the array order. The current ordering [SliceMapper, DMASliceMapper] is safe.

Likely an incorrect or invalid review comment.
108-108: Based on my verification of the parser constraint logic:

FPConv2DMapper matches regular convolutions exclusively:
- PULPFPConv2DParser.parseNode() explicitly requires self.operatorRepresentation['group'] == 1
- This constraint rejects any depthwise convolution (where group == ch_im_in)

FPDWConv2DMapper matches depthwise convolutions:
- PULPFPDWConv2DParser.parseNode() does not have a group == 1 constraint
- Depthwise validation occurs in parseNodeCtxt() at line 186: if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']

The mapper ordering [FPConv2DMapper, FPDWConv2DMapper] is correct: the more constrained parser (requiring group == 1) is tried first, ensuring regular convolutions are matched before attempting depthwise matching. Any regular convolution caught by FPConv2DMapper will not reach FPDWConv2DMapper.
98d51f7 to 945b610 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)
241-245: Replace mutable default (shape=[1]) with immutable and type-annotate. Avoid shared mutable defaults.

- def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+ def __init__(self, name: str = '', shape: Sequence[int] = (1,), aliases: Optional[List[str]] = None):
379-388: Fix visited set creation in has_live_ancestors (bug). set(self.name) splits the string into characters; use a singleton set.

- visited = set(self.name)
+ visited = {self.name}
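A quick demonstration of the bug above: `set()` over a string iterates its characters, while a set display creates a singleton set containing the string itself.

```python
name = "node_0"

# set() iterates the string: one element per distinct character.
assert set(name) == {"n", "o", "d", "e", "_", "0"}

# A set display keeps the string whole, which is what "visited" needs.
assert {name} == {"node_0"}
assert len({name}) == 1
```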
♻️ Duplicate comments (6)
CHANGELOG.md (1)
23-23: Remove duplicate PR entry (#105). The "Refactor tiling code generation [#105]" line is duplicated. Remove the second occurrence to avoid confusion.

- - Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)
25-26: Avoid mutable default for inputOffsets. Use Optional and initialize inside to prevent shared state.

- inputOffsets: Dict[str, int] = {},
+ inputOffsets: Optional[Dict[str, int]] = None,
  n_cores: int = 8):
@@
- if inputOffsets == {}:
-     for key in inputTypes.keys():
-         inputOffsets[key] = 0
+ if inputOffsets is None:
+     inputOffsets = {key: 0 for key in inputTypes.keys()}

Deeploy/Targets/PULPOpen/Deployer.py (1)
40-53: Avoid mutable default for inputOffsets; forward Optional to base. Align with SignPropDeployer's fix.

- from typing import Callable, Dict, List, Type
+ from typing import Callable, Dict, List, Optional, Type
@@
- inputOffsets = {},
+ inputOffsets: Optional[Dict[str, int]] = None,
  n_cores: int = 8):
@@
- inputOffsets = inputOffsets,
+ inputOffsets = inputOffsets,
+ n_cores = n_cores,

Deeploy/DeeployTypes.py (3)
325-329: Constructor arg mismatch: use 'aliases', not 'alias_of'. This raises a TypeError at runtime.

- return (cls(
-     name = node.name,
-     shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-     alias_of = [],
- ))
+ return cls(
+     name = node.name,
+     shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+     aliases = [],
+ )
442-447: Avoid mutable defaults in ConstantBuffer.__init__. Use immutable defaults and convert.

- def __init__(self, name: str = '', shape = [1], values = [0]):
+ def __init__(self, name: str = '', shape: Sequence[int] = (1,), values: Union[Sequence, np.ndarray] = (0,)):
      # Pass a copy of alias_of to avoid shared references
      super().__init__(name, shape)
      values = np.asarray(values)
331-365: Align alias helpers with self.aliases (set) and remove alias_of usage. Current code references a non-existent attribute and never updates aliases.

- def add_aliases(self, aliases_to_add: List[str]):
+ def add_aliases(self, aliases_to_add: List[str]):
@@
-     if not hasattr(self, "alias_of"):
-         return None
-
-     for alias in aliases_to_add:
-         if alias not in self.alias_of:
-             self.alias_of.append(alias)
+     if not hasattr(self, "aliases") or self.aliases is None:
+         self.aliases = set()
+     for alias in aliases_to_add:
+         self.aliases.add(alias)
      return None
@@
- def get_aliases_of(self):
+ def get_aliases_of(self) -> List[str]:
@@
-     if hasattr(self, "alias_of"):
-         return list(self.alias_of)
-     else:
-         return list()
+     return list(self.aliases) if hasattr(self, "aliases") and self.aliases is not None else []
🧹 Nitpick comments (14)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)
158-159: Consider defensive handling for edge cases in name prefixing. The underscore prefix addresses names starting with numbers as intended. However, the implementation doesn't guard against:
- Empty or None name values
- Names already starting with _ (which would produce __, a reserved identifier)

Consider adding a check:

- # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
- self.closureName = "_" + name + self.closureSuffix
+ # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
+ # Avoid creating double-underscore if name already starts with underscore
+ prefix = "" if name and name[0] == "_" else "_"
+ self.closureName = prefix + name + self.closureSuffix

Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)
11-21: LGTM! Correct fix for non-batched input handling. The conditional logic properly addresses the bug by only advancing input pointers when the corresponding input is actually batched. When A_batched or B_batched is false, the base pointer is reused across all batch iterations, which correctly implements broadcasting semantics.

The pointer arithmetic for the batched case is correct:
- A offset: b * M * N for shape [batch, M, N]
- B offset: b * N * O for shape [batch, N, O]

Minor: Line 17 has trailing whitespace after the colon that could be cleaned up.
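A Python sketch of the offset logic described above (the template emits this as C pointer arithmetic; the helper function here is purely illustrative):

```python
def operand_offsets(batch, M, N, O, A_batched, B_batched):
    # Advance into an operand only when it is actually batched;
    # a non-batched operand stays at offset 0, i.e. is broadcast.
    a_off = batch * M * N if A_batched else 0
    b_off = batch * N * O if B_batched else 0
    return a_off, b_off

# Batched A with a broadcast (non-batched) B, at batch index 2:
assert operand_offsets(2, 4, 8, 16, True, False) == (2 * 4 * 8, 0)
```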
- % if B_batched:
+ % if B_batched:

Deeploy/Targets/Generic/Parsers.py (1)
1071-1074: Consider adding strict=True to zip() calls for defensive programming. The zip() calls iterate over inputs and outputs with expected fixed-length lists. While parseNode already validates these lengths, adding strict=True (Python 3.10+) provides an explicit safety guarantee and catches bugs if the validation logic changes.

Apply this diff if Python 3.10+ is used:

- for tensor, symName in zip(node.inputs, ['data_in', 'shape']):
+ for tensor, symName in zip(node.inputs, ['data_in', 'shape'], strict=True):
      self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name
- for tensor, symName in zip(node.outputs, ['data_out']):
+ for tensor, symName in zip(node.outputs, ['data_out'], strict=True):
      self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name

Based on static analysis hints.
Deeploy/TilingExtension/TilerExtension.py (1)
988-989: Propagate n_cores earlier (and guard against missing attribute). Set ctxt.n_cores when the context is created (or before tiling) to avoid surprises, and guard for attribute presence to prevent AttributeError if wrappers differ.
Example:
- self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
- self.ctxt.n_cores = self.n_cores
+ self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
+ if hasattr(self, "n_cores"):
+     self.ctxt.n_cores = self.n_cores
35-35: Drop redundant self.n_cores assignment. NetworkDeployer already stores n_cores; remove this duplicate to keep a single source of truth.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)
43-51: Fix return type annotation to match base (Union[int, IntVar]). The function returns an int, but is annotated as str. Align with NodeTemplate's signature.

- def computeTransientBuffersSize(ctxt: NetworkContext,
-                                 operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+ def computeTransientBuffersSize(
+     ctxt: NetworkContext,
+     operatorRepresentation: OperatorRepresentation,
+ ) -> List[Tuple[str, Union[int, IntVar]]]:
46-51: Update stale comment about using size as string. The returned size is numeric and also passed to hoistTransientBuffer; the "used as string" warning is misleading. Remove or clarify it.
59-65: Prefer annotateType over mutating referencedType. Setting _type.referencedType mutates the class and can have side effects. Use annotateType to set a precise Pointer type and instance.

- # Manually set the type of the im2col buffer to match the input type, since it defaults to void for transient buffers
- ctxt.lookup(im2col_name)._type.referencedType = ctxt.lookup(
-     operatorRepresentation['data_in'])._type.referencedType
+ # Set the buffer pointer type to match data_in
+ ctxt.annotateType(im2col_name, ctxt.lookup(operatorRepresentation['data_in'])._type)
93-101: Nit: fix comment typos (“ChannelOout” → “ChannelOut”). Purely cosmetic; improves readability.
Also applies to: 124-131
Deeploy/DeeployTypes.py (1)
573-576: Avoid mutable dict defaults in NetworkContext.__init__. Initialize dicts inside to prevent shared state.

- globalObjects = {},
- localObjects = {},
+ globalObjects: Optional[OrderedDict] = None,
+ localObjects: Optional[OrderedDict] = None,
@@
- self.globalObjects = OrderedDict()
- self.localObjects = OrderedDict()
+ self.globalObjects = OrderedDict() if globalObjects is None else globalObjects
+ self.localObjects = OrderedDict() if localObjects is None else localObjects

DeeployTest/testUtils/platformMapping.py (2)
92-104: Use int for n_cores (not Optional) since default is numeric.

- n_cores: Optional[int] = 8,
+ n_cores: int = 8,
114-188: Propagate n_cores to all SignProp-based deployers for consistency. CMSIS, MemPool, SoftHier, Generic deployers inherit SignPropDeployer and accept n_cores. Pass it through so CLI behavior is uniform.
Example for each block:
- deployer = CMSISDeployer(...,
-                          deeployStateDir = deeployStateDir,
-                          inputOffsets = inputOffsets)
+ deployer = CMSISDeployer(...,
+                          deeployStateDir = deeployStateDir,
+                          inputOffsets = inputOffsets,
+                          n_cores = n_cores)

Apply similarly to MemPoolDeployer, SoftHierDeployer, GenericDeployer.
Also applies to: 140-188
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
50-148: Code duplication acceptable for performance. The has_bias and no-bias paths are duplicated (lines 50-148 vs 149-248), which increases maintainability burden but avoids branching inside tight loops. This trade-off is reasonable for performance-critical embedded kernels.
Consider adding a comment noting this is intentional for performance if future maintainers question the duplication.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)
41-101: Code duplication acceptable for performance. The has_bias and no-bias code paths are duplicated to avoid conditional branching inside the innermost loops (lines 47-64). This is a common optimization in performance-critical embedded code where avoiding branches can significantly improve execution speed.
Similar duplication exists in the im2col path (lines 140-218). Consider adding a brief comment explaining this trade-off for future maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
- .github/workflows/ci-platform-siracusa.yml (3 hunks)
- CHANGELOG.md (4 hunks)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
- Deeploy/DeeployTypes.py (12 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
- Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
- Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
- Deeploy/TilingExtension/TilerExtension.py (1 hunks)
- DeeployTest/testMVP.py (4 hunks)
- DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
- DeeployTest/testUtils/platformMapping.py (2 hunks)
- DeeployTest/testUtils/testRunner.py (3 hunks)
- TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
- DeeployTest/testUtils/testRunner.py
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
- DeeployTest/testRunner_tiled_siracusa.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
TargetLibraries/PULPOpen/inc/kernel/Conv.h
TargetLibraries/PULPOpen/src/Convolution_fp32.c
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (9)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
inputs (2569-2586), lookup (778-810)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
inputs (2569-2586), lookup (778-810), parseNode (1091-1106), parseNodeCtxt (1109-1134), NetworkContext (564-1078)
Deeploy/Targets/Generic/Parsers.py (31)
Conv2DParser (1220-1286), parseNode (19-27), parseNode (49-58), parseNode (81-94), parseNode (102-112), parseNode (163-169), parseNode (193-207), parseNode (229-243), parseNode (271-293), parseNode (333-351), parseNode (373-385), parseNode (422-434), parseNode (469-473), parseNode (496-507), parseNode (531-542), parseNode (578-582), parseNodeCtxt (29-41), parseNodeCtxt (60-76), parseNodeCtxt (114-155), parseNodeCtxt (171-185), parseNodeCtxt (209-221), parseNodeCtxt (245-263), parseNodeCtxt (295-325), parseNodeCtxt (353-365), parseNodeCtxt (387-414), parseNodeCtxt (436-461), parseNodeCtxt (475-488), parseNodeCtxt (509-523), parseNodeCtxt (544-570), parseNodeCtxt (584-590), parseNodeCtxt (603-620)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
NetworkDeployer (3244-3629)
DeeployTest/testUtils/platformMapping.py (1)
mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
_mockScheduler(29-33)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (564-1078), hoistTransientBuffers (165-185), hoistTransientBuffer (879-898), lookup (778-810)
Deeploy/Targets/PULPOpen/Bindings.py (4)
Deeploy/CommonExtensions/DataTypes.py (4)
float32_t (74-78), int8_t (12-15), int32_t (24-27), uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
CodeTransformation (2352-2386), NodeBinding (1574-1719), NodeTemplate (87-229)
Deeploy/Targets/Generic/TypeCheckers.py (2)
SliceChecker (37-51), ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
DeploymentPlatform (2439-2482), TopologyOptimizer (2237-2266), NetworkDeployer (3244-3629)
Deeploy/Targets/PULPOpen/Deployer.py (1)
PULPDeployer(29-127)
Deeploy/DeeployTypes.py (2)
DeeployTest/testUtils/dmaUtils.py (1)
size (72-73)
Deeploy/AbstractDataTypes.py (1)
Pointer(303-388)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
PULP_Conv2d_fp32_fp32_fp32_HWC (10-102), PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC(10-251)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
RemoveEmptyConvBiasPass (519-524)
Deeploy/Targets/PULPOpen/Parsers.py (1)
PULPFPDWConv2DParser (124-190)
Deeploy/DeeployTypes.py (1)
NodeMapper (1722-1878)
Deeploy/Targets/Generic/Parsers.py (1)
SliceParser(97-155)
🪛 LanguageTool
CHANGELOG.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.1)
Deeploy/Targets/Generic/Parsers.py
1071-1071: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
1073-1073: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
DeeployTest/testMVP.py
58-58: Unused function argument: verbose
(ARG001)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🔇 Additional comments (18)
.github/workflows/ci-platform-siracusa.yml (1)
57-64: CI test coverage additions align well with PR features. The new tests for FP 2D convolution (standard and depthwise variants with/without bias), reshape with skip connection, and TinyViT demo align cleanly with the PR's feature additions. Test naming is consistent with existing patterns, and per the earlier review resolution, these tests exist as directories in DeeployTest/Tests/. The additions integrate smoothly into the existing kernel and model test lists without logic changes or structural issues.

Also applies to: 75-75, 99-99
Deeploy/Targets/Generic/Parsers.py (1)
1070-1076: LGTM! Clear and Pythonic mapping implementation. The explicit mapping of inputs to ['data_in', 'shape'] and outputs to ['data_out'] improves clarity and follows the pattern established by other parsers in the codebase (e.g., iRMSNormParser, lines 65-71). The use of zip() is more Pythonic than index-based access.

Deeploy/DeeployTypes.py (2)
879-899: Ensure hoistTransientBuffer size is always int. If any templates pass symbolic strings, they’ll break here. Confirm all computeTransientBuffersSize implementations return int/IntVar as per signature.
2696-2702: Good: preserve n_cores on context reinit.

Reconstructing `NetworkContext` with `n_cores = self.ctxt.n_cores` avoids losing state on parse.
Deeploy/Targets/PULPOpen/Parsers.py (3)
80-81: LGTM! Bias support correctly integrated.

The relaxed input count check now allows optional bias (2 inputs without bias, 3 inputs with bias), which aligns with the bias handling added in `parseNodeCtxt`.
106-118: LGTM! Clean bias handling implementation.

The conditional bias handling correctly maps inputs based on the node's input count, setting appropriate flags for downstream code generation.
124-191: LGTM! Well-structured depthwise convolution parser.

The parser correctly validates depthwise convolution constraints:
- Square padding requirement
- Optional bias support (2 or 3 inputs)
- Depthwise verification via the `group == ch_im_in` check (line 187)

The structure mirrors the regular FP Conv2D parser while adding depthwise-specific validation.
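The depthwise condition the parser enforces can be sketched as a small predicate. This is an illustrative Python sketch, not the parser's actual code; the function name `is_depthwise` and its parameters are hypothetical stand-ins for the parsed attributes.

```python
def is_depthwise(group: int, ch_im_in: int, ch_im_out: int) -> bool:
    # An ONNX Conv is depthwise when `group` equals the input channel count;
    # each input channel then yields ch_im_out // ch_im_in output channels,
    # so the output channel count must divide evenly.
    return group == ch_im_in and ch_im_out % ch_im_in == 0
```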
DeeployTest/testMVP.py (1)
57-59: LGTM! Clean n_cores parameter propagation.

The n_cores parameter is consistently threaded through the deployment setup:
- Added to setupDeployer signature
- Passed to mapDeployer
- Exposed via CLI argument (default: 8)
- Propagated from parsed arguments
This enables proper sizing of per-core buffers (e.g., im2col buffers) in multi-core PULP deployments.
Also applies to: 84-92, 201-205, 261-262
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)
10-36: LGTM! Proper per-core work distribution.

The function correctly partitions output channels across cores:
- Computes per-core channel chunks with remainder handling
- Early-exits when no channels assigned to a core
- Adjusts weight and im2col buffer pointers per core
This enables efficient parallel execution on multi-core PULP clusters.
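The chunk-with-remainder partitioning described above can be sketched in Python. This is a hedged illustration of the scheme, not the kernel's C code; the helper name `core_channel_range` and its signature are hypothetical.

```python
def core_channel_range(core_id: int, n_cores: int, f_total: int):
    # Split f_total output channels over n_cores; the first
    # (f_total % n_cores) cores each take one extra channel.
    chunk = f_total // n_cores
    rem = f_total % n_cores
    start = core_id * chunk + min(core_id, rem)
    stop = start + chunk + (1 if core_id < rem else 0)
    # An empty range (start == stop) corresponds to the early-exit case.
    return start, stop
```

Concatenating the per-core ranges covers every channel exactly once, which is the correctness property the kernel relies on.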
103-145: LGTM! Correct depthwise convolution logic.

The input channel iteration (lines 103-104) correctly computes the range based on output channels assigned to the core, accounting for the depthwise constraint where each input channel produces F_total/C output channels (the "group" parameter in ONNX Conv).
The output channel range computation (lines 118-130) properly clamps to the core's assigned range, ensuring correct work distribution.
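The mapping from a core's output-channel range back to the input channels it must read can be sketched as follows. This is an assumption-laden Python sketch of the described logic, not the C kernel; `dw_input_channel_range` and `f_per_ch` are hypothetical names for F_total // C.

```python
def dw_input_channel_range(ch_out_start: int, ch_out_stop: int, f_per_ch: int):
    # Each input channel produces f_per_ch = F_total // C output channels,
    # so a half-open output range maps to the input channels that feed it.
    ch_in_start = ch_out_start // f_per_ch
    ch_in_stop = (ch_out_stop + f_per_ch - 1) // f_per_ch  # ceil division
    return ch_in_start, ch_in_stop
```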
Deeploy/Targets/PULPOpen/Bindings.py (3)
151-151: LGTM! Clear separation of DMA and non-DMA slice operations.

The update clarifies the distinction:
- `PULPDMASliceBindings` (line 151): Uses `DMASliceTemplate` for asynchronous DMA-based slicing returning a Future
- `PULPSliceBindings` (lines 163-172): Uses `SliceTemplate` for synchronous float slicing with `ForkTransformer`
This separation enables appropriate code transformation for each use case.
Also applies to: 163-172
239-245: LGTM! Depthwise convolution bindings correctly configured.

The new `PULPFloatDWConv2DBindings`:
- Support all float types via FloatDataTypes iteration
- Use FloatConvTemplate.referenceDW2DIm2ColTemplate (the new DW im2col template)
- Apply ForkTransformer for multi-core execution
This aligns with the new PULPFPDWConv2DParser and enables FP depthwise convolution support.
298-302: LGTM! Extended ReduceMean support for mixed float/integer inputs.

The addition supports ReduceMean operations with float data and integer axes/keepdims parameters, using `FloatReduceMeanTemplate`. This is a natural extension of the existing integer-only ReduceMean bindings.
Deeploy/Targets/PULPOpen/Platform.py (2)
8-9: LGTM! Proper wiring of new depthwise and slice capabilities.

The platform changes correctly integrate the new functionality:
- FPDWConv2DMapper (line 77) combines the new parser and bindings for FP depthwise convolution
- Conv layer (line 108) now supports both FPConv2DMapper and FPDWConv2DMapper
- Slice operation (line 134) supports both regular and DMA-based mappers
- All necessary imports added for new parsers, bindings, and optimization passes
This completes the integration stack for FP depthwise convolution on PULP.
Also applies to: 32-32, 35-36, 77-77, 92-94, 108-108, 134-134
234-235: LGTM! RemoveEmptyConvBiasPass addition is appropriate.

Adding `RemoveEmptyConvBiasPass` to the PULPOptimizer makes sense when introducing bias support. This pass likely removes unnecessary bias parameters when the bias tensor is empty or zero, optimizing the generated code.
The placement after PULPAddRequantMergePass is reasonable in the optimization pass sequence.
TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)
12-27: LGTM! Header declarations consistent with implementations.

The function signatures correctly extend the existing FP32 convolution API:
- PULP_Conv2d_fp32_fp32_fp32_HWC (lines 12-18): Added pSrcBias and has_bias parameters
- PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 20-27): Added bias parameters and pContextBuffer
- PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 29-36): New entry point for depthwise im2col convolution
All declarations match the implementations in Convolution_fp32.c and DWConvolution_fp32.c. Based on learnings.
Also applies to: 29-36
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
10-34: LGTM! Proper multi-core parallelization added.

The updated function now includes:
- Bias parameter support (pSrcBias, has_bias)
- Per-core output channel distribution (lines 18-31)
- Core-specific weight pointer adjustment (line 34)
- Early exit for cores with no assigned channels (lines 29-31)
This enables efficient parallel execution while maintaining correctness. Based on learnings.
66-67: LGTM! Correct per-core output indexing and bias application.

The output indexing (lines 66-67, 96-97, 173, 212-214) correctly accounts for:
- Global output index calculation: `(h_out * W_out + w_out) * F_total + ...`
- Core-specific channel offset: `ch_out_start + f` or just `f` (depending on the loop variable)
- Bias application using the global channel index (line 67: `pSrcBias[f + ch_out_start]`, line 175: `pSrcBias[f]`)

This ensures correct results when work is distributed across multiple cores.
Also applies to: 175-175
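The HWC output indexing above can be sketched as a one-line helper. This is an illustrative Python sketch of the indexing formula, not the kernel's code; `out_index` and its parameter names are hypothetical.

```python
def out_index(h_out: int, w_out: int, f: int,
              w_out_dim: int, f_total: int, ch_out_start: int) -> int:
    # HWC layout: the spatial position selects a row of f_total channels,
    # and the core's channel offset plus f selects the channel within it.
    return (h_out * w_out_dim + w_out) * f_total + ch_out_start + f
```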
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)
241-241: Complete the mutable default fix for the `shape` parameter.

The `aliases` parameter correctly uses `None`, but `shape = [1]` is still a mutable default. This was flagged in past reviews and by static analysis. Apply this diff:

```diff
-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, aliases: Optional[List[str]] = None):
         self.name: str = name
-        self.shape: Sequence[int] = shape
+        self.shape: Sequence[int] = shape if shape is not None else (1,)
```

As per past review feedback and static analysis (Ruff B006).
437-439: Fix mutable default arguments in `ConstantBuffer.__init__`.

Both `shape = [1]` and `values = [0]` are mutable defaults, flagged by past reviews and static analysis (Ruff B006). Apply this diff:

```diff
-    def __init__(self, name: str = '', shape = [1], values = [0]):
-        super().__init__(name, shape)
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, values: Union[Sequence, np.ndarray, None] = None):
+        super().__init__(name, shape if shape is not None else (1,))
-        values = np.asarray(values)
+        values = np.asarray(values if values is not None else [0])
```

As per past review feedback and static analysis.
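The reason both fixes above matter is the classic Python pitfall: a mutable default is evaluated once at function definition and shared across all calls. A minimal demonstration (function names are illustrative, not from the codebase):

```python
def shared_default(shape=[1]):
    # The default list is created once at definition time,
    # so every call without an argument mutates the same object.
    shape.append(2)
    return shape

def safe_default(shape=None):
    # A fresh list is built on each call when the caller passes nothing.
    shape = [1] if shape is None else shape
    shape.append(2)
    return shape
```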
♻️ Duplicate comments (2)
CHANGELOG.md (1)
23-23: Remove duplicate PR reference.

Line 23 duplicates the PR #105 entry already present at line 18. Remove the redundant entry to maintain changelog clarity.

```diff
 - Use Docker digests instead of arch-specific tags [#106](https://github.com/pulp-platform/Deeploy/pull/106)
-- Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
 - Fix `Unsqueeze` Op. when using ONNX opset 13 or higher (from attribute to input) [#119](https://github.com/pulp-platform/Deeploy/pull/119)
```

Deeploy/DeeployTypes.py (1)
331-350: Critical bug: cannot append to a set.

Line 260 initializes `self.aliases` as a `set`, but line 348 calls `.append()`, which doesn't exist on sets. This will raise an `AttributeError` at runtime. Apply this diff:

```diff
     def add_aliases(self, aliases_to_add: List[str]):
         """
         Adds list of aliases to the aliases attribute.

         Parameters
         ----------
         alias_to_add : List[str]
             List of names of aliases to add to the aliases attribute.

         Returns
         -------
         None
         """
         if not hasattr(self, "aliases"):
             return None
         for alias in aliases_to_add:
-            if alias not in self.aliases:
-                self.aliases.append(alias)
+            self.aliases.add(alias)
         return None
```
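The fix also lets the membership pre-check disappear, because `set.add()` is idempotent. A quick demonstration of the set behavior the diff relies on:

```python
aliases = set()
# set objects have no append(); add() is the correct insertion method
# and is idempotent, so duplicates are silently ignored.
assert not hasattr(aliases, "append")
for alias in ["x", "y", "x"]:
    aliases.add(alias)
```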
🧹 Nitpick comments (1)
CHANGELOG.md (1)
30-32: Reduce successive FP prefix repetition.

Three consecutive entries (lines 30–32) begin with "FP", which reduces readability. Consider rewording one or more entries to vary the opening phrase.
Example refactors:
- Line 30: "Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests..."
- Line 31: "Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations."
- Line 32: "DW conv lowering optimization pass (FP variant) similar to the existent one for integer version."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `CHANGELOG.md` (4 hunks)
- `Deeploy/DeeployTypes.py` (9 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (114)
- GitHub Check: deeploy-memory-allocation
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
- GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: siracusa-kernels / test-runner-siracusa
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)
558-567: LGTM: n_cores parameter added correctly.

The `n_cores` parameter is properly added to `NetworkContext.__init__` with a default value of 8 and stored as an instance variable. No duplicate assignment.

2479-2519: LGTM: n_cores propagated correctly to NetworkContext.

The `n_cores` parameter is properly added to `NetworkContainer.__init__` and correctly passed through to the `NetworkContext` constructor at lines 2518-2519.

2679-2685: LGTM: NetworkContext recreation preserves n_cores.

The new `NetworkContext` is correctly created with `n_cores = self.ctxt.n_cores`, preserving the core count from the existing context.

3231-3285: LGTM: n_cores propagated through NetworkDeployer.

The `n_cores` parameter is properly added to `NetworkDeployer.__init__`, passed to the parent `NetworkContainer` via `super().__init__()`, and stored as an instance variable at line 3285.
lukamac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I reviewed the whole PR. There are still some changes I would like to see specifically around the n_cores and some remnant changes around reshape aliasing. I left some comments also in the older threads so check those too. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)
42-48: Consider removing redundant buffer retrieval.

The parent class `_GenericReshapeTemplate.alignToContext()` already retrieves and validates these buffers (lines 29-33 in the Generic template). Since you call `super().alignToContext()` on line 40, you can retrieve the buffers directly from the returned `ctxt` without re-fetching them. Apply this diff to streamline the code:

```diff
-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
+        # Get buffers (already retrieved and validated by parent)
+        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
+        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
+
```

Or even more concisely, inline them:

```diff
-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
         # Linking required for tiling
-        bufferOut._alias = bufferIn.name
+        ctxt.lookup(operatorRepresentation['data_out'])._alias = ctxt.lookup(operatorRepresentation['data_in']).name
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `Deeploy/Targets/Generic/Templates/ReshapeTemplate.py` (3 hunks)
- `Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py` (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
Deeploy/DeeployTypes.py (3)
`NodeTemplate` (87-229), `lookup` (722-754), `VariableBuffer` (232-360)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
Deeploy/DeeployTypes.py (4)
`NetworkContext` (508-1022), `VariableBuffer` (232-360), `alignToContext` (119-139), `lookup` (722-754)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (2)
`_GenericReshapeTemplate` (10-39), `alignToContext` (15-39)
🔇 Additional comments (3)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
10-10: LGTM! Clean refactoring to clarify template hierarchy.

The rename from `_ReshapeTemplate` to `_GenericReshapeTemplate` improves clarity and aligns well with the inheritance pattern where the PULP-specific template now extends this base class.

Also applies to: 42-42
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
28-32: LGTM! Inheritance approach correctly implemented.

The addition of the `VariableBuffer` import and inheritance from `_GenericReshapeTemplate` properly implements the agreed-upon design from the past review discussion.
49-50: LGTM! Tiler-specific aliasing correctly preserved.

The `_alias` attribute assignment is correctly preserved for tiler compatibility, as discussed in the past review comments. This works alongside the parent's bidirectional `aliases` set mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- `Deeploy/DeeployTypes.py` (3 hunks)
- `Deeploy/Targets/PULPOpen/Deployer.py` (1 hunks)
- `Deeploy/Targets/PULPOpen/Parsers.py` (2 hunks)
- `Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py` (5 hunks)
- `DeeployTest/generateNetwork.py` (2 hunks)
- `DeeployTest/testMVP.py` (5 hunks)
- `DeeployTest/testRunner_siracusa.py` (1 hunks)
- `DeeployTest/testUtils/testRunner.py` (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- DeeployTest/testUtils/testRunner.py
- Deeploy/Targets/PULPOpen/Parsers.py
- Deeploy/Targets/PULPOpen/Deployer.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (3)
DeeployTest/testRunner_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
`TestRunner` (279-448)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
`NodeTemplate` (87-229), `computeTransientBuffersSize` (142-162), `NetworkContext` (508-1020), `hoistTransientBuffers` (165-185), `hoistTransientBuffer` (821-840), `lookup` (720-752)
DeeployTest/testMVP.py (1)
Deeploy/Targets/PULPOpen/Platform.py (1)
`PULPPlatform` (251-259)
🪛 Ruff (0.14.2)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
44-44: Unused static method argument: ctxt
(ARG004)
DeeployTest/testMVP.py
59-59: Unused function argument: verbose
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (84)
- GitHub Check: deeploy-typing
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: cortexm-models / test-runner-cortexm
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (16000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (16000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: generic-models / test-runner-generic
- GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
- GitHub Check: deeploy-typing
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: cortexm-models / test-runner-cortexm
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (16000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: generic-models / test-runner-generic
- GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
🔇 Additional comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)
21-23: LGTM: Buffer sizing now accounts for runtime core count. The dynamic `n_cores` factor correctly sizes the Im2Col buffer for parallel execution across cores.
47-50: Correct: DW conv buffer omits `ch_im_in` factor. Unlike regular convolution, depthwise convolution processes each channel independently, so the Im2Col buffer is correctly sized per-core with kernel dimensions only, without the `ch_im_in` factor.
61-62: Clarify: Why set buffer type if allocation size is already in bytes? The code explicitly sets the buffer's `referencedType`, but according to earlier comments, "Buffer variables don't support type assignment and require specifying the total size." Since `im2col_dim` already includes the byte-size calculation (lines 48-50: `typeWidth // 8`), what purpose does this type assignment serve? If it's for C codegen or downstream type checking, consider adding a comment explaining the intent.
82-82: LGTM: Templates correctly updated with bias support and proper argument ordering. The additions are consistent:
- Bias parameters (`${bias}`, `${has_bias}`) added to all convolution calls
- Im2Col templates correctly pass `${ctxtBuffer}` for scratch space
- Argument ordering (X,Y for dimensions/kernels/strides) matches the kernel prototypes

Based on learnings.
Also applies to: 101-123, 125-155
Actionable comments posted: 1
🧹 Nitpick comments (1)
CHANGELOG.md (1)
27-33: Consider rewording consecutive "FP" entries for stylistic improvement. Lines 30–32 in the Added section begin with "FP," which creates successive repetition. While this is minor in a technical changelog, consider rewording one or two entries for better readability without losing clarity.
Example rewording (optional):
- Line 30: Keep as-is or rephrase to emphasize the TinyViT contribution (e.g., "Conv 2D (simple & DW) tests, reshape & skip connection tests, and TinyViT demo tests for...")
- Line 31: Rephrase to reduce "FP" lead (e.g., "Slice, DW conv 2D, and reduce mean bindings and mappings in floating-point...")
- Line 32: Keep as-is since it emphasizes the parallel with the integer version
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
CHANGELOG.md
🪛 LanguageTool
CHANGELOG.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (1)
CHANGELOG.md (1)
7-7: PR entry added correctly. The addition of PR #117 to the List of Pull Requests is properly formatted and positioned at the top, matching the project's convention.
…atorRepresentations
…sa flags and change the engine n_cores
dd40418 to fe40743 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
Deeploy/Targets/PULPOpen/Parsers.py (1)
124-153: Depthwise FP parser: enforce multiplier == 1. Currently DW detection checks group == ch_im_in, but doesn't validate ch_im_out == ch_im_in. If a multiplier ≠ 1 is provided, the FP DW kernel will likely miscompute.
Add a fast-fail in parseNodeCtxt after shapes are known.

```diff
 # Check if DW
-if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']:
-    return newCtxt, True
+if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']:
+    if self.operatorRepresentation['ch_im_out'] != self.operatorRepresentation['ch_im_in']:
+        return ctxt, False  # depthwise multiplier != 1 not supported
+    return newCtxt, True
```

Also applies to: 155-182
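The guard suggested above is easy to state in isolation. This is a minimal sketch of the depthwise-with-multiplier-one check; the function name and bare integer arguments are illustrative stand-ins for the parser's `operatorRepresentation` fields, not Deeploy API:

```python
def is_supported_depthwise(group: int, ch_im_in: int, ch_im_out: int) -> bool:
    """Accept only depthwise convs with channel multiplier == 1.

    A conv is depthwise when group == ch_im_in; a channel multiplier m
    would give ch_im_out == m * ch_im_in, which the FP DW kernel here
    does not support.
    """
    if group != ch_im_in:
        return False  # not depthwise at all
    return ch_im_out == ch_im_in  # reject multiplier != 1

print(is_supported_depthwise(group = 16, ch_im_in = 16, ch_im_out = 16))  # True: DW, m == 1
print(is_supported_depthwise(group = 16, ch_im_in = 16, ch_im_out = 32))  # False: DW, m == 2
print(is_supported_depthwise(group = 1, ch_im_in = 16, ch_im_out = 32))   # False: regular conv
```

In the parser this would translate to returning `(ctxt, False)` for the unsupported cases, so mapping falls through to another binding instead of generating wrong code.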
🧹 Nitpick comments (6)
DeeployTest/generateNetwork.py (1)
87-90: Validate --cores as power-of-two to match kernel assumptions. Many kernels compute chunks with (NUM_CORES-1) masks and log2(NUM_CORES). Non–power-of-two values will break these assumptions.
- Enforce cores > 0 and power-of-two at CLI parse time.
- Clarify help text accordingly.

```diff
 parser.add_argument(
     "--cores",
     type = int,
-    default = 1,
-    help =
-    "Number of cores on which the network is run. Currently, required for im2col buffer sizing on Siracusa. Default: 1.",
+    default = 1,
+    help = "Number of cluster cores (power-of-two). Used for im2col buffer sizing on Siracusa. Default: 1.",
 )
```

Add validation after parsing:

```diff
+if args.cores <= 0 or (args.cores & (args.cores - 1)) != 0:
+    raise ValueError(f"--cores must be a positive power-of-two, got {args.cores}")
```

Also applies to: 191-196
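The bit-trick in the suggested diff can be sketched standalone; `validate_cores` is an illustrative name, not part of Deeploy:

```python
def validate_cores(cores: int) -> int:
    # A positive power of two has exactly one bit set, so for cores > 0,
    # cores & (cores - 1) == 0 iff cores is a power of two.
    if cores <= 0 or (cores & (cores - 1)) != 0:
        raise ValueError(f"--cores must be a positive power-of-two, got {cores}")
    return cores

for ok in (1, 2, 4, 8):
    validate_cores(ok)  # accepted silently
for bad in (0, 3, 6, -8):
    try:
        validate_cores(bad)
    except ValueError as exc:
        print(exc)
```

Hooking this in right after `parser.parse_args()` keeps the failure at the CLI boundary instead of deep inside a kernel that silently assumes `NUM_CORES` is a power of two.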
Deeploy/Targets/PULPOpen/Deployer.py (1)
67-74: Remove unused variable to fix F841 and reduce noise. `autoTransposePass` is created but never used.

```diff
-autoTransposePass = AutoTransposeMergePass()
-#self.ctxt, self.layerBinding = autoTransposePass.apply(self.ctxt, self.graph, self.layerBinding)
+# autoTransposePass intentionally kept disabled; delete if not needed
```

Alternatively, delete both lines if the pass is not planned.
Also applies to: 81-83
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)
9-16: Prefer LOG2(NUM_CORES) over log2(NUM_CORES) for integer core counts. Using the macro avoids double math and header requirements, and matches other kernels.

```diff
-uint8_t ${nodeName}_log2Core = (uint8_t) log2(NUM_CORES);
+uint8_t ${nodeName}_log2Core = (uint8_t) LOG2(NUM_CORES);
```

CHANGELOG.md (1)
33-34: Micro copy edit: "existent" → "existing".

```diff
-FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+FP PULP DW conv lowering optimization pass similar to the existing one for the integer version.
```

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)
137-141: Redundant modulo in weight indexing. `im2col_idx` already iterates [0, P*Q) (line 137), so `im2col_idx % (P * Q)` at line 140 is redundant and always equals `im2col_idx`. Apply this diff to simplify:

```diff
 sum += im2col_buffer[im2col_idx] *
-    weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
+    weight_ptr[(f - ch_out_start) * P * Q + im2col_idx];
```

Also applies to line 239 in the no-bias path.
50-248: Significant code duplication between bias and no-bias paths. Lines 50–148 and 149–248 duplicate ~99 lines of padding and convolution logic; only line 144 vs 243 differs (bias addition). While this pattern is common and tests pass, consider extracting the common im2col/convolution logic into a helper or macro to reduce maintenance burden.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
- .github/workflows/ci-platform-siracusa.yml (3 hunks)
- CHANGELOG.md (6 hunks)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/CommonExtensions/DataTypes.py (1 hunks)
- Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2 hunks)
- Deeploy/DeeployTypes.py (3 hunks)
- Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (2 hunks)
- Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
- Deeploy/Targets/PULPOpen/Platform.py (8 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
- Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
- DeeployTest/generateNetwork.py (3 hunks)
- DeeployTest/testMVP.py (3 hunks)
- DeeployTest/testUtils/testRunner.py (1 hunks)
- TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py
- Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
- DeeployTest/testUtils/testRunner.py
- DeeployTest/testMVP.py
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.
Applied to files:
.github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
TargetLibraries/PULPOpen/inc/kernel/Conv.h
Deeploy/Targets/PULPOpen/Parsers.py
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
TargetLibraries/PULPOpen/src/Convolution_fp32.c
CHANGELOG.md
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (10)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
PULP_Conv2d_fp32_fp32_fp32_HWC (10-102), PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Deployer.py (3)
Deeploy/Targets/PULPOpen/Platform.py (1)
PULPClusterEngine (245-254)
Deeploy/DeeployTypes.py (6)
_selectEngine (2565-2569), bind (1600-1625), bind (1767-1791), bind (2064-2108), bind (2729-2765), hoistGlobalDefinition (842-855)
Deeploy/TilingExtension/TilerExtension.py (1)
bind (1002-1008)
Deeploy/DeeployTypes.py (2)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)
_selectEngine (69-70)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
_selectEngine (51-58)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
inputs (2503-2520), lookup (720-752), parseNode (1033-1048), parseNodeCtxt (1051-1076), NetworkContext (508-1020)
Deeploy/Targets/Generic/Parsers.py (31)
Conv2DParser (1218-1284), parseNode (19-27), parseNode (49-58), parseNode (81-94), parseNode (102-112), parseNode (163-169), parseNode (193-207), parseNode (229-243), parseNode (271-293), parseNode (333-351), parseNode (373-385), parseNode (422-434), parseNode (469-473), parseNode (496-507), parseNode (531-542), parseNode (578-582), parseNodeCtxt (29-41), parseNodeCtxt (60-76), parseNodeCtxt (114-155), parseNodeCtxt (171-185), parseNodeCtxt (209-221), parseNodeCtxt (245-263), parseNodeCtxt (295-325), parseNodeCtxt (353-365), parseNodeCtxt (387-414), parseNodeCtxt (436-461), parseNodeCtxt (475-488), parseNodeCtxt (509-523), parseNodeCtxt (544-570), parseNodeCtxt (584-590), parseNodeCtxt (603-620)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (508-1020), hoistTransientBuffers (165-185), hoistTransientBuffer (821-840), lookup (720-752)
Deeploy/CommonExtensions/DataTypes.py (1)
Deeploy/AbstractDataTypes.py (1)
typeWidth (399-400)
Deeploy/Targets/PULPOpen/Bindings.py (4)
Deeploy/CommonExtensions/DataTypes.py (5)
float32_t (74-78), int8_t (12-15), int32_t (24-27), int64_t (30-33), uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
CodeTransformation (2290-2324), NodeBinding (1512-1657), NodeTemplate (87-229)
Deeploy/Targets/Generic/TypeCheckers.py (3)
SliceChecker (37-51), ReshapeChecker (122-136), ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
PointerClass (536-559)
DeeployTest/generateNetwork.py (1)
Deeploy/Targets/PULPOpen/Platform.py (2)
PULPClusterEngine (245-254), PULPPlatform (257-265)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2)
Deeploy/DeeployTypes.py (3)
DeploymentEngine (2327-2374), NetworkDeployer (3176-3549), _selectEngine (2565-2569)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
_selectEngine (51-58)
Deeploy/Targets/PULPOpen/Platform.py (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
RemoveEmptyConvBiasPass (518-523)
Deeploy/Targets/PULPOpen/Parsers.py (1)
PULPFPDWConv2DParser (124-182)
Deeploy/Targets/Generic/Parsers.py (1)
SliceParser (97-155)
🪛 LanguageTool
CHANGELOG.md
[style] ~33-~33: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.3)
Deeploy/Targets/PULPOpen/Deployer.py
81-81: Local variable autoTransposePass is assigned to but never used
Remove assignment to unused variable autoTransposePass
(F841)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
48-48: Unused static method argument: ctxt
(ARG004)
🔇 Additional comments (24)
.github/workflows/ci-platform-siracusa.yml (2)
38-99: Approved: Test additions follow existing conventions and align with PR objectives. The new tests are well-organized, follow the camelCase naming convention consistent with existing tests, and logically group related functionality (standard conv → DW conv → special ops). The additions map to the newly implemented FP depthwise convolution, bias support, reshape/skip-connection, and TinyViT features mentioned in the PR summary.
57-63: All new test directories are present and properly configured. Verification confirms all eight newly added tests have corresponding directories under `DeeployTest/Tests/`. The additions to the CI pipeline are valid and align with the stated PR objectives.
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
10-14: Consistent unsigned core math. Thanks for harmonizing the core ID/start/end calculations to uint32_t with explicit casts. This keeps the MIN/shift arithmetic in the unsigned domain and lines up with the FloatAdd template changes, so we avoid sign-mismatch warnings and latent UB. Looks good.
Deeploy/Targets/PULPOpen/Deployer.py (1)
84-92: n_cores propagation looks good. Injecting n_cores before binding ensures operator representations can size buffers accordingly. LGTM.
Deeploy/DeeployTypes.py (2)
2565-2569: Engine selection helper is clear and minimal. Returning DeploymentEngine and deferring to Mapping is a good split. LGTM.
2585-2587: Binding via selected engine Mapping is correct. This aligns with the new _selectEngine contract. LGTM.
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)
7-7: Wrapper delegate for _selectEngine is consistent. Imports and delegation updated as expected. LGTM.
Also applies to: 69-71
Deeploy/Targets/PULPOpen/Parsers.py (1)
70-96: FP Conv2D parser bias handling is clean. Square padding + optional bias inputs with clear has_bias/bias fields. LGTM.
Also applies to: 102-121
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1)
27-39: Output batching asymmetry warrants verification of design intent. The unconditional output pointer bump (line 39) contrasts with conditional A/B/C bumps (lines 27–37). This pattern is consistent across multiple templates (FloatMatMulTemplate, FloatConvTemplate), suggesting intentional design rather than oversight. However, no `data_out_batched` flag exists in the codebase, and the codebase does not reveal where/how the input batching flags are actually computed.
Verify:
- Whether output must always differ per batch iteration (correct design), or
- Whether non-batched output accumulation is a valid use case that should be supported (missing guard).
If non-batched output accumulation should be supported, guard the output increment as proposed in the original comment.
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
10-251: LGTM—depthwise Im2Col implementation is well-structured. The per-core channel distribution, depthwise iteration pattern (mapping input channels to output channel ranges), and bias handling are correctly implemented. Past review concerns regarding `log2` usage and padding-loop type mixing have been resolved.
Based on learnings: `log2` is defined in `pulp_nn_utils` and safe to use in PULP code.
12-36: LGTM—header declarations align with implementations. The updated signatures for bias-aware convolution and the new depthwise Im2Col function correctly match the implementations in `Convolution_fp32.c` and `DWConvolution_fp32.c`. Parameter order and types are consistent across all three function declarations.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
21-38: LGTM—dynamic buffer sizing correctly includes all necessary factors. The updated `computeTransientBuffersSize` now accounts for `weight_type.typeWidth // 8`, `n_cores`, `ch_im_in`, and kernel dimensions, providing accurate per-core buffer allocation. The manual type assignment in `hoistTransientBuffers` (lines 36-37) ensures the allocation system correctly interprets the element count as bytes. Based on learnings: transient buffer sizing returns element counts, and manual type assignment ensures correct byte allocation.
41-71: LGTM—depthwise Im2Col template correctly omits channel-in factor. The `PULP2DFloatDWConvIm2ColTemplate` appropriately computes buffer size as `NUM_CORES * P * Q` (lines 52-53), omitting `ch_im_in` since depthwise convolution processes one channel at a time. Manual type assignment (lines 66-67) follows the established pattern from the regular conv template. Note: the unused `ctxt` parameter at line 48 (flagged by static analysis) is intentional for interface consistency with the parent `NodeTemplate.computeTransientBuffersSize` signature.
130-160: LGTM—depthwise Im2Col template correctly invokes DW kernel. The `referenceDW2DIm2ColTemplate` correctly calls `PULP_DW_Conv2d_Im2Col_fp*_HWC` with bias and context buffer parameters. Parameter order matches the kernel signature in `Conv.h` and `DWConvolution_fp32.c`. Based on learnings: X,Y parameter order in PULP Im2Col calls is correct for the implementation.
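For intuition, the two sizing formulas discussed in these comments can be written side by side. This is a sketch under the review's stated assumptions (per-core buffers, `typeWidth // 8` bytes per element); the function names are illustrative, not the template API:

```python
def im2col_bytes_regular(n_cores, ch_im_in, kx, ky, type_width_bits):
    # Regular conv: each core gathers a full receptive field across
    # all input channels for one output pixel.
    return n_cores * ch_im_in * kx * ky * (type_width_bits // 8)

def im2col_bytes_depthwise(n_cores, kx, ky, type_width_bits):
    # Depthwise conv: one channel at a time, so no ch_im_in factor.
    return n_cores * kx * ky * (type_width_bits // 8)

# 8 cores, 16 input channels, 3x3 kernel, fp32 (32-bit elements)
print(im2col_bytes_regular(8, 16, 3, 3, 32))   # 8 * 16 * 9 * 4 = 4608
print(im2col_bytes_depthwise(8, 3, 3, 32))     # 8 * 9 * 4 = 288
```

The 16x gap in this example is exactly the `ch_im_in` factor the depthwise template omits, which is why sizing the DW buffer like the regular one would waste L1 scratch memory.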
Deeploy/Targets/PULPOpen/Platform.py (4)
77-77: LGTM—FP depthwise conv mapper correctly wired. `FPDWConv2DMapper` correctly combines `PULPFPDWConv2DParser` with `PULPFloatDWConv2DBindings` and is properly registered in the `Conv` layer mapping at line 108.
92-94: LGTM—slice mappers provide both DMA and non-DMA paths. The new `SliceMapper` (using `PULPSliceBindings`) provides a non-DMA slice path for `FloatDataTypes`, complementing the existing `DMASliceMapper` for DMA-based slicing. Both are correctly registered in the `Slice` layer mapping at line 134.
234-235: LGTM—empty bias removal pass integrated into optimizer. `RemoveEmptyConvBiasPass` is appropriately placed after the requant merge passes, ensuring empty bias tensors are cleaned up during topology optimization.
247-254: LGTM—cluster engine now configurable with core count. The `n_cores` parameter (default 8) allows per-engine core configuration, which propagates to the `operatorRepresentation['n_cores']` used in template buffer sizing. This enables flexible core allocation for multi-cluster scenarios.
Deeploy/Targets/PULPOpen/Bindings.py (4)
143-164: LGTM—slice bindings correctly distinguish DMA and non-DMA paths. `PULPDMASliceBindings` (line 151) uses `DMASliceTemplate` for DMA-based slicing of `IntegerDataTypes`, while the new `PULPSliceBindings` (lines 155-164) uses `SliceTemplate` for `FloatDataTypes`. Both use appropriate transformers (`MemoryAwareForkTransformer` for DMA futures, `ForkTransformer` for direct slicing).
166-169: LGTM—reshape bindings extended for float types and int64 shapes. The updated `PULPReshapeBindings` now covers both `IntegerDataTypes` and `FloatDataTypes`, and the second pointer correctly uses `int64_t` to represent shape parameters (matching ONNX's int64 shape convention).
236-242: LGTM—FP depthwise conv bindings correctly reference DW Im2Col template. `PULPFloatDWConv2DBindings` correctly binds `ConvChecker` with three `PointerClass(float_type)` inputs (data, weights, bias) to `FloatConvTemplate.referenceDW2DIm2ColTemplate`, covering all `FloatDataTypes` with `ForkTransformer`.
292-300: LGTM—ReduceMean bindings extended for cross-type support. The cross-type bindings (lines 295-300) enable `ReduceMean` with float input/output and integer axes, complementing the homogeneous integer bindings (lines 293-294). Both use `ClusterTransformer` appropriately.
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
41-101: LGTM—bias-aware direct convolution correctly implemented. The has_bias and no-bias paths are correctly separated, with bias applied at line 67 (`sum + pSrcBias[f + ch_out_start]`) and omitted at line 97. Per-core channel distribution and output indexing are consistent. Note: lines 41–70 and 71–101 duplicate ~30 lines of convolution logic, differing only in bias application. This pattern matches `DWConvolution_fp32.c` and is acceptable for clarity, though a helper function could reduce duplication.
140-218: LGTM—bias-aware Im2Col convolution correctly implemented. The Im2Col function correctly fills per-core buffers from `pContextBuffer` (line 132) and applies bias at line 175 in the has_bias path, omitting it at line 214 in the no-bias path. Per-core weight pointer computation (line 129) and output indexing (lines 173, 212) are correct. Note: lines 140–178 and 179–218 duplicate ~39 lines of im2col logic, similar to the pattern in `DWConvolution_fp32.c`.
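As a scalar reference for what the bias-aware HWC paths compute, here is a pure-Python sketch (stride 1, square kernel, symmetric zero padding). It is not the PULP API; the `has_bias` flag gating mirrors the two code paths described above, and all names are illustrative:

```python
def conv2d_hwc_reference(x, w, bias, has_bias, H, W, C, F, K, pad):
    """Naive HWC conv: x[h][w][c], w[f][kh][kw][c], bias[f] -> out[h][w][f]."""
    H_out, W_out = H + 2 * pad - K + 1, W + 2 * pad - K + 1
    out = [[[0.0] * F for _ in range(W_out)] for _ in range(H_out)]
    for ho in range(H_out):
        for wo in range(W_out):
            for f in range(F):
                acc = 0.0
                for kh in range(K):
                    for kw in range(K):
                        hi, wi = ho + kh - pad, wo + kw - pad
                        if 0 <= hi < H and 0 <= wi < W:  # zero padding
                            for c in range(C):
                                acc += x[hi][wi][c] * w[f][kh][kw][c]
                # bias applied once per output element, as in the has_bias path
                out[ho][wo][f] = acc + (bias[f] if has_bias else 0.0)
    return out

# Smallest possible case: 1x1 input, 1 channel, 1 filter, 1x1 kernel.
x = [[[2.0]]]
w = [[[[3.0]]]]
print(conv2d_hwc_reference(x, w, [1.0], True, 1, 1, 1, 1, 1, 0))   # [[[7.0]]]
print(conv2d_hwc_reference(x, w, [1.0], False, 1, 1, 1, 1, 1, 0))  # [[[6.0]]]
```

A reference like this is also handy for checking that the bias and no-bias C paths stay in sync if the duplicated blocks are ever refactored into a shared helper.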
lukamac
left a comment
lgtm, thanks for all the hard work :)
Actionable comments posted: 1
♻️ Duplicate comments (1)
CHANGELOG.md (1)
114-114: Restore missing "Fixed" verb (unresolved from prior review). This entry is still missing the "Fixed" verb, despite being marked as addressed in past commits. The current text is a sentence fragment.
Apply this diff:

```diff
-Data types in the PULPOpen FP add and mul templates.
+Fixed data types in the PULPOpen FP add and mul templates.
```
🧹 Nitpick comments (2)
CHANGELOG.md (2)
31-33: Reduce repetitive "FP" prefix across successive entries. Three consecutive entries (lines 31, 32, 33) begin with "FP", creating a repetitive structure that could be streamlined for clarity.
Consider rewording one or more entries to vary the phrasing without losing specificity:

```diff
-FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
-FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
-FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline (FP).
+Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations (floating-point).
+FP PULP DW conv lowering optimization pass analogous to the integer version.
```

Alternatively, you could group these entries under a subsection to indicate they are related floating-point additions.
109-109: Use "Renamed" instead of "changed" for consistency with changelog conventions. Changelog entries typically use action verbs like "Renamed," "Refactored," "Updated" rather than past-tense descriptions. This improves consistency with entries like line 81 ("rename of permutation functions").
Apply this diff:

```diff
-changed `_mapNode` to `_selectEngine` which reduces the responsibility of that function to, as the name states, just engine selection
+Renamed `_mapNode` to `_selectEngine` to reduce the responsibility of that function to just engine selection.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md (6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
CHANGELOG.md
🪛 LanguageTool
CHANGELOG.md
[style] ~33-~33: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
This PR brings the changes required for a working minimal TinyViT on the Siracusa platform, without tiling.
Added
- `n_cores` parameter to set the number of cores used
- `n_cores` key to all PULPClusterEngine templates' operatorRepresentations
Fixed
PR Merge Checklist
- …`devel` commit and pointing to `devel`.
- `CHANGELOG.md` file has been updated.