This issue reports a potential PyMTL Verilog translation limitation/bug when translating a heterogeneous array of CGRA instances. The generated Verilog appears to collapse per-instance interface array widths to a single uniform width, which causes out-of-range indexing for larger instances.
Reproduction Branch
VectorCGRA at issue287-pymtl-bitwidth-mismatch
Reproduction Steps
cd /path/to/VectorCGRA
# upstream: https://github.com/tancheng/VectorCGRA
git fetch upstream
git checkout -b issue287-pymtl-bitwidth-mismatch upstream/issue287-pymtl-bitwidth-mismatch
mkdir build && cd build
pytest ../multi_cgra/test/MeshMultiCgraTemplateRTL_test.py::test_mesh_multi_hetero_cgra -vs --tb=long --test-verilog --dump-vtb --dump-vcd > ../hetero_cgra.log
After running, the build/ directory contains generated Verilog such as MeshMultiCgraTemplateRTL__<hash_id>__pickled.v, and hetero_cgra.log (written one level above build/) contains the detailed failure.
Target Architecture (Heterogeneous Multi-CGRA)
The test architecture is a 2x2 multi-CGRA:
- CGRA0: 2x2
- CGRA1: 3x3
- CGRA2: 2x2
- CGRA3: 2x2
From VectorCGRA/multi_cgra/test/arch_multi_hetero_cgra_override.yaml (lines 3 to 20 in 46c1f00):
    multi_cgra_defaults:
      rows: 2
      columns: 2

    cgra_defaults:
      rows: 2
      columns: 2
      configMemSize: 16

    tile_defaults:
      num_registers: 16
      fu_types: ["add", "mul", "div", "fadd", "fmul", "fdiv", "logic", "cmp", "sel", "type_conv", "vfmul", "fadd_fadd", "fmul_fadd", "grant", "loop_control", "phi", "constant", "mem", "return", "mem_indexed", "alloca", "shift"]

    cgra_overrides:
      - cgra_x: 1
        cgra_y: 0
        rows: 3
        columns: 3
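As a rough illustration of how this configuration resolves to per-CGRA shapes, here is a minimal sketch in plain Python. It does not use the VectorCGRA parser: the `arch` dict is a hand-copied subset of the YAML, `resolve_cgra_shapes` is a hypothetical helper, and the row-major numbering `id = cgra_y * columns + cgra_x` is an assumption (it is at least consistent with the override at (1, 0) being CGRA1).

```python
# Hand-copied subset of the YAML above (tile_defaults omitted for brevity).
arch = {
    "multi_cgra_defaults": {"rows": 2, "columns": 2},
    "cgra_defaults": {"rows": 2, "columns": 2, "configMemSize": 16},
    "cgra_overrides": [{"cgra_x": 1, "cgra_y": 0, "rows": 3, "columns": 3}],
}

def resolve_cgra_shapes(arch):
    """Return {cgra_id: (rows, columns)}.

    Assumption: cgra_id = cgra_y * mesh_columns + cgra_x (row-major).
    """
    mesh_rows = arch["multi_cgra_defaults"]["rows"]
    mesh_cols = arch["multi_cgra_defaults"]["columns"]
    default = arch["cgra_defaults"]

    # Start every CGRA at the default shape.
    shapes = {
        y * mesh_cols + x: (default["rows"], default["columns"])
        for y in range(mesh_rows)
        for x in range(mesh_cols)
    }

    # Apply per-CGRA overrides on top of the defaults.
    for override in arch.get("cgra_overrides", []):
        cgra_id = override["cgra_y"] * mesh_cols + override["cgra_x"]
        rows, cols = shapes[cgra_id]
        shapes[cgra_id] = (override.get("rows", rows), override.get("columns", cols))
    return shapes

shapes = resolve_cgra_shapes(arch)
print(shapes)
```

Under these assumptions only CGRA1 ends up 3x3; the other three keep the 2x2 default.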
Expected Behavior
For each CGRA instance, boundary interfaces should match that instance’s tile shape:
From VectorCGRA/cgra/CgraTemplateRTL.py (lines 131 to 139 in 46c1f00):
    if is_multi_cgra:
      s.recv_data_on_boundary_north = [RecvIfcRTL(DataType) for _ in range(per_cgra_columns)]
      s.send_data_on_boundary_north = [SendIfcRTL(DataType) for _ in range(per_cgra_columns)]
      s.recv_data_on_boundary_south = [RecvIfcRTL(DataType) for _ in range(per_cgra_columns)]
      s.send_data_on_boundary_south = [SendIfcRTL(DataType) for _ in range(per_cgra_columns)]
      s.recv_data_on_boundary_west = [RecvIfcRTL(DataType) for _ in range(per_cgra_rows)]
      s.send_data_on_boundary_west = [SendIfcRTL(DataType) for _ in range(per_cgra_rows)]
      s.recv_data_on_boundary_east = [RecvIfcRTL(DataType) for _ in range(per_cgra_rows)]
      s.send_data_on_boundary_east = [SendIfcRTL(DataType) for _ in range(per_cgra_rows)]
So in this architecture:
- CGRA0/2/3 boundary width should be 2
- CGRA1 boundary width should be 3
Observed Behavior
However, during translation, the generated Verilog appears to use a uniform width of 2 for these boundary arrays across all CGRAs, including CGRA1.
For example, MeshMultiCgraTemplateRTL__<hash_id>.v, line 34233:
    logic [0:0] cgra__recv_data_on_boundary_south__val [0:3][0:1];
- outer [0:3] indexes 4 CGRAs
- inner [0:1] gives width 2 for each
- but CGRA1 requires width 3 ([0:2])
This leads to out-of-range accesses, e.g. in hetero_cgra.log:
Selection index out of range: 2
E outside 1:0
E : ... In instance MeshMultiCgraTemplateRTL___05Ff665d7f96c8c724c
E 15206 | assign cgra__recv_data_on_boundary_south__val[1][2] = 1'd0;
E | ^
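The mismatch can be restated as a small index check. This is only a sketch in plain Python, not part of PyMTL or VectorCGRA: `out_of_range_accesses` is a hypothetical helper, and the column counts and declared width are copied from the architecture and the generated declaration described above.

```python
# Per-CGRA column counts from the architecture described above.
per_cgra_columns = {0: 2, 1: 3, 2: 2, 3: 2}

# The generated declaration uses [0:1] for the inner dimension, i.e. 2 entries.
declared_width = 2

def out_of_range_accesses(per_cgra_columns, declared_width):
    """List (cgra_id, index) pairs that fall outside the declared inner range."""
    return [
        (cgra_id, idx)
        for cgra_id, cols in sorted(per_cgra_columns.items())
        for idx in range(cols)
        if idx >= declared_width
    ]

violations = out_of_range_accesses(per_cgra_columns, declared_width)
print(violations)
```

The single offending pair, CGRA 1 at index 2, corresponds to the `cgra__recv_data_on_boundary_south__val[1][2]` assignment flagged in the log.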
Current Workaround/Solution
A practical workaround is to pad every CGRA boundary interface to the maximum CGRA dimensions (max_tile_rows, max_tile_cols) and ground unused ports.
This works functionally, but wastes wires/ports for smaller CGRAs.
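A minimal sketch of the padding bookkeeping, in plain Python rather than PyMTL: it computes `max_tile_rows` and `max_tile_cols` and lists, per CGRA, which padded boundary indices would be unused and need grounding. The helper name `unused_boundary_ports` and the dictionary layout are hypothetical; the actual workaround in the repository may be structured differently.

```python
# (rows, columns) per CGRA for the architecture in this issue.
shapes = {0: (2, 2), 1: (3, 3), 2: (2, 2), 3: (2, 2)}

# Every CGRA's boundary interfaces are padded to these dimensions.
max_tile_rows = max(rows for rows, _ in shapes.values())
max_tile_cols = max(cols for _, cols in shapes.values())

def unused_boundary_ports(shapes, max_tile_rows, max_tile_cols):
    """Map cgra_id -> padded port indices that carry no real tile connection."""
    unused = {}
    for cgra_id, (rows, cols) in shapes.items():
        unused[cgra_id] = {
            # North/south boundaries are sized by columns.
            "north_south": list(range(cols, max_tile_cols)),
            # West/east boundaries are sized by rows.
            "west_east": list(range(rows, max_tile_rows)),
        }
    return unused

padding = unused_boundary_ports(shapes, max_tile_rows, max_tile_cols)
print(padding)
```

For this architecture each 2x2 CGRA gets one extra, grounded index on every boundary, while the 3x3 CGRA1 has none.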
I think the current workaround is acceptable, since modifying PyMTL's Verilog translation logic would likely take too much effort. We should keep this case in mind when instantiating heterogeneous multi-CGRAs later.