
[P3] Width Mismatch for Heterogeneous Submodule Interface Arrays During PyMTL Verilog Translation #289

@BenkangPeng

Description


This issue reports a potential PyMTL Verilog translation limitation or bug that appears when translating a heterogeneous array of CGRA instances. The generated Verilog appears to collapse the per-instance interface array widths to a single uniform width, which causes out-of-range indexing for the larger instances.

Reproduction Branch

VectorCGRA, branch issue287-pymtl-bitwidth-mismatch

Reproduction Steps

cd /path/to/VectorCGRA
# upstream: https://github.com/tancheng/VectorCGRA
git fetch upstream
git checkout -b issue287-pymtl-bitwidth-mismatch upstream/issue287-pymtl-bitwidth-mismatch
mkdir build && cd build
pytest ../multi_cgra/test/MeshMultiCgraTemplateRTL_test.py::test_mesh_multi_hetero_cgra -vs --tb=long --test-verilog --dump-vtb --dump-vcd > ../hetero_cgra.log

After running, the build/ directory contains the generated Verilog, such as:

  • MeshMultiCgraTemplateRTL__<hash_id>__pickled.v

The detailed failure is recorded in hetero_cgra.log, which the redirect (> ../hetero_cgra.log) writes to the parent of build/.

Target Architecture (Heterogeneous Multi-CGRA)

The test architecture is a 2x2 multi-CGRA:

  • CGRA0: 2x2
  • CGRA1: 3x3
  • CGRA2: 2x2
  • CGRA3: 2x2

multi_cgra_defaults:
  rows: 2
  columns: 2
cgra_defaults:
  rows: 2
  columns: 2
  configMemSize: 16
tile_defaults:
  num_registers: 16
  fu_types: ["add", "mul", "div", "fadd", "fmul", "fdiv", "logic", "cmp", "sel", "type_conv", "vfmul", "fadd_fadd", "fmul_fadd", "grant", "loop_control", "phi", "constant", "mem", "return", "mem_indexed", "alloca", "shift"]
cgra_overrides:
  - cgra_x: 1
    cgra_y: 0
    rows: 3
    columns: 3
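To make the override semantics concrete, here is a minimal sketch that resolves each CGRA's (rows, columns) from the defaults and overrides above. The helper name resolve_cgra_dims is hypothetical, the config is written as a plain dict instead of being loaded from YAML, and the row-major id mapping (cgra_id = cgra_y * columns + cgra_x) is an assumption. That mapping is consistent with the override at (cgra_x=1, cgra_y=0) being CGRA1.

```python
# Hypothetical helper: resolve per-CGRA (rows, columns) from the defaults and
# overrides. ASSUMPTION: CGRA ids are row-major, id = cgra_y * mesh_cols + cgra_x.
def resolve_cgra_dims(config):
    mesh = config["multi_cgra_defaults"]
    default = config["cgra_defaults"]
    dims = {}
    for y in range(mesh["rows"]):
        for x in range(mesh["columns"]):
            cgra_id = y * mesh["columns"] + x
            dims[cgra_id] = (default["rows"], default["columns"])
    # Apply per-instance overrides on top of the defaults.
    for ov in config.get("cgra_overrides", []):
        cgra_id = ov["cgra_y"] * mesh["columns"] + ov["cgra_x"]
        rows, cols = dims[cgra_id]
        dims[cgra_id] = (ov.get("rows", rows), ov.get("columns", cols))
    return dims


config = {
    "multi_cgra_defaults": {"rows": 2, "columns": 2},
    "cgra_defaults": {"rows": 2, "columns": 2, "configMemSize": 16},
    "cgra_overrides": [{"cgra_x": 1, "cgra_y": 0, "rows": 3, "columns": 3}],
}
print(resolve_cgra_dims(config))
# {0: (2, 2), 1: (3, 3), 2: (2, 2), 3: (2, 2)}
```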

Expected Behavior

For each CGRA instance, boundary interfaces should match that instance’s tile shape:

if is_multi_cgra:
    s.recv_data_on_boundary_north = [RecvIfcRTL(DataType) for _ in range(per_cgra_columns)]
    s.send_data_on_boundary_north = [SendIfcRTL(DataType) for _ in range(per_cgra_columns)]
    s.recv_data_on_boundary_south = [RecvIfcRTL(DataType) for _ in range(per_cgra_columns)]
    s.send_data_on_boundary_south = [SendIfcRTL(DataType) for _ in range(per_cgra_columns)]
    s.recv_data_on_boundary_west = [RecvIfcRTL(DataType) for _ in range(per_cgra_rows)]
    s.send_data_on_boundary_west = [SendIfcRTL(DataType) for _ in range(per_cgra_rows)]
    s.recv_data_on_boundary_east = [RecvIfcRTL(DataType) for _ in range(per_cgra_rows)]
    s.send_data_on_boundary_east = [SendIfcRTL(DataType) for _ in range(per_cgra_rows)]

So in this architecture:

  • CGRA0/2/3 boundary width should be 2
  • CGRA1 boundary width should be 3
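The rule in the interface declarations can be restated as a small sketch: north/south arrays get one entry per tile column, and west/east arrays get one entry per tile row. The function expected_boundary_widths is hypothetical and takes per-CGRA (rows, columns) as input.

```python
# Hypothetical sketch: expected boundary array lengths per CGRA, following the
# declarations above (north/south sized by columns, west/east sized by rows).
def expected_boundary_widths(dims):
    widths = {}
    for cgra_id, (rows, cols) in dims.items():
        widths[cgra_id] = {
            "north": cols, "south": cols,
            "west": rows, "east": rows,
        }
    return widths


dims = {0: (2, 2), 1: (3, 3), 2: (2, 2), 3: (2, 2)}  # (rows, columns)
widths = expected_boundary_widths(dims)
print(widths[0]["south"])  # 2
print(widths[1]["south"])  # 3
```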

Observed Behavior

However, during translation, the generated Verilog appears to use a uniform width of 2 for these boundary arrays across all CGRAs, including CGRA1.

For example, line 34233 of MeshMultiCgraTemplateRTL__<hash_id>.v declares:

  logic [0:0] cgra__recv_data_on_boundary_south__val [0:3][0:1];
  • the outer [0:3] indexes the 4 CGRAs
  • the inner [0:1] gives each CGRA a width of 2
  • but CGRA1 requires a width of 3 ([0:2])
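When checking other boundary signals in the generated file, it can help to pull the unpacked dimensions out of a declaration automatically. The sketch below is a hypothetical diagnostic that only handles the simple "logic [a:b] name [..][..];" form shown above, not general SystemVerilog declarations.

```python
import re

# Hypothetical diagnostic: extract unpacked array sizes from a simple
# declaration such as
#   logic [0:0] cgra__recv_data_on_boundary_south__val [0:3][0:1];
DECL = re.compile(r"logic\s+\[\d+:\d+\]\s+(\w+)\s+((?:\[\d+:\d+\])+)\s*;")
RANGE = re.compile(r"\[(\d+):(\d+)\]")


def unpacked_dims(line):
    m = DECL.search(line)
    if m is None:
        return None
    name, ranges = m.group(1), m.group(2)
    # Each [a:b] range contributes |b - a| + 1 entries.
    sizes = [abs(int(b) - int(a)) + 1 for a, b in RANGE.findall(ranges)]
    return name, sizes


line = "  logic [0:0] cgra__recv_data_on_boundary_south__val [0:3][0:1];"
print(unpacked_dims(line))
# ('cgra__recv_data_on_boundary_south__val', [4, 2])
```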

This leads to out-of-range accesses, e.g. in hetero_cgra.log:

Selection index out of range: 2
E           outside 1:0
E                         : ... In instance MeshMultiCgraTemplateRTL___05Ff665d7f96c8c724c
E           15206 |   assign cgra__recv_data_on_boundary_south__val[1][2] = 1'd0;
E                 |                                                   ^
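The index arithmetic behind this error can be reproduced with a short sketch. Given the width each CGRA needs and the uniform inner width in the generated declaration, it lists the (cgra, index) pairs that fall outside the declared range. The function name is hypothetical. It does not model how the translator picks the uniform width; it only checks the consequence.

```python
# Hypothetical sketch: list (cgra, index) pairs that a per-instance width
# requires but a uniform declared inner width cannot hold.
def out_of_range_accesses(required, declared_width):
    bad = []
    for cgra_id, width in enumerate(required):
        for idx in range(declared_width, width):
            bad.append((cgra_id, idx))
    return bad


# South boundary widths for CGRA0..3; the generated Verilog declares [0:1].
print(out_of_range_accesses([2, 3, 2, 2], declared_width=2))
# [(1, 2)]
```

The single result (1, 2) corresponds to the failing assignment cgra__recv_data_on_boundary_south__val[1][2] in the log.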

Current Workaround/Solution

A practical workaround is to pad every CGRA boundary interface to the maximum CGRA dimensions (max_tile_rows, max_tile_cols) and ground unused ports.

This works functionally but wastes wires and ports on the smaller CGRAs.
I think the current solution is acceptable for now, since changing the logic of PyMTL's Verilog translation would likely take considerable effort. We should keep this case in mind when instantiating heterogeneous multi-CGRAs in the future.
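The padding workaround can be sketched as a plan that sizes every boundary array to the largest CGRA and lists the indices each instance must ground. This is a hypothetical, pure-Python illustration of the bookkeeping, not the PyMTL code in the branch. The names max_tile_rows and max_tile_cols follow the issue text, and padding_plan is made up for this example.

```python
# Hypothetical sketch of the padding workaround: size every CGRA's boundary
# arrays to the largest CGRA, and mark indices beyond the instance's own tile
# shape as ports to ground (tie off).
def padding_plan(dims):
    max_tile_rows = max(r for r, _ in dims.values())
    max_tile_cols = max(c for _, c in dims.values())
    plan = {}
    for cgra_id, (rows, cols) in dims.items():
        plan[cgra_id] = {
            "north": list(range(cols, max_tile_cols)),
            "south": list(range(cols, max_tile_cols)),
            "west": list(range(rows, max_tile_rows)),
            "east": list(range(rows, max_tile_rows)),
        }
    return (max_tile_rows, max_tile_cols), plan


dims = {0: (2, 2), 1: (3, 3), 2: (2, 2), 3: (2, 2)}  # (rows, columns)
(max_tile_rows, max_tile_cols), plan = padding_plan(dims)
print(max_tile_rows, max_tile_cols)  # 3 3
print(plan[0]["south"])              # [2]
print(plan[1]["south"])              # []
wasted = sum(len(v) for p in plan.values() for v in p.values())
print(wasted)                        # 12
```

In this architecture, the 12 padded boundary positions (one per side on each of the three 2x2 CGRAs, each with a recv and a send interface) are the wiring overhead the workaround accepts.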
