Skip to content

Conversation

kimiatkh
Copy link
Contributor

@kimiatkh kimiatkh commented Jul 2, 2025

This pull request adds support for quad-port RAM in the Stratix 10 architecture file. It also updates the vqm2blif tool to process quad-port RAMs from VQM files and generate the appropriate operation modes in the BLIF output. Additionally, it makes a small modification to the MLAB primitives in the architecture capture by adding the class=memory attribute, enabling VPR to correctly pack MLABs slices in to MLAB primitives.

@kimiatkh kimiatkh requested a review from vaughnbetz July 2, 2025 02:08
@github-actions github-actions bot added the lang-cpp C/C++ code label Jul 2, 2025
if (ram_info.port_a_output_clock && ram_info.port_b_output_clock) {
reg_outputs = true; //Sequential output
} else if (!ram_info.port_a_output_clock && !ram_info.port_b_output_clock) {
reg_outputs = false; //Comb output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please edit the comments in your code to be consistent the coding style guide:

  1. Add a space after //
  2. Avoid using block comments for implementation-related commnets

//A dual port memory, both port A and B params have been found
} else {
VTR_ASSERT(ram_info.mode == "dual_port" || ram_info.mode == "bidir_dual_port");
} else if (ram_info.mode == "dual_port" || ram_info.mode == "bidir_dual_port"){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space ){

} else {
//3b) Use the more detailed name, since it was found in the architecture
mode_hash = tmp_mode_hash;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-write this if statement get rid of the branch where nothing is done

@vaughnbetz
Copy link
Contributor

There's a failing regtest, but it may just need a golden result update. The vqm2blif output has changed, but the changes seem like block name changes. Here's a snippet:

VQM2BLIF Program Check
----------------------
ERROR: The generated blif file "ucsb_152_tap_fir_stratixiv.test.blif" did not match the golden reference "ucsb_152_tap_fir_stratixiv_arch_timing.blif".
The differences are shown below:
-------------------------------------------------------------------------------------------------------------------------------------------------------
1c1
< #BLIF OUTPUT: ./utils/vqm2blif/test/scripts/../netlists/ucsb_152_tap_fir/ucsb_152_tap_fir_stratixiv.golden.blif
---
> #BLIF OUTPUT: ./utils/vqm2blif/test/scripts/../netlists/ucsb_152_tap_fir/ucsb_152_tap_fir_stratixiv.test.blif
98c98
<     q=Register26Bit:Reg380|RegOut[0]
---
>     q=Register26Bit:Reg380|__ALT_INV__RegOut[0]
104c104
<     datad=Register26Bit:Reg380|RegOut[0] \
---
>     datad=Register26Bit:Reg380|__ALT_INV__RegOut[0] \
130c130
<     q=Register28Bit:Reg382|RegOut[0]
---
>     q=Register28Bit:Reg382|__ALT_INV__RegOut[0]

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One regtest failure and a suggestion on more precise timing modeling (and commenting of quad port behaviour in the arch file).

cout << " port A output sequential: " << bool(ram_info.port_a_output_clock) << "\n";
cout << " port B output sequential: " << bool(ram_info.port_b_output_clock) << "\n";
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest a comment of what the else is:
// quad port mode

<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbdatain"/>
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbaddr"/>
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbaddr2"/>
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbwe"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth putting a comment at the top of the quad port section explaining what a quad port S10 RAM is. It is really simple quad port (2 read ports, 2 write ports). So you have 4 address buses, two datain and two dataout.
It looks like when you have combinational outputs (not registered outputs) you assume portaaddr affects both portadataout and portbdata. Same for portaaddr2. We also have the portbaddr and portbaddr2 affecting both portadtaouta and portbdataout. That is overly conservative: we'll only use one address port to choose the read out data. Not sure if it is always the same one or not; vqm2blif could probably always write it on one of the portaaaddr buses for the data a side, and one of the portbaddr buses for the port b side. Then portadataout and portbdataout would only be affected by the one chosen address port each.

Right now portadatain and portbdatain are always affecting the timing of portadataout and portbdataout when the outputs aren't registered. That is also overly conservative; the datain values will be stored in the RAM but (by default at least) Quartus assumes they won't flow through to the outputs the same cycle (that's called read-during-write behaviour -- Quartus has settings to choose which one you assume, but I think we should assume the less conservative ones where dataina does not affect dataouta etc.

Suggest updating the timing section here.

@vaughnbetz
Copy link
Contributor

@kimiatkh : just pinging on this one. I think you have all the arch file changes (sent via email). It would be great to close this item out.

@kimiatkh kimiatkh force-pushed the stratix_10_arch_updates branch from 916b61a to 8bfe793 Compare September 1, 2025 20:21
@kimiatkh
Copy link
Contributor Author

Hi @vaughnbetz,
So sorry for the delay in finishing this. I’ve made all the changes to the vqm2blif tool and the architecture file. However, after running the Titan regression tests, I realized that I need to update the BLIF archive for Titan that gets downloaded into the vtr repo. This is because, after setting the MLAB primitive as a memory class, the primitive definition in the BLIF file now needs to be one bit wide, since Quartus synthesis dumps one-bit memory slices.

I’ve already generated the new netlists and am currently running the regression tests. Once that’s done, I’ll also need to update the golden results, as the MLAB stats have changed.

Just one question: for creating the golden results, is there a stable commit you’d recommend basing my branch on, or is it fine to use the most recent version of master?

@vaughnbetz
Copy link
Contributor

vaughnbetz commented Sep 12, 2025 via email

@vaughnbetz
Copy link
Contributor

Adding @AlexandreSinger as he may be able to help with this, plus documenting the Titanium benchmarks and how to run them etc. We'd like to get this in soon @kimiatkh so we can refer to Titanium in an upcoming paper.

Copy link
Contributor

@vaughnbetz vaughnbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically good to go ... would like the quad port comment just for clarity, but won't hold this up for it.

@AlexandreSinger
Copy link
Contributor

I think this PR is good to merge just to get the architecture fixes into Master. We can work on releasing a new version of Titan after (since these architecture changes work with the old version of Titanium as well). Documentation can come in another PR and should not block this PR in my opinion. Will merge once CI completes.

@vaughnbetz
Copy link
Contributor

Makes sense. Let's land this!

1 similar comment
@vaughnbetz
Copy link
Contributor

Makes sense. Let's land this!

@AlexandreSinger AlexandreSinger merged commit 004339f into master Sep 19, 2025
30 checks passed
@AlexandreSinger AlexandreSinger deleted the stratix_10_arch_updates branch September 19, 2025 15:11
@kimiatkh
Copy link
Contributor Author

@vaughnbetz I've already added the comments you requested for the quad-port RAMs.

I've also verified the MLAB packing. For example, in cholesky_bdti, there are 342 memory slices packed into 172 MLAB cells, with each cell able to hold up to two slices. This suggests that the packing is working correctly.

However, with the new version merged in, some of the benchmarks that contain MLABs in Titanium and Titan23 will now fail. I’ve prepared the newly generated netlist archive, but before uploading it I need to make a small change to the script that downloads the Titan BLIFs. Since this branch is now closed, I’ll open a new branch for that update.

The golden results also need to be updated with the new MLAB statistics. I already have the new golden results for Titan23, and the run for Titanium is still ongoing but should finish by the end of today.

@vaughnbetz
Copy link
Contributor

vaughnbetz commented Sep 19, 2025

Hi @kimiatkh : There are ten mlab primitives in an MLAB (ram slices), each of which can be up to 2 bits wide. So we should get a bunch of 20-bit wide RAMs in MLABs. Is that happening now? If not, we likely have a regression.
@AlexandreSinger : tagging you too.
Or maybe I'm misinterpreting what you mean by a cell vs. a tile? An MLABLAB tile should usually have 20 bit wide rams in it. Is that happening?

@kimiatkh
Copy link
Contributor Author

Hi @vaughnbetz
By MLAB cells, I meant 2-bit wide MLAB primitives. Previously, the packing of these primitives into a tile of 10 MLAB primitives was functioning correctly, but the issue was that 1-bit RAM slices could not be packed into 2-bit wide RAM primitives due to missing class="memory attribute. That problem is now resolved.

For example, in LU_networks, 8008 1-bit slices are packed into 4004 2-bit primitives, which are then packed into 420 MLAB tiles. This results in an average utilization of ~95% per MLAB tile (with the average packed tile being about 19 bits wide). For other benchmarks, it varies — some reach the higher end (around 15 bits wide), while others are closer to 10 bits wide on average. Looking at the Quartus results, it seems the same variation is present there as well.

@vaughnbetz
Copy link
Contributor

Thanks @kimiatkh ! That sounds like proper behaviour.
There will be some narrower RAMs, so having some at 15 bits etc. seems fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants