Stratix 10 Quad Port Ram Support #3180
Conversation
if (ram_info.port_a_output_clock && ram_info.port_b_output_clock) {
    reg_outputs = true; //Sequential output
} else if (!ram_info.port_a_output_clock && !ram_info.port_b_output_clock) {
    reg_outputs = false; //Comb output
Please edit the comments in your code to be consistent with the coding style guide:
- Add a space after //
- Avoid using block comments for implementation-related comments
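For illustration, a minimal sketch of the requested style fix applied to the lines in the diff above (no behavioural change intended):

if (ram_info.port_a_output_clock && ram_info.port_b_output_clock) {
    reg_outputs = true; // Sequential output
} else if (!ram_info.port_a_output_clock && !ram_info.port_b_output_clock) {
    reg_outputs = false; // Comb output
}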
//A dual port memory, both port A and B params have been found
} else {
    VTR_ASSERT(ram_info.mode == "dual_port" || ram_info.mode == "bidir_dual_port");
} else if (ram_info.mode == "dual_port" || ram_info.mode == "bidir_dual_port"){
Missing space before the brace: "){" should be ") {".
} else {
    //3b) Use the more detailed name, since it was found in the architecture
    mode_hash = tmp_mode_hash;
}
Rewrite this if statement to get rid of the branch where nothing is done.
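A minimal sketch of the suggested restructure: negate the condition so only the branch that does something remains (the real condition is not visible in this diff, so tmp_mode_found_in_arch below is only a placeholder for it):

if (tmp_mode_found_in_arch) {
    //3b) Use the more detailed name, since it was found in the architecture
    mode_hash = tmp_mode_hash;
}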
There's a failing regtest, but it may just need a golden result update. The vqm2blif output has changed, but the changes seem like block name changes. Here's a snippet:
One regtest failure and a suggestion on more precise timing modeling (and commenting of quad port behaviour in the arch file).
cout << " port A output sequential: " << bool(ram_info.port_a_output_clock) << "\n";
cout << " port B output sequential: " << bool(ram_info.port_b_output_clock) << "\n";
}
} else {
Suggest adding a comment saying what the else is:
// quad port mode
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbdatain"/> | ||
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbaddr"/> | ||
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbaddr2"/> | ||
<port clock="clk_portbin" combinational_sink_ports="eccstatus portadataout portbdataout " name="portbwe"/> |
I think it is worth putting a comment at the top of the quad port section explaining what a quad port S10 RAM is. It is really a simple quad port (2 read ports, 2 write ports), so you have 4 address buses, two datain buses, and two dataout buses.
It looks like when you have combinational outputs (not registered outputs) you assume portaaddr affects both portadataout and portbdataout. Same for portaaddr2. We also have portbaddr and portbaddr2 affecting both portadataout and portbdataout. That is overly conservative: we'll only use one address port to choose the read-out data. Not sure if it is always the same one or not; vqm2blif could probably always write it on one of the portaaddr buses for the port A side, and one of the portbaddr buses for the port B side. Then portadataout and portbdataout would each only be affected by the one chosen address port.
Right now portadatain and portbdatain always affect the timing of portadataout and portbdataout when the outputs aren't registered. That is also overly conservative; the datain values will be stored in the RAM but (by default at least) Quartus assumes they won't flow through to the outputs in the same cycle. (That is called read-during-write behaviour; Quartus has settings to choose which behaviour you assume, but I think we should assume the less conservative one, where portadatain does not affect portadataout, etc.)
Suggest updating the timing section here.
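As a rough sketch of what the less conservative arcs could look like for the combinational-output case (port names follow the snippet above; the port A clock name, clk_portain, is assumed by analogy, and whether eccstatus should remain a sink would need checking against the actual primitive):

<port clock="clk_portain" combinational_sink_ports="portadataout" name="portaaddr"/>
<port clock="clk_portbin" combinational_sink_ports="portbdataout" name="portbaddr"/>
<!-- datain ports: no combinational sinks, assuming no read-during-write flow-through -->
<port clock="clk_portain" name="portadatain"/>
<port clock="clk_portbin" name="portbdatain"/>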
@kimiatkh : just pinging on this one. I think you have all the arch file changes (sent via email). It would be great to close this item out.
Hi @vaughnbetz, So sorry for the delay in finishing this. I’ve made all the changes to the vqm2blif tool and the architecture file. However, after running the Titan regression tests, I realized that I need to update the BLIF archive for Titan that gets downloaded into the vtr repo. This is because, after setting the MLAB primitive as a memory class, the primitive definition in the BLIF file now needs to be one bit wide, since Quartus synthesis dumps one-bit memory slices. I’ve already generated the new netlists and am currently running the regression tests. Once that’s done, I’ll also need to update the golden results, as the MLAB stats have changed. Just one question: for creating the golden results, is there a stable commit you’d recommend basing my branch on, or is it fine to use the most recent version of master?
Thanks Kimia. The current master is stable, so it's fine to use.
If you could check at least one MLAB packs properly (we fit 10 or 20 bits
of memory into it), that would be great -- it would be bad if the RAM
packer wasn't packing the bits together.
Best,
Vaughn
Adding @AlexandreSinger as he may be able to help with this, plus documenting the Titanium benchmarks and how to run them etc. We'd like to get this in soon @kimiatkh so we can refer to Titanium in an upcoming paper.
Basically good to go ... would like the quad port comment just for clarity, but won't hold this up for it.
I think this PR is good to merge just to get the architecture fixes into Master. We can work on releasing a new version of Titan after (since these architecture changes work with the old version of Titanium as well). Documentation can come in another PR and should not block this PR in my opinion. Will merge once CI completes.
Makes sense. Let's land this!
@vaughnbetz I've already added the comments you requested for the quad-port RAMs. I've also verified the MLAB packing. For example, in cholesky_bdti, there are 342 memory slices packed into 172 MLAB cells, with each cell able to hold up to two slices. This suggests that the packing is working correctly. However, with the new version merged in, some of the benchmarks that contain MLABs in Titanium and Titan23 will now fail. I’ve prepared the newly generated netlist archive, but before uploading it I need to make a small change to the script that downloads the Titan BLIFs. Since this branch is now closed, I’ll open a new branch for that update. The golden results also need to be updated with the new MLAB statistics. I already have the new golden results for Titan23, and the run for Titanium is still ongoing but should finish by the end of today.
Hi @kimiatkh : There are ten mlab primitives in an MLAB (ram slices), each of which can be up to 2 bits wide. So we should get a bunch of 20-bit wide RAMs in MLABs. Is that happening now? If not, we likely have a regression.
Hi @vaughnbetz For example, in LU_networks, 8008 1-bit slices are packed into 4004 2-bit primitives, which are then packed into 420 MLAB tiles. This results in an average utilization of ~95% per MLAB tile (with the average packed tile being about 19 bits wide). For other benchmarks, it varies: some reach the higher end (around 15 bits wide), while others are closer to 10 bits wide on average. Looking at the Quartus results, it seems the same variation is present there as well.
Thanks @kimiatkh ! That sounds like proper behaviour.
This pull request adds support for quad-port RAM in the Stratix 10 architecture file. It also updates the vqm2blif tool to process quad-port RAMs from VQM files and generate the appropriate operation modes in the BLIF output. Additionally, it makes a small modification to the MLAB primitives in the architecture capture by adding the class=memory attribute, enabling VPR to correctly pack MLAB slices into MLAB primitives.
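As a hedged illustration of the class=memory change, a single-port-style memory slice is sketched below; all names, pin counts, and the num_pb value are placeholders rather than the actual Stratix 10 capture (the real MLAB primitive has more ports), with num_pb="2" mirroring the two one-bit slices per MLAB cell mentioned above:

<pb_type name="mlab_cell" blif_model=".subckt mlab_cell" class="memory" num_pb="2">
    <clock name="clk" num_pins="1" port_class="clock"/>
    <input name="addr" num_pins="5" port_class="address"/>
    <input name="din" num_pins="1" port_class="data_in"/>
    <input name="we" num_pins="1" port_class="write_en"/>
    <output name="dout" num_pins="1" port_class="data_out"/>
</pb_type>

With class=memory and the port_class annotations, VPR's packer can recognize that one-bit netlist slices sharing address and control signals belong to the same physical memory and merge them, which is what enables the per-cell and per-MLAB packing discussed in the thread above.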