-
Notifications
You must be signed in to change notification settings - Fork 428
librrgraph: add correct array operator #3306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is the correct fix for gcc-16. error: no match for 'operator[]' (operand types are 'edge_sort_iterator' and 'long int')
Seems OK to me. @AmirhosseinPoolad : sending to you to take a look at since you are a C++ geek :). @AmirhosseinPoolad : for safety, probably we should run a QoR check on some reasonable benchmark suite (VTR or Titan) to check there is no CPU impact. I don't expect any, but don't know enough about how often this function is called or what the old / default implementation was to be 100% sure. |
@heshpdx: Since you’re now returning by value, does it still need to be const? In the previous PR, I mentioned making it a const reference because it was returned by reference. For the [] operator, it’s common to return a const reference for better memory efficiency. |
Please do check this with a gcc-16 build if you can. @AmirhosseinPoolad |
You may be right. One of our folks said that the initial |
return lhs.swapper_.idx_ <= rhs.swapper_.idx_; | ||
} | ||
|
||
const t_rr_edge_info operator[] (ssize_t n) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const t_rr_edge_info operator[] (ssize_t n) const { | |
t_rr_edge_info operator[] (ssize_t n) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some early testing reports from our next integration hint at needing to remove the const. We will get back to you.
I just compiled VtR master using gcc's master branch and it compiled just fine. @heshpdx Not sure why this code is necessary. ![]() |
The build errors are seen with gcc-16 ToT. Which compiler version did you use? |
I used the master branch of gcc for x86_64 on linux, which I assume is the latest in development version (Running gcc --version reports gcc 16). I'm not familiar with ToT. If you could send me the exact branch of gcc you used I could test the PR using that branch. |
I think I figured out the issue. edge_sort_iterator advertises itself as a I do agree with this PR, this iterator should definitely have an operator[] and it's honestly a total coincidence that it compiles on any machine right now. |
Thanks for the careful review @AmirhosseinPoolad ! |
Some more updates on this:
The main issue here is that rr graph is stored as an struct of arrays, which means it needs it's own custom iterator. Iterators (from what I understand) need a value type and reference type. Here the reference type (edge_swapper) holds an index into rr graph storage and the value type is constructed from the reference type. This is weird because as we saw in #2522 the iterator is not behaving like a pointer at all and adding operator[] (which would work the same way) would only add to the problem. I tried a lot of different options to fix this and make a nice iterator, as far as I know* C++20 straight up does not have a good answer for this. With C++23 we could use ranges::zip_view which would completely fix our problem and we could remove the iterator and everything would be alright. There is the ranges-v3 library which does the same thing (and was the basis for the C++23 stuff) but of course adding any new dependency would need a good amount of testing. * Of course, I don't know all or most of C++. If anyone knows a way to make a well behaved random access iterator for rr graph edges, please let me know. I think what we should do is this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the rants I did in this PR @heshpdx! Thank you for flagging this issue. I'm not entirely sure if the current implementation of operator[] is correct. I had some thoughts on it. You should definitely test this PR locally on the machine you're having build issues on (since my machine and the CI machine doesn't use operator[] in std::stable_sort) to make sure it's completely correct. I'm sure the strong tests would suffice:
./run_reg_test.py vtr_reg_strong -j[num_cores]
const t_rr_edge_info operator[] (ssize_t n) const { | ||
edge_sort_iterator ret = *this; | ||
ret.swapper_.idx_ += n; | ||
return ret.swapper_; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this compile successfully on your system if it returned an edge_swapper& like operator* does? operator[n] is (or should be) equivalent to *(it + n). We basically need a combination of operator+ and operator* here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think the previous PR was fine. The function was const and that was correct since (*this + n) constructs a new iterator and doesn't modify 'this' at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!3298 did not work as it still led to segfaults, so Amin reverted it.
ToT stands for Top of Tree, sorry about that. The newer compilers appear to be more strict.
The current PR as is, has now been tested successfully across a wide variety of compilers and systems (gcc, llvm, nvidia/intel/amd/cray/ibm compilers, and their respective hardware). SPEC CPU 2026 is built with -std=c++17
; builds with newer standards might work but we are focusing on C++17 as the baseline. The successful tests are NOT using upstream VPR's unit test or regressions, but rather just the few titan cmdlines of place+route that we run for the benchmarks.
I'm not a tester, I'm just the project leader and messenger 😄. Maybe the community waits to merge this until the problem is exposed by a more mature gcc-16, and maybe the real solution is to use ranges::zip_view
. I don't have an opinion on whether or not this change should be accepted here, I am just sharing that it works for SPEC. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our compiler team did a bisect and said that the "first bad commit" on gcc-16 was the following, so maybe that gives a clue to the behavioral change for VPR which necessitates this PR.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aaeca77a79a9a897c97f00f83f2471e7a8bd6685
I'm fine with merging this as everyone seems to agree the old behaviour was not really guaranteed by the standard. @AmirhosseinPoolad : I think you're also saying this is fine to merge, but we may someday be able to do it in a cleaner way (but not yet ... would have to move to a later C++ standard). @AmirhosseinPoolad : I'd still like a quick CPU time check, which I expect won't show any change, but I like to be paranoid ... |
Had a discussion with @AmirhosseinPoolad and @AlexandreSinger . They believe the code we're fixing (not the new code, the existing code) that stores edges in multiple parallel arrays and sorts them is fragile in its use of iterators / subscript operator. @AmirhosseinPoolad is going to rewrite the edge sorting code to be simpler so we don't need to make a complex iterator / subscript operator here. Basic idea:
It might be slightly less efficient, but this is not very hot code anyway (we do one sort after making an rr-graph and before routing). It also shouldn't be the peak memory footprint as the router data structures shouldn't be loaded yet. We should measure that though, and we could do the edge_data_array copy one at a time if it was the peak memory footprint. |
This is the correct fix for gcc-16's build error. It works and doesn't crash on two different machines tested by SPEC CPU folks.