SPU: SPURS oriented thread waiting #17646
Conversation
Force-pushed from a6a2fc0 to 9f8c31f
Added "SPURS oriented thread waiting", which is going to replace the "Preferred SPU Threads" setting and will be active by default.
```cpp
constexpr u32 _1m = 1u << 20;
```

```cpp
std::unique_lock fast_lock(render->sys_rsx_mtx, std::defer_lock);
```
I don't see the benefit of the double check under lock, especially since we expect that frame to frame the mappings won't actually change. Why not just lock and check once? I feel that would be faster here.
```cpp
    break;
}
```

```cpp
const u64 current = get_system_time();
```
I've pointed it out before, but get_system_time is unreasonably heavy. Prefer the TSC unless real-world precise values are required.
A general note: the spu_info logic (test_and_update_atomic_op_info) is quite heavy-handed with all the atomic ops and may eat into performance. The biggest issue I see is that there is no fast path through this calling sequence (or the corresponding one below). Yes, SPURS itself is almost always going to be running task groups, but we also observe that in most games the parallel misses themselves aren't too bad on modern processors, though I agree we need something more sophisticated than the quick hack that was the preferred-threads option.
This is all theory, of course; we'll just have to see if it ends up worth the overhead with the big hitters like RDR, TLOU or the Killzone titles.
```cpp
spu_info[index].release(info);
```

```cpp
for (usz i = 0; i < spu_info.size(); i++)
```
I think we can abuse vector ops for this sequence and gain implicit atomicity.
Have the spu info as a struct of arrays instead of an array of structs.
Then you can load all of them at once and (ab)use vector ops to figure out how much overlap there is.
On x86, at least, vector ops are atomic as long as they are naturally aligned, so we basically get that for free.







Optimizations:

- Optimized `sys_memory_get_page_attribute` internally. The writer lock in `sys_memory_get_page_attribute` was causing SPUs to wait unjustly.
- Optimized `sys_rsx_context_iomap`.
- Excluded the `spu_thread::reservation_check` address receptacle from writer_lock detection and waiting.
- Added a `spu_thread::reservation_check(hash)` overload for main and stack memory.
- Skip `spu_thread::reservation_check` when the address is on the same page as `GETLLAR`'s effective address.

Fixes #14724