Skip to content

Conversation

@elad335
Copy link
Contributor

@elad335 elad335 commented Oct 31, 2025

Optimizations:

  • cellRtc functions were called non-stop by the PPU thread which use sys_memory_get_page_attribute internally.
    The writer lock in sys_memory_get_page_attribute was causing SPUs to wait unjustly.
  • Fast path without locking VM has been added to sys_rsx_context_iomap.
  • Exclude spu_thread::reservation_check address receptacle from writer_lock detection and waiting.
  • Fix spu_thread::reservation_check(hash) overload for main and stack memory.
  • Add fast path without checking allocation for spu_thread::reservation_check when the address is on the same page as GETLLAR's effective address.

Fixes #14724

@elad335 elad335 added CPU Bugfix Optimization Optimizes existing code labels Oct 31, 2025
@elad335 elad335 requested a review from kd-11 October 31, 2025 10:02
@elad335 elad335 force-pushed the gamer branch 3 times, most recently from a6a2fc0 to 9f8c31f Compare November 2, 2025 07:28
@elad335 elad335 changed the title vm/sys_memory: Remove VM locking in sys_memory_get_page_attribute SPU: SPURS oriented thread waiting Nov 2, 2025
@elad335
Copy link
Contributor Author

elad335 commented Nov 2, 2025

Added "SPURS oriented thread waiting" which is gonna replace "Preferred SPU Threads" setting and be active by default.

@elad335
Copy link
Contributor Author

elad335 commented Nov 2, 2025

Special settings tweaked from default:

Clocks Scale: 800%
Vblank Frequency: 480Hz
Frame Limit: Off
Preferred SPU Threads: Auto (as default)

PR:

image

Master

image

@digant73
Copy link
Contributor

digant73 commented Nov 2, 2025

performance are regressed on some games such as KZ3

PR:

kz3_pr

MASTER:

kz3_master


constexpr u32 _1m = 1u << 20;

std::unique_lock fast_lock(render->sys_rsx_mtx, std::defer_lock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the benefit of the double check under lock especially since we expect frame-to-frame the mappings wont actually change. Why not just lock and check once? I feel that would be faster here.

break;
}

const u64 current = get_system_time();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pointed it out before, but get_system_time is unreasonably heavy. Prefer TSC unless real-world precise values are required.

A general note - the spu_info logic (test_and_update_atomic_op_info) in general is quite heavy-handed with all the atomic ops and may eat into performance. The biggest issue I see is that there is no fast-path through this calling sequence (and the corresponding one below). Yes, spurs itself is going to be almost always running task groups but we also observe that in most games the parallel misses themselves aren't too bad with modern processors, though I agree we need something more sophisticated than the quick hack that was the preferred threads option.
This is all theory of course, we'll just have to see if it ends up worth the overhead with the big hitters like RDR, TLOU or killzone titles.


spu_info[index].release(info);

for (usz i = 0; i < spu_info.size(); i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can abuse vector ops for this sequence and gain implicit atomicity.
Have the spu info as an object of arrays instead of array of objects.
Then you can just load all of them at once and (ab)use vector ops on the vector to figure out how much overlap there is.

On x86 at least, vector ops are atomic as long as they are naturally aligned too so we basically get that for free.

@MsDarkLow
Copy link
Contributor

i9-13900K | RTX 3080
Ratchet & Clank Tools of Destruction - Improved a little bit. 88.3 -> 92.5
God of War 3 - Very small Regression. 62.4 -> 61.0
Sonic Unleashed - Tremendously reduced performance. From a stable 42.4 to very unstable 32.4

Ratchet & Clank

ratchet tod

God of War 3

gow3

Sonic Unleahsed

sonic u

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bugfix CPU Optimization Optimizes existing code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Regression] Performance Regressions in Hot Shots Golf: Out of Bounds (#11904, #12523)

4 participants