
FEX-2305

@Sonicadvance1 Sonicadvance1 released this 07 May 15:42
· 5101 commits to main since this release

Read the blog post at FEX-Emu's Site!

Welcome back to another release of FEX-Emu! We cancelled last month's release because of a large amount of code churn, which we were forced to do
in order to keep the release stable. Now we're back with an even lengthier release this month, so buckle up, because a
large number of changes happened.

More AVX Work!

These last two months have been a wild ride towards implementing AVX. @Lioncache has been burning down a ton of
instructions to get everything in place for AVX emulation.

New instructions implemented

  • PCMPISTRI/VPCMPISTRI
  • VPMASKMOVD/VPMASKMOVQ
  • VCVTPD2PS/VCVTPS2PD
  • VCVTSD2SS/VCVTSS2SD
  • PCMPESTRI/VPCMPESTRI
  • VMPSADBW
  • VPSLLVD/VPSLLVQ
  • VPSRLVD/VPSRLVQ
  • VCVTSI2SD/VCVTSI2SS
  • VPINSRB/VPINSRD/VPINSRQ/VPINSRW
  • VPSADBW
  • VTESTPD/VTESTPS
  • VPMADDUBSW
  • VPMOVMSKB
  • VMASKMOVPD/VMASKMOVPS

That's a whole bunch of instructions implemented! We have now implemented nearly all the instructions required for AVX.
The two major instructions remaining before AVX can be exposed are the SSE4.2 instructions VPCMPISTRI and VPCMPESTRM. These two
instructions also have AVX versions, so they are required in order to support AVX.

We are getting really close and once this feature is done, we can quickly move on to finishing support for AVX2, F16C, and the fused
multiply-accumulate extensions. At that point our CPU emulation will be effectively "feature-complete" for everything that games will care about in
the short-term. Exciting times!

llvm-mingw and WINE support

This is a very big change that has been coming down the pipe for a while now. We have been mostly working behind the scenes to get FEX-Emu wired up so
that it can be compiled as a Windows shared library. This last month is where this work has finally come to a head and most of the work is in place
for this.

How this works is that FEX-Emu has a component called FEXCore that gets compiled as both a shared library and a static library. This is where all
the CPU emulation happens, and it tries to be mostly OS agnostic, while everything that is Linux specific lives in the frontend called
FEXInterpreter. It is FEXCore that can now be compiled as a Windows AArch64 PE library. While this isn't useful to end users today, it means that WINE can link to this
library for emulating x86/x86-64 on AArch64 platforms. It should be noted that there are still some Linux assumptions strewn about the code, so this
isn't a generic solution for emulation on a true Windows platform. We're writing this support specifically for WINE today.

Converting away from C++ containers that allocate memory

This is the significant change that caused us to cancel last month's release. While @Neobrain was writing code to
support 32-bit library thunking, they had discovered a very big problem. FEX-Emu has long overridden the glibc memory allocation routines in order for
us to ensure that FEX can allocate memory when emulating 32-bit applications. We discovered that this overriding also extends to system libraries that
we load in after the fact. This meant that any time libGL would allocate memory, it would end up being a 64-bit pointer and there was nothing we could
do about it.

The workaround for this problem is to stop overriding the system allocators, which will allow shared libraries to allocate memory that can safely be
used by the 32-bit guest. But this also has the problem that FEX would then run out of memory when executing 32-bit applications. This is due to a
quirk that FEX-Emu needs to allocate all the memory on the system before executing 32-bit applications.

The new workaround is to replace usage of every C++ container that allocates memory with FEX's own container that will use its own allocator. This was
an exceedingly invasive change that touches almost everything in our codebase. With the pain done, FEX now can use its own internal allocators while
system libraries will use the regular glibc allocator as expected. See more about the limitations of this with our
documentation.
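
To make the container replacement concrete, here is a minimal sketch of the general technique, not FEX's actual code. The names `private_heap`, `PrivateAllocator`, and `mytl` are hypothetical, and the "private heap" simply wraps `malloc`/`free` with a counter so the routing can be observed. The idea is that a standard container takes an allocator type parameter, so an alias with a different allocator is a near drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-in for an emulator-private heap. It wraps malloc/free
// and counts live allocations so we can see which heap a container used.
namespace private_heap {
inline std::size_t& live() {
  static std::size_t count = 0;
  return count;
}

inline void* allocate(std::size_t bytes) {
  void* p = std::malloc(bytes);
  if (p == nullptr) throw std::bad_alloc{};
  ++live();
  return p;
}

inline void deallocate(void* p) noexcept {
  assert(live() > 0);
  --live();
  std::free(p);
}
}  // namespace private_heap

// Minimal standard-conforming allocator that forwards to private_heap.
template <class T>
struct PrivateAllocator {
  using value_type = T;

  PrivateAllocator() noexcept = default;
  template <class U>
  PrivateAllocator(const PrivateAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(private_heap::allocate(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) noexcept { private_heap::deallocate(p); }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const PrivateAllocator<T>&, const PrivateAllocator<U>&) noexcept {
  return true;
}
template <class T, class U>
bool operator!=(const PrivateAllocator<T>&, const PrivateAllocator<U>&) noexcept {
  return false;
}

// Hypothetical alias namespace: same containers, different allocator.
namespace mytl {
template <class T>
using vector = std::vector<T, PrivateAllocator<T>>;
using string = std::basic_string<char, std::char_traits<char>, PrivateAllocator<char>>;
}  // namespace mytl
```

Code that previously used `std::vector<int>` would switch to `mytl::vector<int>`; the container interface is unchanged, but its memory no longer comes from the allocator that system libraries use.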

Re-enable glibc allocator hooking again

Okay, the previous paragraph was a ruse; FEX-Emu needed to actually override the glibc allocator again. In this case FEX-Emu will actually have three
allocators active at any given moment.

  • FEX-Emu uses jemalloc for its internal allocator.
  • The system allocator is overridden with another jemalloc allocator.
  • The guest application's glibc allocator is untouched.

The problems start occurring when a pointer is shared between thunks and the guest application. If one allocator tries to free a pointer from a
different allocator then fireworks occur. The way around this is to use a jemalloc function to determine if it owns the pointer and choose which
allocator to end up freeing the pointer from. This is particularly painful with X11 thunking because pointers are passed between client and server in
a very laissez-faire fashion. This may not stay around in the future but it is a necessary evil for now.
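
The ownership check described above can be sketched as follows. This is an illustration only: `RegionAllocator`, `owns`, and `free_any` are hypothetical names, and a fixed-size bump allocator stands in for jemalloc so the example stays self-contained. The point is the dispatch in `free_any`: ask one allocator whether it owns the pointer, and hand the pointer to the other allocator otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical allocator that hands out memory from one fixed region, which
// makes "do I own this pointer?" a simple address range check.
class RegionAllocator {
 public:
  bool owns(const void* p) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto base = reinterpret_cast<std::uintptr_t>(storage_);
    return addr >= base && addr < base + sizeof(storage_);
  }

  void* allocate(std::size_t bytes) {
    // Round up so every returned pointer stays 16-byte aligned.
    const std::size_t rounded = (bytes + 15) & ~std::size_t{15};
    if (rounded > sizeof(storage_) - used_) return nullptr;
    void* p = storage_ + used_;
    used_ += rounded;
    ++live_;
    return p;
  }

  // A bump allocator never reuses memory; we only track the live count.
  void deallocate(void* p) {
    assert(owns(p));
    (void)p;
    --live_;
  }

  std::size_t live() const { return live_; }

 private:
  alignas(16) unsigned char storage_[4096];
  std::size_t used_ = 0;
  std::size_t live_ = 0;
};

enum class FreedBy { Region, System };

// Free a pointer of unknown origin by routing it to the allocator that owns it.
inline FreedBy free_any(RegionAllocator& region, void* p) {
  if (region.owns(p)) {
    region.deallocate(p);
    return FreedBy::Region;
  }
  std::free(p);  // not ours, so assume it came from the system allocator
  return FreedBy::System;
}
```

The sketch covers two allocators; with three active, as in the list above, the same pattern would chain a second ownership check before falling through to the last allocator.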

JIT Optimizations and improvements

Reclaim static assigned registers on 32-bit

This allows us to use 8 more general purpose registers and 8 more floating point registers with 32-bit applications. Depending on the game this can
improve performance by a decent margin. We have seen upwards of 20% performance uplift in various games due to it.

Fix Visual C++ redistributable crashing

This was a really annoying bug, where every.single.time. that Proton would run, it would try to install the C++ runtime at least four times. The user
would be required to kill the processes after they were installed. This was fairly egregious because we had thought it was fixed months ago and didn't
realize that it wasn't actually fixed. Depending on the version of the Visual C++ redistributable and Proton it would still occur.

Root-causing this issue revealed that the redistributable uses Windows' structured exception handling to catch the case when it passes a null pointer
to strlen, which results in a SIGSEGV on the Linux side. FEX was incorrectly saving and restoring state when this occurred, which caused it to
infinitely loop and crash. Now that this is fixed, these install correctly and Proton doesn't try doing it on every single run.

Implement REP MOVS as a memcpy

This instruction behaves like a fairly fast memory copy on the CPU. We now convert this over to an internal memory copy operation.
Similar to last month where we converted an instruction to a memset, implementing this instruction as an IR operation improves its performance
many times over. In real games this usually translates to an FPS improvement of a few percent, which is a nice uplift.
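
As a rough illustration of why this helps, the sketch below compares a byte-at-a-time emulation of `REP MOVSB` with a bulk-copy fast path. It is not FEX's IR or JIT code: `GuestRegs` and both function names are hypothetical, and the fast path is only taken for the forward, non-overlapping case, where the two are equivalent.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical minimal guest register state for the string copy.
struct GuestRegs {
  std::uint8_t* rsi;  // source pointer
  std::uint8_t* rdi;  // destination pointer
  std::size_t rcx;    // remaining byte count
  bool df;            // direction flag: true means copy backwards
};

// Reference semantics: copy one byte per iteration, stepping both pointers.
inline void rep_movsb_reference(GuestRegs& r) {
  const std::ptrdiff_t step = r.df ? -1 : 1;
  while (r.rcx != 0) {
    *r.rdi = *r.rsi;
    r.rsi += step;
    r.rdi += step;
    --r.rcx;
  }
}

// Fast path: a forward copy over non-overlapping ranges is exactly a memcpy.
// Anything else falls back to the byte loop to preserve guest semantics.
inline void rep_movsb_fast(GuestRegs& r) {
  const auto src = reinterpret_cast<std::uintptr_t>(r.rsi);
  const auto dst = reinterpret_cast<std::uintptr_t>(r.rdi);
  const bool overlap = dst < src + r.rcx && src < dst + r.rcx;
  if (r.df || overlap) {
    rep_movsb_reference(r);
    return;
  }
  std::memcpy(r.rdi, r.rsi, r.rcx);
  // Leave the registers in the same state the byte loop would have.
  r.rsi += r.rcx;
  r.rdi += r.rcx;
  r.rcx = 0;
}
```

The bulk copy lets the host use its widest loads and stores instead of running one emulated iteration per byte.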

Fix restoring of AVX state

While not actually being utilized today (except due to a bug), @AndreRH found out that we were accidentally failing to
restore AVX register state when a signal handler returned. It's surprising that this wasn't noticed earlier but it could have resulted in some really
bad floating point state.

Remove double syscall overhead on filesystem accesses

When a file is accessed, FEX first needs to check whether it exists in the overlayfs-style rootfs image we provide. If the
file exists there, we redirect the access to the rootfs instead of the host filesystem. We had an issue where, if the file didn't exist, we
would accidentally check for it a second time before accessing the host file. This meant that one syscall turned into three. With this fix in place
we now only turn it into two.

If you're running a rootfs image off of a particularly slow drive (or a network share) then this can shave a decent amount of time off of load times.
This was particularly noticeable when running a Proton game under Steam because they will access a ton of files before starting up.
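
The lookup order can be sketched like this. The function and type names are hypothetical and the existence check is passed in as a callback so the number of probes can be counted; in a real emulator each probe would be a syscall.

```cpp
#include <functional>
#include <string>

// Hypothetical existence probe. Each call stands in for one host syscall.
using ExistsFn = std::function<bool(const std::string&)>;

// Resolve a guest path against an overlay-style rootfs:
//   1. Probe the rootfs copy exactly once.
//   2. If it exists, redirect there; otherwise fall back to the host path.
// The caller then performs the real open, which is the second syscall.
inline std::string resolve_path(const std::string& rootfs,
                                const std::string& guest_path,
                                const ExistsFn& exists) {
  const std::string in_rootfs = rootfs + guest_path;
  if (exists(in_rootfs)) {
    return in_rootfs;
  }
  // No second probe here: a missing rootfs entry goes straight to the host.
  return guest_path;
}
```

Each resolution costs one probe regardless of whether the rootfs has the file, so a guest open becomes two syscalls in total rather than three.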

Adds default DRM ioctl interface

This is a fairly basic change. Instead of breaking when hitting an unknown ioctl, pass it to the kernel and hope for the best. This is mostly so Asahi
and other drivers can test things under FEX without pushing patches to us for downstream support.

Add support for thunking Wayland

This doesn't affect most users today, but adding support for thunking Wayland means that applications that use it will be able to sanely use this
thunk in the future. SDL applications today might be able to take advantage of it but it is fairly fresh. We're looking forward to the inevitable
Wayland and WINE utilization to let things move away from X11.

Fixed 32-bit clock_nanosleep

There was a fairly nasty implementation detail where a 32-bit application trying to sleep with this syscall would actually consume 100% of a CPU core.
While this is fairly uncommon, the fix allows the game Alwa's Awakening to not burn a CPU core while running.

Add a bunch of functions to FEX's ARMEmitter

Not really a user facing feature but our code emitter has gained a bunch of new instruction support. This will be used in the future for our AVX2
implementation and various things. So it's good to have.

Raw Changes

  • ARM64Dispatcher

  • Fix compiling with mingw (a33443d)

  • ARM64Emitter

  • Ensure platform register is saved on win32 (73ede9d)

  • ARMEmitter

  • Handle SVE2 integer add/subtract wide category (c9fb9c4)

  • Handle SVE Integer Multiply-Add - Unpredicated group (7747ac8)

  • Fix treating 32-bit elements as 64-bit with ld1w (100b4d4)

  • Finish off SVE2 Integer - Predicated group (719803b)

  • Handle SVE Inc/Dec by Predicate Count category (2293614)

  • Handle SVE Element Count category (7f0cccf)

  • Handle SVE predicate count group (aea9028)

  • Handle floating-point multiply add (indexed) groups (7344680)

  • Handle SVE Floating Point Multiply-Add group (3275dab)

  • Handle SVE floating-point compare with zero group (11d396c)

  • Handle SVE Floating-point Serial Reduction (Predicated) group (cd40a85)

  • Handle SVE Floating Point Unary Operations - Unpredicated group (9b70c1d)

  • Handle LDR (vector) (5e8d960)

  • Move op and assertions into SVEFloatArithmeticPredicated helper (1c3af30)

  • Handle SVE Write FFR group (3b74e34)

  • Move SVEBitwiseShiftbyVector into private section (054430f)

  • Remove unused helper functions (94e0591)

  • Fully handle SVE Integer Misc - Unpredicated (3bd4b23)

  • Simplify uses of IsXOrWRegister (060745f)

  • ASIMDOps

  • Remove unnecessary template constraints (2321d2a)

  • ScalarOps

  • Move base opcodes into helper functions (4b8ac0f)

  • Allocator

  • Adds VirtualAlloc with memory Base hint function (ba45bf4)

  • Ensure uses of aligned allocations use aligned_free (c140dd7)

  • Remove pointer indirection overhead (1815418)

  • Disable glibc sbrk allocations (e4fadd6)

  • AllocatorHooks

  • Adds some mingw allocator helpers (86e09a0)

  • Arm64

  • Reclaim SRA registers on 32-bit (3a81efd)

  • VectorOps

  • Remove a few unnecessary EORs from comparisons in SVE path (55d65f3)

  • Handle 64-bit elements in VSShr (72df6a2)

  • Remove unnecessary EOR in VAddP/VFAddP SVE path (1f512b0)

  • Arm64Emitter

  • Replace x18 usage with x30 (121d9fd)

  • CI

  • Adds glibc faulting testing (27c03f9)

  • CMake

  • Get past configuration when mingw is used (aba4257)

  • CPUID

  • Disable RDTSCP under wine (f7d827a)

  • Fix std::min type cast (fbc5d58)

  • Enable FAST REP MOVS (6e0a1a5)

  • CPUInfo

  • Add mingw helper for CalculateNumberOfCPUs (1fad26d)

  • Switch away from using get_nprocs_conf (7f243ee)

  • Config

  • Move path generation to the frontend (da12614)

  • Core

  • Add a new log message for unsupported instruction (88dba60)

  • CoreState

  • Fix SynchronousFaultData padding type (68599bf)

  • Dispatcher

  • Disable signal handling under mingw (0fa4390)

  • Fixes restoring of AVX state (28c168e)

  • Docs

  • Adds programming concerns documentation (bfd606e)

  • FEXConfig

  • Fixes misalignment when advanced option is changed. (4f20dba)

  • FEXCore

  • Moves SIGBUS handler to FEXCore/Utils (86af1f6)

  • Compile without exceptions (37b5bc4)

  • Stop exposing the x86 table data symbols (590422b)

  • Fixup cmake file for mingw (b5420f5)

  • Switch to xbyak for CPUID fetch helpers. (7a774a8)

  • Stop leaking AVX configuration state (5c62ea2)

  • IR_INC dependency on FEXCore_Base (1771d08)

  • Adds fexctl container alias objects (a8ed2af)

  • FEXLoader

  • Don't build with mingw (892c07a)

  • Move to Tools folder (307158d)

  • Move ELFCodeLoader2 to remove 2 (eadce28)

  • FEXServerClient

  • Insert missing padding in message packet (73caa67)

  • FHU

  • FS

  • Create WIN32 helpers for some functions. (af15277)

  • FM

  • Removes double syscall issue with GetEmulatedFDPath (b2a3c6a)

  • Remove unused FD to Name mapping (1aeab04)

  • FileLoading

  • Add WIN32 specific loading path (c942687)

  • Frontend

  • Remove errant header (4bffdc6)

  • GdbServer

  • Disable under mingw (f673afc)

  • HostFeatures

  • Mark DCZID related utilities as [[maybe_unused]] (5e105e8)

  • IR

  • Passes

  • Changes vector to fextl (2d05ffe)

  • Interpreter

  • Separate fallback OpHandler from F80 fallbacks (6e52a16)

  • IntrusiveIRList

  • Ensure this is using the FEX allocator (82f7bb2)

  • Ioctl

  • Add default handler for drm (db706bb)

  • Ensure DRM name check uses strncmp (c1dab3e)

  • LogManager

  • Remove unused handler (1e4a6d4)

  • LookupCache

  • Removes unnecessary recursive lock_guard (6dfea8a)

  • Netstream

  • Disable on WIN32 (2ff5096)

  • ObjectCache

  • Ensure correctly packed config option (059472f)

  • OpDispatcher

  • Implements REP MOVS as Memcpy IR op (11c6f97)

  • OpcodeDispatcher

  • Simplify PCMPXSTRIOpImpl (30974fb)

  • Handle PCMPISTRI/VPCMPISTRI (8824714)

  • Handle VPMASKMOVD/VPMASKMOVQ (238ffdf)

  • Handle VCVTPD2PS/VCVTPS2PD (35f192b)

  • Handle VCVTSD2SS/VCVTSS2SD (512d6d0)

  • Handle PCMPESTRI/VPCMPESTRI (a12802e)

  • Move usages of And(Not( to Andn (f55a653)

  • Implement support for 32-bit SALC instruction (df354e3)

  • Handle store variants of VMASKMOVPD/VMASKMOVPS (cf66643)

  • Handle load variants of VMASKMOVPD/VMASKMOVPS (5cdde0b)

  • Handle VMPSADBW (2c2abc5)

  • Handle VPSLLVD/VPSLLVQ (54e7847)

  • Handle VPSRLVD/VPSRLVQ (91f056b)

  • Handle VCVTSI2SD/VCVTSI2SS (41477db)

  • Handle VPINSRB/VPINSRD/VPINSRQ/VPINSRW (8633528)

  • Handle VPSADBW (1087c45)

  • Handle VTESTPD/VTESTPS (6517f7e)

  • Handle VPMADDUBSW (c14f435)

  • Handle VPMOVMSKB (6645e68)

  • Pass full register size through VExtractToGPR (e92cfc4)

  • RA

  • Use FindFirstSetBit helper (cfc1aa5)

  • SignalDelegator

  • Make sure to save and restore InSyscallInfo (456e9db)

  • Calculate siginfo layout (886c562)

  • Moves all signal handling to the frontend (9432a84)

  • Cleanup unused functions (ad332e3)

  • StringConv

  • Convert to conversion functions that don't use std::string (c5da0e7)

  • StringUtils

  • Stop allocating TrimTokens (f45ea1e)

  • Telemetry

  • Disable on WIN32 (cbe55b0)

  • Threads

  • Moves pthread logic to FEXLoader (874ae5b)

  • Adds SetThreadName helper (797737a)

  • Disable glibc allocator fault testing with exit (6d4cef7)

  • Thunks

  • Fixes xcb helper thread creation (542aeed)

  • Disable under mingw (4a11111)

  • Enable ccache if available (c03ed52)

  • Make xcb's callback more robust. (e8bf7a1)

  • VEXTables

  • Adds a missing class of AVX instructions (a351620)

  • Misc

  • Win32 memory allocation fixes (468f747)

  • Get mingw compiling libFEXCore (7552ad2)

  • Disable AOT and object cache under mingw (361e684)

  • Disable Break/INT operations on mingw (4c74913)

  • llvm-mingw: Fix SoftFloat compiling (2b5ddb6)

  • Add in jemalloc glibc hooking again (1a91d84)

  • Update drm headers to v6.2 (1c2fd72)

  • cpp-optparse: Update to latest optparse (7916281)

  • More glibc allocation removals. (f806ca6)

  • Move FEX away from the remaining glibc allocations that we can (aac4e25)

  • Add support for thunking Wayland (21838fe)

  • Convert most std::string over to fextl (9183cf1)

  • Convert most things to fextl (e3f6ef6)

  • fextl

  • :fmt: Remove fwrite usage (d7bc037)

  • Bulk merge (bba8716)

  • memory

  • Don't allow arrays in fextl::make_unique (65fa495)

  • gvisor

  • Disable timerfd test (780491d)

  • mingw

  • Disable compiling Common/Linux/Tools (d7f9c7e)

  • unittests

  • Add missing VPMASKMOVQ store test (e71f3e8)

  • Change alignment directive in 256-bit VPSADBW test to 32 (d1ece88)

  • gcc

  • Disable mcount_pic test (a78860c)

  • x32

  • Fixes clock_nanosleep syscall (1f44037)