FEX-2302
Read the blog post at FEX-Emu's Site!
This month certainly passed in the blink of an eye. As usual, there are a lot of good bug fixes this month! Continue reading to find out more.
Fix incorrect operation for cache line clears
When emulating the CLFLUSH instruction, FEX was using the wrong operation to clear the cache line: we were accidentally using the CVAU operation instead of CIVAC. DC CVAU only cleans a line to the Point of Unification, while DC CIVAC cleans and invalidates it to the Point of Coherency, which is what CLFLUSH requires.
Although the old implementation was wrong, it was hard to find anything that was actually affected by it. With Snapdragon's open source Vulkan driver implementing what VKD3D requires, Vulkan tests made it evident that the operation was wrong. Switching the implementation is easy, and it will let VKD3D run without hacks once the required feature is finished.
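For illustration, here is a minimal C sketch of what CLFLUSH asks for, written against the host rather than taken from FEX's JIT. It assumes 64-byte cache lines and GCC/Clang inline assembly, and `flush_line` and `flush_range` are hypothetical names. On AArch64 it issues DC CIVAC; on x86 it simply uses the native CLFLUSH intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_clflush, _mm_mfence */
#endif

/* Assumption: 64-byte cache lines. Real code should query the line size
 * at run time instead of hard-coding it. */
#define ASSUMED_LINE_SIZE 64u

/* Write back the line holding `p` if it is dirty, then invalidate it. */
static void flush_line(const void *p) {
#if defined(__aarch64__)
    /* DC CIVAC: clean and invalidate by VA to the Point of Coherency.
     * DC CVAU would only clean to the Point of Unification and leave the
     * line cached, which is weaker than what CLFLUSH promises. */
    __asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
    __asm__ volatile("dsb sy" : : : "memory"); /* wait for completion */
#elif defined(__SSE2__)
    _mm_clflush(p); /* the native x86 instruction being emulated */
    _mm_mfence();
#else
    (void)p; /* no-op on other architectures */
#endif
}

/* Flush every cache line overlapping [start, start + len). */
static void flush_range(const void *start, size_t len) {
    assert(start != NULL);
    uintptr_t p = (uintptr_t)start & ~(uintptr_t)(ASSUMED_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)start + len;
    for (; p < end; p += ASSUMED_LINE_SIZE)
        flush_line((const void *)p);
}
```

Flushing changes where data lives, never its value, so a buffer should read back unchanged after `flush_range(buf, sizeof buf)`.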
Bug fixes to 64-bit x87 emulation
A big thanks to CallumDev for finding and fixing the latest bugs in FEX's less accurate x87 emulation. As a reminder, x87 on original hardware operates on 80-bit floating point values. ARM doesn't support this format natively, so FEX needs to emulate it using a software floating point library. We have a hack in our configuration that removes this software implementation and operates on 64-bit double values instead. This can significantly improve performance in some 32-bit games, but it can also introduce rendering artifacts.
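The precision that the 64-bit hack gives up can be seen without FEX at all. This sketch assumes GCC or Clang on x86-64 Linux, where `long double` is the x87 80-bit format with a 64-bit significand; the helper names are made up for the example.

```c
#include <assert.h>
#include <float.h>

/* 1 if 1 + 2^-k is distinguishable from 1 in double precision. */
static int double_resolves(int k) {
    assert(k > 0 && k < 1000); /* keep 2^-k a normal number */
    volatile double tiny = 1.0;
    for (int i = 0; i < k; i++)
        tiny /= 2; /* exact: tiny becomes 2^-k */
    volatile double sum = 1.0 + tiny; /* rounded to double when stored */
    return sum != 1.0;
}

/* The same test in long double (the x87 80-bit format on x86-64 Linux). */
static int long_double_resolves(int k) {
    assert(k > 0 && k < 1000);
    volatile long double tiny = 1.0L;
    for (int i = 0; i < k; i++)
        tiny /= 2;
    volatile long double sum = 1.0L + tiny;
    return sum != 1.0L;
}

/* LDBL_MANT_DIG is 64 for the x87 80-bit format (DBL_MANT_DIG is 53). */
static int long_double_significand_bits(void) { return LDBL_MANT_DIG; }
```

On such a host, `double_resolves(60)` should return 0 while `long_double_resolves(60)` returns 1: a 2^-60 term survives in 80-bit arithmetic but is rounded away in a double. That lost precision is what the hack trades for speed.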
This month there were several bug fixes:
- ALU operations (add, sub, mul, div) that consume integers converted to floats are fixed
- Float comparisons against 16-bit integers are fixed
- FPREM no longer causes infinite loops, since its flags are now calculated correctly
With these fixes in place, a large number of games now render correctly with this hack enabled. It will be interesting to see how much this improves performance or battery life in 32-bit games!
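To give a feel for what the comparison fix involves, here is a sketch of comparing a 64-bit double ST(0) against a 16-bit integer and producing the x87 condition codes (C3, C2, C0). It is a hypothetical helper, not FEX's implementation. One detail such a comparison has to get right is that the 16-bit operand is signed and must be sign-extended before conversion; the raw changes below also list new negative-integer x87 tests.

```c
#include <assert.h>
#include <stdint.h>

/* Condition-code bits in the x87 status word (FSW). */
#define X87_C0 (1u << 8)
#define X87_C2 (1u << 10)
#define X87_C3 (1u << 14)

/* FICOM m16int with ST(0) held as a 64-bit double (hypothetical helper).
 * The 16-bit operand is signed: the bit pattern 0xFFFF means -1. */
static unsigned ficom16_f64(double st0, int16_t src) {
    double rhs = (double)src; /* int16_t -> double keeps the sign */
    if (st0 != st0)
        return X87_C3 | X87_C2 | X87_C0; /* NaN: unordered */
    if (st0 > rhs)
        return 0;                        /* ST(0) > src */
    if (st0 < rhs)
        return X87_C0;                   /* ST(0) < src */
    return X87_C3;                       /* equal */
}

/* Usage example with negative operands. */
static void ficom16_example(void) {
    assert(ficom16_f64(0.5, -1) == 0);
    assert(ficom16_f64(-2.0, -1) == X87_C0);
    assert(ficom16_f64(-32768.0, INT16_MIN) == X87_C3);
}
```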
More AVX instructions emulated
With one of FEX's developers taking some time away, AVX work was a little lighter than in the last couple of months.
There was still a handful of instructions implemented:
- VPBLENDD, VBLENDPS, and VPSRAVD
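The semantics of VPBLENDD and VPSRAVD are compact enough to sketch in scalar C over the eight 32-bit lanes of a 256-bit register. This is a reference model for illustration with hypothetical helper names, not FEX's JIT code. Note how VPSRAVD treats shift counts above 31: clamping them to 31 gives the architecturally defined result.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* a 256-bit YMM register holds eight 32-bit lanes */

/* VPBLENDD: bit i of imm8 selects lane i from b (bit set) or a (clear).
 * VBLENDPS performs the same selection on single-precision lanes. */
static void vpblendd(int32_t dst[LANES], const int32_t a[LANES],
                     const int32_t b[LANES], uint8_t imm8) {
    for (int i = 0; i < LANES; i++)
        dst[i] = ((imm8 >> i) & 1) ? b[i] : a[i];
}

/* VPSRAVD: arithmetic right shift of each lane by its own unsigned count.
 * Counts above 31 fill the lane with copies of its sign bit, which is the
 * same result as clamping the count to 31 (element size minus one). */
static void vpsravd(int32_t dst[LANES], const int32_t src[LANES],
                    const uint32_t count[LANES]) {
    for (int i = 0; i < LANES; i++) {
        uint32_t n = count[i] > 31 ? 31 : count[i];
        /* GCC and Clang implement >> on negative ints as an arithmetic shift */
        dst[i] = src[i] >> n;
    }
}

/* Usage example. */
static void avx_example(void) {
    int32_t a[LANES] = {0, 1, 2, 3, 4, 5, 6, 7};
    int32_t b[LANES] = {10, 11, 12, 13, 14, 15, 16, 17};
    int32_t d[LANES];
    vpblendd(d, a, b, 0x0F); /* lanes 0-3 from b, lanes 4-7 from a */
    assert(d[0] == 10 && d[3] == 13 && d[4] == 4 && d[7] == 7);

    int32_t src[LANES] = {-8, -8, 0, 0, 0, 0, 0, 0};
    uint32_t counts[LANES] = {1, 40, 0, 0, 0, 0, 0, 0};
    vpsravd(d, src, counts);
    assert(d[0] == -4 && d[1] == -1); /* a count of 40 behaves like 31 */
}
```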
Additionally, while they aren't AVX instructions, we also implemented CLWB and CLFLUSHOPT. These match their ARM equivalents, so the implementation was mostly easy, and applications can now use them if they want.
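The ARM equivalents can be written down as a small lookup table. This sketch assumes the usual architectural pairing (CLFLUSH and CLFLUSHOPT to DC CIVAC, CLWB to DC CVAC); it illustrates the semantics rather than reproducing FEX's own tables, and `arm64_equivalent` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical x86-to-AArch64 pairing, based on architectural semantics. */
struct cache_op_map {
    const char *x86;
    const char *arm64;
};

static const struct cache_op_map kCacheOps[] = {
    /* Write back if dirty, then invalidate, to the Point of Coherency. */
    {"CLFLUSH", "DC CIVAC"},
    /* Same effect as CLFLUSH, with weaker ordering rules on x86. */
    {"CLFLUSHOPT", "DC CIVAC"},
    /* Write back if dirty; the line may stay in the cache. */
    {"CLWB", "DC CVAC"},
};

/* Returns the AArch64 DC operation for an x86 mnemonic, or NULL. */
static const char *arm64_equivalent(const char *x86) {
    for (size_t i = 0; i < sizeof kCacheOps / sizeof kCacheOps[0]; i++)
        if (strcmp(kCacheOps[i].x86, x86) == 0)
            return kCacheOps[i].arm64;
    return NULL;
}

/* Usage example. */
static void cache_map_example(void) {
    assert(strcmp(arm64_equivalent("CLWB"), "DC CVAC") == 0);
    assert(arm64_equivalent("MOVNTI") == NULL);
}
```

CLWB only needs the line written back, so the clean-only DC CVAC suffices, while CLFLUSHOPT differs from CLFLUSH only in ordering and so needs the same clean and invalidate.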
Fix copy and paste error in Arm64 JIT
While this is a fairly minor issue, we had a copy and paste error in FEX's register spilling code. It caused Steam to crash in certain situations, so this fix since the previous release helps users who want to run Steam.
A bunch of minor optimizations
This month had a bunch of small optimizations across the entire project. Individually these are all quite minor, but together they should remove a couple of percent of CPU time from FEX's JIT.
- Arm64 Dispatcher is slightly faster
- CPUID emulation initialization is faster
- File loading is optimized, improving config loading time
- Frontend instruction decoder is faster
- IR operations are 1 byte smaller, improving memory usage
- Inline IR constants are pooled to reduce IR memory size
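The post doesn't say exactly how IR operations lost a byte, but the raw changes mention removing the `NumArgs` and `HasDest` members. One common way to drop such fields, sketched here with invented opcodes and struct names, is to look up properties that are fixed by the opcode in a static table instead of storing them in every op.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes; a real IR has many more. */
enum ir_op { OP_CONSTANT, OP_ADD, OP_STORE, OP_COUNT };

/* Facts fixed by the opcode live once in a static table... */
struct ir_op_info {
    uint8_t num_args;
    uint8_t has_dest;
};

static const struct ir_op_info kOpInfo[OP_COUNT] = {
    [OP_CONSTANT] = {0, 1},
    [OP_ADD]      = {2, 1},
    [OP_STORE]    = {2, 0},
};

/* ...so each op header no longer repeats them.
 * `size` is a hypothetical per-instance field (e.g. element size). */
struct ir_header_old { uint8_t op; uint8_t size; uint8_t num_args; };
struct ir_header_new { uint8_t op; uint8_t size; };

static unsigned ir_num_args(const struct ir_header_new *h) {
    assert(h->op < OP_COUNT);
    return kOpInfo[h->op].num_args;
}

static int ir_has_dest(const struct ir_header_new *h) {
    assert(h->op < OP_COUNT);
    return kOpInfo[h->op].has_dest;
}
```

The trade-off is an extra table lookup when a pass needs the argument count, in exchange for a smaller header on every op in the IR.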
Fixing thunk symbol override fetching
FEX's thunks had an issue where, once a library was loaded, we would only ever fetch the relevant symbols directly from that library. While this worked for our own use case, it broke MangoHud in OpenGL applications. Resolving this issue fixes most tools that override symbols with LD_PRELOAD.
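A minimal sketch of the two lookup strategies, assuming glibc's `dlfcn.h` (glibc 2.34 or later, where `dlopen` and `dlsym` live in libc itself). Looking a symbol up through one library's handle ignores anything injected with LD_PRELOAD, while `RTLD_DEFAULT` follows the process-wide search order, in which LD_PRELOAD libraries are searched before the program's other shared libraries. This is not FEX's thunk code, and the helper names are hypothetical.

```c
#define _GNU_SOURCE /* RTLD_DEFAULT and RTLD_NOLOAD on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Process-wide lookup in the default search order, so a symbol
 * interposed through LD_PRELOAD wins over the library's own copy. */
static void *lookup_global(const char *name) {
    return dlsym(RTLD_DEFAULT, name);
}

/* Lookup restricted to one already-loaded library and its dependencies:
 * it returns that library's own definition and never sees LD_PRELOAD
 * overrides. */
static void *lookup_in(const char *lib, const char *name) {
    void *handle = dlopen(lib, RTLD_NOW | RTLD_NOLOAD);
    if (handle == NULL)
        return NULL; /* library is not loaded in this process */
    void *sym = dlsym(handle, name);
    dlclose(handle); /* drops the extra reference; the library stays loaded */
    return sym;
}

/* Usage example: both strategies can find libc's puts. */
static void lookup_example(void) {
    assert(lookup_global("puts") != NULL);
    assert(lookup_in("libc.so.6", "puts") != NULL);
}
```

Without any preload both helpers find libc's `puts`; with an overlay preloaded, only `lookup_global` would return the overlay's version of a symbol it overrides.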
Update JEMalloc from 5.2.1 to 5.3.0
While this is a fairly minor change, this JEMalloc release fixes some bugs and improves performance. It's small, but every performance improvement is welcome.
Support for execveat with AT_EMPTY_PATH
This is an interesting feature where an application can execute a program directly through a file descriptor instead of a file path on disk. The idea is fairly simple, but it has some edge cases that some people might find interesting. For more technical information about the implementation, check out the pull request.
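For readers unfamiliar with the call, here is a minimal Linux sketch of executing a program through a descriptor. It assumes glibc's `syscall` wrapper and that `/bin/true` is an ELF binary; `run_fd` is a hypothetical helper. An empty path combined with AT_EMPTY_PATH tells the kernel to execute the file that `fd` itself refers to.

```c
#define _GNU_SOURCE /* AT_EMPTY_PATH, syscall() */
#include <assert.h>
#include <fcntl.h>       /* open, O_RDONLY, O_CLOEXEC, AT_EMPTY_PATH */
#include <sys/syscall.h> /* SYS_execveat */
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>      /* fork, syscall, _exit, close */

/* Run the program behind `fd` in a child process and return its exit
 * status, or -1 on failure. */
static int run_fd(int fd, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const envp[] = { NULL };
        /* Empty path + AT_EMPTY_PATH: execute the file `fd` refers to. */
        syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
        _exit(127); /* only reached if execveat failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Usage example: /bin/true should exit with status 0. */
static void run_fd_example(void) {
    int fd = open("/bin/true", O_RDONLY | O_CLOEXEC);
    assert(fd >= 0);
    char *argv[] = { "true", NULL };
    assert(run_fd(fd, argv) == 0);
    close(fd);
}
```

One edge case documented in the execveat(2) man page: if `fd` refers to a script and was opened with O_CLOEXEC, the call fails with ENOENT, because the descriptor is closed before the script's interpreter can open it through /dev/fd.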
Raw Changes
- ARMEmitter
  - Handle integer add/subtract vectors (predicated) instruction class (9d33bba)
  - Handle RMIF, SETF8/SETF16 (a899f9f)
  - Handle SVE floating-point recursive reduction (1cda029)
  - Add a few missing instructions (2c9f99e)
  - Support helper for long address generation (f8d56a8)
  - Removes some warnings that cropped up (5fd8fdb)
- Arm64
  - Merge two loads into an LDP (a28039f)
  - Fixes incorrect operation for CacheLineClear (f8d92aa)
  - Use switch statement for op handlers instead of jump table (565ed45)
  - Fix SpillRegister C&P error (9c93c6f)
  - Fixes large offset spill slots (9acb513)
- VectorOps
  - Clamp shift amount to esize-1 for VSShr (9a318ca)
- ArmEmitter
  - Adds two more classes of ASIMD instructions (95e544c)
  - Adds three more classes of ASIMD instructions (81e0ac7)
- CPUID
  - Optimize initialization (f614fc6)
- Config
  - Fix relative execve applications. (65971ef)
- ConstProp
  - Pool inline constants (1e90ebb)
- Core
  - Adjust virtual memory size for 32-bit (7f6a620)
- Dispatcher
  - Extract 64-bit signal frame save and restore (65b6b6d)
  - Fixes x86-64 SA_SIGINFO generation (8dae785)
- ELFCodeLoader
  - Don't use std::random_device for RNG (f5e97f3)
- Emitter
  - Remove unused header (90bcb8c)
- External
  - Update JEMalloc to disable 16k pages (bbf9198)
- Externals
  - Update jemalloc to 5.3.0 (9322e55)
- F64
  - Fix integer immediates for add,mul,div,sub (c2325e1)
- FEXCore
  - Fixup 32-bit signal handling (fa1193f)
- FEXLoader
  - Adds support for execveat with AT_EMPTY_PATH (dcce9ad)
  - Build FEXInterpreter and FEXLoader independently (8974509)
- FEXRootFSFetcher
  - Support option to auto select first distro (a7aeb4a)
- FEXServer
  - Remove POLLREMOVE usage (d2d5282)
- FileLoading
  - Optimize FileLoad (28dd946)
- Frontend
  - Various optimizations (787b689)
- Github
  - Add ARM emitter tests to CI (da88c68)
- IR
  - Removes NumArgs member from IR ops (9403c66)
  - Remove HasDest member (f8e762f)
- JitSymbols
  - Fixes file opening and writing (a486797)
  - Fixes a crash that can occur (34e1ba6)
- Linux
  - Fixes shebang file execution (477d4b6)
- MContext
  - Insert a stack cookie with assertions enabled (7664359)
- OpDispatcher
  - Adds support for CLWB and CLFLUSHOPT (7be2e1a)
  - Fixes a few missing GPR/XMM helper usages (4aa984a)
- OpcodeDispatcher
  - Handle VPBLENDD/VBLENDPS (62e6ada)
  - Handle VPSRAVD (fe79f61)
- Scripts
  - Update InstallFEX.py rootfs links (df87042)
- Syscalls
  - Fix out-of-bounds read when handling single-line shebang files (3d29dac)
- Thunks
  - Fixes host symbol overrides (9d35bc0)
- X86Tables
  - Optimize struct layouts (dfc3297)
- Misc
  - X87_F64: Fixes FICOM (afaff92)
  - fix ifdef to use HAS_SYSCALL_TGKILL for tgkill as it was intended (8d0329d)
  - fix tgkill (1521e0a)
  - Fix FPREM flags calculation in F64 (632add6)
- unittests
  - Adds negative integer x87 tests (ee58c5d)