Most of the InvalidateICache calls are for a 32-byte block: this is the
number of bytes invalidated by PowerPC dcb*/icb* instructions. Profiling
shows that a lot of CPU time is spent checking whether any JIT blocks
are covered by these 32 bytes (using std::map::lower_bound).
This patch adds a bitset containing the state of every 32-byte block in
RAM (JIT cached/not JIT cached). Using that, a 32-byte InvalidateICache
can check in the bitset whether any JIT block might be invalidated. A bitset
check is a lot faster than a std::map::lower_bound operation, improving
performance of JitCache::InvalidateICache by more than 100%.
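
The idea can be sketched roughly as follows. This is a simplified
illustration, not the code from this patch: ToyJitCache, AddBlock,
InvalidateLine, BlockCount and the 1 MiB RAM size are hypothetical names
and values chosen for the example. Blocks are kept in an ordered map keyed
by their last byte, and one bit per 32-byte line records whether a JIT
block might cover that line.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical sizes, for illustration only.
constexpr std::uint32_t kRamSize = 1u << 20;  // toy 1 MiB RAM
constexpr std::uint32_t kLineSize = 32;       // bytes covered by one dcb*/icb*

class ToyJitCache {
public:
  // Register a JIT block covering [start, start + length).
  void AddBlock(std::uint32_t start, std::uint32_t length) {
    assert(length > 0 && start < kRamSize && length <= kRamSize - start);
    const std::uint32_t last = start + length - 1;
    blocks_.emplace(last, start);
    // Mark every 32-byte line touched by the block as "JIT cached".
    for (std::size_t line = start / kLineSize; line <= last / kLineSize; ++line)
      valid_.set(line);
  }

  // Invalidate the 32-byte line containing `address`.
  // Returns true if the slow (ordered map) path had to be taken.
  bool InvalidateLine(std::uint32_t address) {
    assert(address < kRamSize);
    const std::size_t line = address / kLineSize;
    if (!valid_.test(line))
      return false;  // fast path: no block can overlap this line

    const std::uint32_t lo = static_cast<std::uint32_t>(line) * kLineSize;
    const std::uint32_t hi = lo + kLineSize - 1;
    // Slow path: only blocks whose last byte is >= lo can overlap the line.
    for (auto it = blocks_.lower_bound(lo); it != blocks_.end();) {
      if (it->second <= hi)
        it = blocks_.erase(it);  // block start <= hi, so it overlaps
      else
        ++it;
    }
    // No block overlaps this line any more. Bits of other lines that the
    // erased blocks covered stay set; that is conservative, not incorrect.
    valid_.reset(line);
    return true;
  }

  std::size_t BlockCount() const { return blocks_.size(); }

private:
  std::multimap<std::uint32_t, std::uint32_t> blocks_;  // last byte -> start
  std::bitset<kRamSize / kLineSize> valid_;             // one bit per line
};
```

In this sketch a set bit only means that a block might overlap the line, so
a stale bit costs one unnecessary map lookup but never skips a needed
invalidation.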
Some practical numbers:
* Xenoblade Chronicles (PAL)
56.04FPS -> 59.28FPS (+5.78%)
* The Last Story (PAL)
30.9FPS -> 32.83FPS (+6.25%)
* Super Mario Galaxy (PAL)
59.76FPS -> 62.46FPS (+4.52%)
This function still takes more time than it should - more optimization in
this area might be possible (specializing for 32-byte blocks to avoid
useless memcpy, for example).
These merges, while in theory improving emulation accuracy, cause issues
in other parts of the emulator based on invalid assumptions. memcard-delay
fixed some of these issues in the EXI memcard code, but several other
problems still exist and I don't have the time to debug that right now.
This was not needed for most games before because the external exception was
itself delayed. aram-dma-fixes changed that and made the external exception
happen a lot quicker, breaking games that relied on the memcard operations
delay.
Fixes issue 5583.
To use, install OpenVPN's TAP device driver. Then create a network bridge between the TAP and your device connected to the internet.
TODO:
* proper overlapped read - can look at the QEMU implementation
* non-Windows implementation
MTMSR is executed.
This commit fixes issue 617. WWE Day of Reckoning 1 and 2 are now playable
with Dolphin.
The changes are not implemented for JitIL yet.
The intent is to replace the haphazard scheduling and finger-crossing associated with saving/loading with the correct and minimal necessary wait for each thread to reach a known safe location before commencing the savestate operation. Components that are already paused should not need to be resumed for this to happen.
* misc-speedups:
fixed and reenabled and slightly optimized the JIT version of fcmpo/fcmpu.
slightly more precise speed percent display (this is really minor)
a small thread synchronization speedup for dual core mode. it's most noticeable in games where the CPU is running behind compared to the GPU.
Conflicts:
Source/Core/Core/Src/PowerPC/Jit64/Jit.cpp
The Fifo.cpp changes from rdaefb3b550e2 were not merged as there was no performance benefit.