Commit graph

16199 commits

Author SHA1 Message Date
Lioncash
c760ffbd28 BPMemory/XFMemory: Convert defines to enums
Enums convey a concrete type and also provide a
symbolic constant during debugging.
2015-09-01 12:07:10 -04:00
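As a rough illustration of the define-to-enum conversion described in this commit, the sketch below contrasts an untyped preprocessor constant with an enum that has a fixed underlying type. Every identifier here (ExampleFogType, EXAMPLE_FOG_LINEAR, DecodeExampleFog, the bit position) is hypothetical and is not taken from BPMemory or XFMemory.

```cpp
#include <cstdint>

// Before (sketch): an untyped preprocessor constant that a debugger cannot name.
//   #define EXAMPLE_FOG_LINEAR 2

// After (sketch): an enum with a fixed underlying type. All names are hypothetical.
enum ExampleFogType : std::uint32_t
{
  EXAMPLE_FOG_OFF = 0,
  EXAMPLE_FOG_LINEAR = 2,
};

// Decodes a hypothetical 3-bit field starting at bit 21 of a register value
// and returns it as the typed enum instead of a bare integer.
constexpr ExampleFogType DecodeExampleFog(std::uint32_t reg)
{
  return static_cast<ExampleFogType>((reg >> 21) & 0x7u);
}
```

With the enum, a debugger can print EXAMPLE_FOG_LINEAR rather than 2, and function signatures document which kind of value they expect.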
booto
f6e4a8e680 FifoPlayer: Use VI derived timing, not hardcoded 60Hz 2015-09-01 20:24:42 +08:00
booto
8d6c39a89d VI: Adjust forced-progressive hack per magumagu's suggestion 2015-09-01 20:24:41 +08:00
booto
acc9a74174 VI: Restore forced-progressive hack with option
Bugfix: TargetRefreshRate uses rounded result
NTSC's 59.94 was becoming 59 with integer division.
2015-09-01 20:24:40 +08:00
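The rounding problem mentioned in this commit can be reproduced with a minimal sketch. It assumes the NTSC rate is expressed as the fraction 60000/1001 (about 59.94 Hz); the function names are hypothetical and do not mirror the actual VI code.

```cpp
#include <cmath>
#include <cstdint>

// Truncating integer division: 60000 / 1001 yields 59.
constexpr std::uint32_t TruncatedRefreshRate(std::uint32_t numerator, std::uint32_t denominator)
{
  return numerator / denominator;
}

// Divides in floating point and rounds to the nearest integer: 59.94 becomes 60.
inline std::uint32_t RoundedRefreshRate(std::uint32_t numerator, std::uint32_t denominator)
{
  return static_cast<std::uint32_t>(std::lround(static_cast<double>(numerator) / denominator));
}
```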
booto
480dbb22f2 VI: derive field timing from VI registers 2015-09-01 20:24:40 +08:00
Ryan Houdek
7a35f9285b [GLES] Support texture_buffer for palette texture conversion.
OpenGL ES 3.2 adds this feature to core.
It was also available on GLES 3.1 as GL_{EXT, OES}_texture_buffer.
The non-Nvidia vendors that have implemented this are:
      - Qualcomm's Adreno 4xx
      - IMGTec's PowerVR Rogue
2015-09-01 05:41:03 -05:00
Lioncash
222b33f0a3 VolumeCreator: Fix a typo in VolumeKeyForPartition's name 2015-08-31 20:01:51 -04:00
Lioncash
1db1a8aacf VolumeCreator: Use a unique_ptr in CreateVolumeFromFilename 2015-08-31 20:01:44 -04:00
flacs
b9ea9c05ad Merge pull request #2931 from lioncash/mem
VertexLoader_Color: Remove some pointer casts
2015-09-01 00:32:43 +02:00
Lioncash
f7e22c8126 VertexLoader_Color: Mark translation-unit-local functions static 2015-08-31 17:31:23 -04:00
Lioncash
ec42be79f3 VertexLoader_Color: Get rid of some pointer casts 2015-08-31 17:31:11 -04:00
Ryan Houdek
ae0a06a018 [AArch64] Implement dcbz instruction 2015-08-31 15:39:47 -05:00
Ryan Houdek
d495ad5104 [AArch64] Make TST reg, reg emitter alias 2015-08-31 14:03:32 -05:00
Ryan Houdek
0f54aa48b4 Merge pull request #2928 from Sonicadvance1/aarch64_improved_singles
[AArch64] Improve floating point single instructions.
2015-08-31 12:00:08 -05:00
Ryan Houdek
bcde1aa8ff [AArch64] Improve floating point single instructions.
Instead of having an "INS" instruction after every single instruction to duplicate the bottom 64bits in to the top 64bits of the register,
create a new FPR register cache type to track when a register's lower 64bits are supposed to be duplicated in to the high 64bits,
without necessarily having the lower bits actually duplicated in the host side register. This removes inefficient INS instructions from sequential single
float instructions.
In particular, one block in Animal Crossing that is very heavy on single instructions went from 712 instructions down to 520 instructions (~27% fewer instructions!)
2015-08-31 11:09:17 -05:00
Ryan Houdek
d003934b8a Merge pull request #2929 from Sonicadvance1/aarch64_optimize_gpr_flush
Aarch64 optimize gpr flush
2015-08-31 10:55:45 -05:00
Ryan Houdek
8bf332cf08 [AArch64] Optimize GPR cache flushing.
If we are flushing multiple sequential guest GPRs then we can store two in a single STP instruction.
Ikaruga does this quite a bit in their blocks where they do an lmw at the very end and then we have to flush them all.
Typically cuts 16 STR instructions down to 8 STP instructions there.
2015-08-30 23:07:12 -05:00
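The pairing idea in this commit can be sketched as a small planner that counts how many STP and STR instructions a flush would need. This is an assumption-laden illustration, not the JIT's code: it assumes guest GPRs are stored contiguously so that registers n and n+1 can share one STP, and FlushPlan and PlanFlush are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Number of store instructions a flush would emit (hypothetical helper type).
struct FlushPlan
{
  std::size_t stp;  // paired stores, two guest registers each
  std::size_t str;  // single stores
};

// dirty_regs must hold guest register indices sorted in ascending order.
// Adjacent indices (n, n + 1) are merged into one STP; anything else uses STR.
inline FlushPlan PlanFlush(const std::vector<int>& dirty_regs)
{
  FlushPlan plan{0, 0};
  std::size_t i = 0;
  while (i < dirty_regs.size())
  {
    if (i + 1 < dirty_regs.size() && dirty_regs[i + 1] == dirty_regs[i] + 1)
    {
      ++plan.stp;
      i += 2;
    }
    else
    {
      ++plan.str;
      ++i;
    }
  }
  return plan;
}
```

Flushing sixteen sequential registers this way gives eight STP instructions and no STR, which matches the 16-to-8 reduction described in the message.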
Ryan Houdek
f2c17436ab [AArch64] Fix issue in emitter.
Loadstore pairs support only signed offsets, not unsigned.
2015-08-30 23:05:59 -05:00
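For context on the signed-offset point: the AArch64 LDP/STP immediate is a 7-bit signed value scaled by the access size. A minimal range check under that rule might look like the sketch below; IsEncodablePairOffset is a hypothetical helper and not the emitter's actual function.

```cpp
#include <cstdint>

// Returns true if a byte offset fits the LDP/STP immediate field:
// a signed 7-bit value (-64..63) multiplied by the access size.
// size_bytes is 4 for W registers and 8 for X registers.
constexpr bool IsEncodablePairOffset(std::int32_t offset, std::int32_t size_bytes)
{
  return offset % size_bytes == 0 &&
         offset / size_bytes >= -64 &&
         offset / size_bytes <= 63;
}
```

For 32-bit pairs this allows offsets from -256 to 252 in steps of 4. Treating the field as unsigned would mis-encode any negative offset.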
Scott Mansell
368867dba0 Merge pull request #2922 from aserna3/SDBlock
Implemented ability to block writes to the SD card
2015-08-31 04:51:50 +12:00
Ryan Houdek
b907576510 [AArch64] Support profiling by cycle counters if they are available to EL0 2015-08-30 10:25:16 -05:00
Ryan Houdek
5110574c1f Merge pull request #2921 from Sonicadvance1/aarch64_optimize_lmw
[AArch64] Optimize lmw.
2015-08-30 10:23:57 -05:00
Anthony Serna
0390bd61df Fixed introduced compiler warning in Linux 2015-08-29 20:41:59 -07:00
Lioncash
e0aabc5f6c MemcardManager: Remove trivial explicit delete and new
Also gets rid of pointer casting.
2015-08-29 22:46:18 -04:00
Lioncash
d58550e874 MemcardManager: Minor cleanup of header code 2015-08-29 05:19:51 -04:00
Lioncash
0f3e4c50e1 MemcardManager: Correct class indentation 2015-08-29 05:13:20 -04:00
Lioncash
072150589e Merge pull request #2924 from lioncash/scope
Hash: Narrow define scope
2015-08-29 03:12:18 -04:00
Lioncash
e7c7dcaa1f Merge pull request #2923 from lioncash/override
Jit_Util: Add missing override specifiers
2015-08-29 03:12:11 -04:00
Lioncash
310bb46967 Hash: Narrow define scope 2015-08-29 02:57:35 -04:00
Markus Wick
a16669231a Merge pull request #2917 from Sonicadvance1/android_fix_sgs6
[Android] Workaround Mali driver issue on the Samsung Galaxy S6.
2015-08-29 08:56:32 +02:00
Lioncash
df19f11cb9 Jit_Util: Add missing override specifiers 2015-08-29 00:30:18 -04:00
Anthony Serna
db7fe9507e Implemented ability to block writes to the SD card
Renamed variable to be more accurate
2015-08-28 17:32:29 -07:00
Markus Wick
6004ecc521 Merge pull request #2920 from rohit-n/build-pch
Fix building with PCH disabled.
2015-08-28 23:08:24 +02:00
Ryan Houdek
8d61706440 [AArch64] Optimize lmw.
This instruction is fairly heavily used by Ikaruga to load a bunch of registers from the stack.
In particular at the start of the second stage is a block that takes up ~20% CPU time that includes a usage of lmw to load half of the guest
registers.

The basic optimization here is changing from a single 32bit LDR to potentially a single 128bit LDR.
A single 32bit LDR is fairly slow, so we can optimize in a few ways.
If we have four or more registers to load, do a 64bit LDP in to two host registers, byteswap, and then move the high 32bits of the host registers in
to the correct mapped guest register locations.
If we have two registers to load, then do a 32bit LDP, which will load two guest registers in a single instruction.
Then, if we have only one register left to load, load it as before.

This saves quite a few cycles since the Cortex-A57 and A72's LDR instruction takes a few cycles.

Each 32bit LDR takes 4 cycles latency, plus 1 cycle for post-index (which typically happens in parallel).
Both the 32bit and 64bit LDP take the same amount of latency.

So we are improving latencies and reducing code bloat here.
2015-08-28 14:40:30 -05:00
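The load strategy described in this commit can be sketched as a planner that splits a run of guest registers into the three cases. This only counts instructions and is an illustration under assumptions: LmwPlan and PlanLmwLoads are hypothetical names, and the real JIT also has to emit the byteswaps and register moves.

```cpp
// Count of each load kind chosen for a run of guest registers (hypothetical type).
struct LmwPlan
{
  int ldp64;  // 64bit LDP: four guest registers per instruction
  int ldp32;  // 32bit LDP: two guest registers per instruction
  int ldr32;  // 32bit LDR: one guest register
};

// Greedily covers `count` guest registers: groups of four first,
// then one pair if at least two remain, then a single load for the last one.
inline LmwPlan PlanLmwLoads(int count)
{
  LmwPlan plan{0, 0, 0};
  while (count >= 4)
  {
    ++plan.ldp64;
    count -= 4;
  }
  if (count >= 2)
  {
    ++plan.ldp32;
    count -= 2;
  }
  if (count == 1)
    ++plan.ldr32;
  return plan;
}
```

Loading half of the guest registers (16 of them) would take four 64bit LDPs under this plan instead of sixteen 32bit LDRs.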
Ryan Houdek
2c3fa8da28 [AArch64] Fix a bug in the register caches.
This is a bug that crops up if BindToRegister() is called multiple times in a row without an R() function call between them.
How to reproduce the bug:
1) Have a completely filled cache with no host register remaining
2) Call BindToRegister() with different guest registers
3) Don't call R() between the BindToRegister() calls.

This issue typically wouldn't be seen for a couple of reasons. Typically we have /plenty/ of registers in the cache, and in most cases we only call
BindToRegister() once per instruction. In the off chance that it is called multiple times, it wouldn't update the last used counts and would flush the
same register as the previous call to it.
2015-08-28 14:36:14 -05:00
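The reproduction steps above can be modelled with a toy two-slot cache. This is a simplified sketch, not the JitArm64 register cache: TinyRegCache, ExampleSlot, and the update_on_bind switch are hypothetical, and eviction here just picks the slot with the oldest use stamp.

```cpp
#include <array>
#include <cstdint>

// One host register slot (hypothetical, simplified).
struct ExampleSlot
{
  int guest = -1;               // guest register held, or -1 if empty
  std::uint64_t last_used = 0;  // stamp of the most recent use
};

class TinyRegCache
{
public:
  explicit TinyRegCache(bool update_on_bind) : m_update_on_bind(update_on_bind) {}

  // Reading a register always refreshes its use stamp.
  void R(int guest) { Acquire(guest).last_used = ++m_clock; }

  // The buggy variant (update_on_bind == false) leaves the stamp untouched.
  void BindToRegister(int guest)
  {
    ExampleSlot& slot = Acquire(guest);
    if (m_update_on_bind)
      slot.last_used = ++m_clock;
  }

  bool IsResident(int guest) const
  {
    for (const ExampleSlot& slot : m_slots)
      if (slot.guest == guest)
        return true;
    return false;
  }

private:
  // Returns the slot holding `guest`, taking an empty slot or evicting the
  // least recently used one if the guest is not cached yet.
  ExampleSlot& Acquire(int guest)
  {
    for (ExampleSlot& slot : m_slots)
      if (slot.guest == guest)
        return slot;
    ExampleSlot* victim = &m_slots[0];
    for (ExampleSlot& slot : m_slots)
    {
      if (slot.guest == -1)
      {
        victim = &slot;
        break;
      }
      if (slot.last_used < victim->last_used)
        victim = &slot;
    }
    victim->guest = guest;
    return *victim;
  }

  std::array<ExampleSlot, 2> m_slots{};
  std::uint64_t m_clock = 0;
  bool m_update_on_bind;
};

// Fills the cache, binds two new guests back to back without R() in between,
// and reports whether the second bind evicted the first one.
inline bool SecondBindEvictsFirst(bool update_on_bind)
{
  TinyRegCache cache(update_on_bind);
  cache.R(0);
  cache.R(1);
  cache.BindToRegister(2);
  cache.BindToRegister(3);
  return !cache.IsResident(2);
}
```

Without the stamp update, the second bind picks the same victim slot as the first, which is the behaviour the commit message describes.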
Rohit Nirmal
6252d2d71a Fix building with PCH disabled. 2015-08-28 14:13:28 -05:00
Lioncash
a6bd2fea28 Merge pull request #2919 from lioncash/vec
Vec3: Remove a memset call on the this pointer.
2015-08-28 15:05:02 -04:00
Lioncash
e787501528 Vec3: Simplify operator== code 2015-08-28 14:46:40 -04:00
Markus Wick
b11de5bddb Merge pull request #2918 from lioncash/memcpy
DataReader: Get rid of pointer casts
2015-08-28 20:45:15 +02:00
Lioncash
d86d5fae9f Merge pull request #2909 from aserna3/DollsAndElves
Implemented .elf and .dol support in gamelist
2015-08-28 14:28:09 -04:00
Lioncash
bb27f80a65 Vec3: Remove a memset call on the this pointer 2015-08-28 14:10:07 -04:00
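A minimal sketch of the kind of change this commit title suggests: zeroing members through default member initializers instead of calling memset on the this pointer, together with a direct member-wise operator==. ExampleVec3 is a hypothetical stand-in and is not the actual Vec3 class.

```cpp
// Hypothetical stand-in for a small vector type.
struct ExampleVec3
{
  // Default member initializers replace a memset(this, 0, sizeof(*this)) call.
  float x = 0.0f;
  float y = 0.0f;
  float z = 0.0f;

  ExampleVec3() = default;
  ExampleVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

  // Member-wise comparison written as a single expression.
  bool operator==(const ExampleVec3& other) const
  {
    return x == other.x && y == other.y && z == other.z;
  }
};
```

Initializing members directly keeps the type safe to extend later, since memset on this would silently corrupt any non-trivial member that gets added.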
Anthony Serna
faedf1bc5c Implemented .elf and .dol support in gamelist
Fixed a TON of structuring, formatting.

removed README.txt files from themes at MaJoR's request

Added platform icon for ELFs/DOLs
2015-08-28 11:10:03 -07:00
Ryan Houdek
01db003779 [Android] Workaround Mali driver issue on the Samsung Galaxy S6.
Samsung updated the video drivers on the SGS6 which introduced a bug when disabling vsync.
Both the driver versions are r5p0, but the md5sums of the blob differ.
To work around the issue, make sure to never disable vsync by calling eglSwapInterval.

We can't actually determine the driver version on Android yet.
So until the driver version lands that displays the driver version string in the GL_VERSION string
we will need to keep this workaround enabled at all times, which is a bit annoying.

Current Mali drivers return the video driver version in one of the EGL strings you can query.
The issue with that is that Android eats all of those strings, so we can't query it.
2015-08-28 09:02:46 -05:00
flacs
d373dd372d Merge pull request #2913 from Tilka/fix_warning_fix
AVIDump: fix -Wsign-compare warning
2015-08-27 23:50:34 +02:00
Lioncash
4fb3a8b78d DataReader: Get rid of pointer casts 2015-08-27 13:43:04 -04:00
Lioncash
7fa0ecd046 Main: Make the wxLocale class member a unique_ptr 2015-08-27 08:36:01 -04:00
Lioncash
14ae1d23cf Main: Move unofficial build check to its own function
Removes the need to explicitly call exit.
2015-08-27 08:35:51 -04:00
Lioncash
aafae49d24 Main: Move commandline parsing handling to appropriate override functions 2015-08-27 08:29:53 -04:00
Ryan Houdek
d9b18862f3 Merge pull request #2908 from Sonicadvance1/gles_3_2
Support OpenGL ES 3.2.
2015-08-26 18:19:17 -05:00
Ryan Houdek
447b1b09e3 Support OpenGL ES 3.2.
OpenGL ES 3.2 adds a few things we care about supporting in core. In particular:
- GL_{ARB,EXT,OES}_draw_elements_base_vertex
- KHR_Debug
- Sample Shading
- GL_{ARB,EXT,OES,NV}_copy_image
- Geometry shaders
- Geometry shader instancing (If they support GL_{EXT,OES}_geometry_point_size)

Nvidia was the first to release an OpenGL ES 3.2 driver, which I used to test this on.
This also enables GS Instancing on GLES 3.1 hardware if it supports all of the required extensions.
2015-08-26 17:57:51 -05:00
Ryan Houdek
6d25c469cf Merge pull request #2915 from degasus/arm
JitArm64: Implement rlwnmx
2015-08-26 15:52:37 -05:00
Markus Wick
54f882704a Merge pull request #2914 from JosJuice/fix-volumedirectory
Fix VolumeDirectory
2015-08-26 22:12:23 +02:00
degasus
e516d4ef59 JitArm64: Implement rlwnmx 2015-08-26 21:59:10 +02:00
JosJuice
d276d1abbb Fix VolumeDirectory
Fixes the regression from a225426 and clarifies a related comment.
2015-08-26 19:21:09 +02:00
Markus Wick
3e9dac3910 Merge pull request #2810 from Sonicadvance1/disassembler_improv
Have the disassembler show the PC next to host instructions.
2015-08-26 17:01:39 +02:00
flacs
99e88a7af7 Merge pull request #2887 from Tilka/swap
Jit64: some byte-swapping changes
2015-08-26 16:43:45 +02:00
flacs
eb6ac641be Merge pull request #2906 from Tilka/fpscr
Jit64: fix bugs in the FPSCR instructions
2015-08-26 16:43:28 +02:00
Tillmann Karras
6ec4bdf862 CoreTiming: remove unused functions 2015-08-26 15:40:15 +02:00
Tillmann Karras
0f4861cac2 CoreTiming: make loops easier to read 2015-08-26 14:53:58 +02:00
Ryan Houdek
ca51f1a4f6 [AArch64] Optimize paired registers being used in double operations.
In particular this optimizes the case where a 32bit float is loaded via lfs, and then used in double operations.
This happens very often in Gekko based code because the best way to load a 32bit value as a double is lfs since it automatically turns in to a double value.

There are a few other implications of this in practice as well. Like if both of the paired registers are loaded via psq_l and then used in double
operations it would be improved.
Also if we implement a double register we've got to be careful to make sure we understand if it is in "lower" register or the full 128bit register.
2015-08-26 05:50:04 -05:00
Markus Wick
5716d18d10 Merge pull request #2910 from Sonicadvance1/aarch64_regcache_fix
[AArch64] Fix a bug in the register cache.
2015-08-26 08:31:24 +02:00
Ryan Houdek
4f5f29a0fb [AArch64] Fix a bug in the register cache.
If the register was only a lower pair and it needed the full register, then we need to load the high 64bits,
which we weren't doing before.
2015-08-26 01:21:43 -05:00
Markus Wick
43d17cb360 Merge pull request #2904 from Sonicadvance1/aarch64_more_inst
[AArch64] Implement fdivx/fdivsx/mfcr/mtcrf.
2015-08-26 07:48:24 +02:00
Tillmann Karras
ee4a12ffe2 Jit64: some byte-swapping changes 2015-08-26 05:41:18 +02:00
flacs
6015e2d812 Merge pull request #2900 from aroulin/x64emitter-rcp
x64Emitter: add RCPPS and RCPSS SSE instructions
2015-08-26 05:05:53 +02:00
Ryan Houdek
6729a36d8d [AArch64] Set BindToRegister's to_load correctly for double FP ops. 2015-08-25 21:29:27 -05:00
Lioncash
db4f692482 GCMemcard: Clean up memcard logging messages. 2015-08-25 21:55:52 -04:00
Tillmann Karras
ee50a2ef28 Jit64: fix bugs in the FPSCR instructions 2015-08-25 23:48:14 +02:00
Markus Wick
bd08c1b01a Merge pull request #2901 from Sonicadvance1/aarch64_stfiwx
[AArch64] Implement stfiwx
2015-08-25 22:47:39 +02:00
Markus Wick
24cb650078 Merge pull request #2663 from degasus/dcbx
Jit64: dcbf + dcbi
2015-08-25 12:16:56 +02:00
Ryan Houdek
0666c0750b [AArch64] Implement fdivx/fdivsx/mfcr/mtcrf.
Gets the povray bench to better times than the Wii.
2015-08-24 15:32:19 -05:00
Ryan Houdek
d96be9250c Merge pull request #2899 from Sonicadvance1/aarch64_fctiwzx
[AArch64] Implement fctiwzx
2015-08-24 13:22:27 -05:00
Ryan Houdek
cd03b8baf6 Merge pull request #2895 from Sonicadvance1/qualcomm_workaround_gles31
Disable OpenGL ES 3.1 on all Qualcomm Adreno devices.
2015-08-24 13:22:12 -05:00
degasus
0d92c8fb89 Jit64: Optimize dcbx 2015-08-24 18:33:23 +02:00
Tillmann Karras
ac84d6d0fa Jit64: some cache flush changes
- dynamically allocate third scratch register instead of forcing ECX
- use LEA as 3 operand add if possible
- use BT,JC instead of SHR,TEST,JNZ
- merge MOV,TEST
- use appropriate ABI function (no asm change)
2015-08-24 18:33:23 +02:00
degasus
6f34b27323 Jit64: implement dcbf + dcbi 2015-08-24 18:33:19 +02:00
Markus Wick
0ad6fa8f62 Merge pull request #2903 from lioncash/cast
Memmap: Remove pointer casts
2015-08-24 15:42:56 +02:00
Lioncash
abd3b124be Memmap: Remove pointer casts 2015-08-24 09:07:09 -04:00
Tillmann Karras
33eefc2d86 Jit64: quickfix for mtfsfx 2015-08-24 12:12:31 +02:00
Ryan Houdek
d3176fe22a [AArch64] Implement stfiwx
Improves povray performance by ~4%
2015-08-24 01:10:55 -05:00
Ryan Houdek
80fa9af9b1 Merge pull request #2898 from degasus/linking
JitArm64: Faster linking of continuous blocks
2015-08-23 18:09:02 -05:00
degasus
7320d519b4 JitArm64: Implement srwx 2015-08-23 23:29:48 +02:00
degasus
4722a69fd0 JitArm64: Implement divwux 2015-08-23 23:29:18 +02:00
degasus
9e4366963c JitArm64: Implement subfic 2015-08-23 23:29:07 +02:00
degasus
95be17772f JitArm64: Implement addex 2015-08-23 23:29:02 +02:00
degasus
025e7c835a JitArm64: Implement subfcx 2015-08-23 23:28:28 +02:00
degasus
550a90e691 JitArm64: Implement subfex 2015-08-23 23:28:24 +02:00
Ryan Houdek
561744819e [AArch64] Implement fctiwzx
Improves the povray benchmark time by 5.6%
2015-08-23 15:35:18 -05:00
Ryan Houdek
4fa23abbe1 [AArch64] Implement MOVI and ORR(imm) in the NEON emitter. 2015-08-23 15:34:53 -05:00
aroulin
0a0e012fab x64Emitter: add RCPPS and RCPSS SSE instructions 2015-08-23 16:59:27 +02:00
degasus
77a6798094 JitArm64: Faster linking of continuous blocks 2015-08-23 14:44:23 +02:00
Markus Wick
73067b1ef1 Merge pull request #2888 from degasus/jit64
Jit64: Faster linking of continuous blocks
2015-08-23 13:24:15 +02:00
Lioncash
2a1abf8dd6 Merge pull request #2896 from lioncash/using
Core: Minor CPU core typedef cleanup
2015-08-22 19:00:23 -04:00
Ryan Houdek
cc3fb7e7b4 Merge pull request #2883 from degasus/master
Profiler: Sort output by total time
2015-08-22 17:52:54 -05:00
Markus Wick
8b881a6c34 Merge pull request #2891 from Sonicadvance1/aarch64_implement_crxxx
[AArch64] Implement the cr instructions
2015-08-23 00:44:47 +02:00
Lioncash
fdafa5d063 Core: Move includes out of instruction table headers
These aren't necessary (and cause unnecessary indirect inclusions).
2015-08-22 14:15:02 -04:00
Lioncash
a248a4d2ce Jit64/JitIL: Relocate instruction typedefs 2015-08-22 14:15:00 -04:00
Lioncash
c56717e058 Core: Shorten the _interpreterInstruction typedef
The class itself already acts as a namespace trailer, so '_interpreter'
isn't necessary. This also gets rid of a duplicate typedef in the
Interpreter_Tables.
2015-08-22 14:14:49 -04:00
Ryan Houdek
b4e4a4cef4 Disable OpenGL ES 3.1 on all Qualcomm Adreno devices.
Their new driver that supports GLES3.1 + AEP has issues with it.
At the very least they don't implement all of the geometry shader features fully which causes shader linker issues when we attempt to use them.
I don't have a device so I can't fully test, so until I do I'm going to blanket disable the whole thing.
2015-08-22 09:12:19 -05:00
Markus Wick
a39c0910c4 Merge pull request #2893 from Sonicadvance1/aarch64_memory_base_register
[AArch64] Use a register as a constant for the memory base.
2015-08-22 15:41:57 +02:00
Ryan Houdek
dba579c52f [AArch64] Use a register as a constant for the memory base.
Removes a /lot/ of redundant movk operations in fastmem loadstores.
Improves performance of the povray bench by ~5%
2015-08-22 08:36:34 -05:00