Fixed a long-standing critical bug on Apple Intel with Metal by patching inc_rp_optimized.cl.
Tested on Apple Intel and Silicon with Metal/OpenCL and on Linux with CUDA, HIP, OpenCL GPU/CPU
Metal Backend: parallelize pipeline state object (PSO) compilation internally
Set the unexported setting setShouldMaximizeConcurrentCompilation to speed up the kernel build process on Apple Metal (Metal >= 3 only)
This works because CPUs support hardware 64-bit rotate.
Added hc_umullo() and rewrote trunc_mul() for Argon2. No performance
impact, but trunc_mul() is now easier to read.
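A minimal sketch of the idea in plain C, assuming trunc_mul() multiplies the
low 32-bit halves of two 64-bit words (as in Argon2's BlaMka mixing step) and
that hc_umullo() returns the low 32 bits of a 32x32-bit product. The names
umullo_sketch, umulhi_sketch and trunc_mul_sketch are hypothetical stand-ins;
the real signatures in the kernel code may differ.

```c
#include <stdint.h>

// Hypothetical stand-in for hc_umullo(): low 32 bits of a 32x32-bit product.
static uint32_t umullo_sketch (uint32_t a, uint32_t b)
{
  return a * b; // unsigned multiplication wraps modulo 2^32
}

// Hypothetical stand-in for a "mul high" helper: high 32 bits of the product.
static uint32_t umulhi_sketch (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}

// Multiply the low 32-bit halves of x and y and return the full 64-bit result,
// assembled from the separate low and high 32-bit product halves.
static uint64_t trunc_mul_sketch (uint64_t x, uint64_t y)
{
  const uint32_t xlo = (uint32_t) x; // upper 32 bits of x are discarded
  const uint32_t ylo = (uint32_t) y; // upper 32 bits of y are discarded

  const uint32_t lo = umullo_sketch (xlo, ylo);
  const uint32_t hi = umulhi_sketch (xlo, ylo);

  return ((uint64_t) hi << 32) | (uint64_t) lo;
}
```

Splitting the product into explicit low and high halves keeps each step a
32-bit operation, which is one plausible reason the rewritten function is
easier to read.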
Re-enabled USE_BITSELECT, USE_ROTATE, and USE_SWIZZLE for OpenCL. We
have a new unit test script; let's see if OpenCL runtimes have
improved.
The previous fix for -m 21800 in multihash mode was incomplete. The
correct cracked hash is now shown.
Re-enabled --hwmon-disable for users. While it's important for SCRYPT
and Argon2 performance, a warning is now shown when it affects
speed.
Updated hash modes with OPTS_TYPE_NATIVE_THREADS:
1376x, 1377x, 1378x, 14800, 19500 and 2300x.
- added support for 2D/3D Compute
- improved compute workloads calculation
Makefile:
- updated MACOSX_DEPLOYMENT_TARGET to 15.0
Unit tests:
- updated install_modules.sh with Crypt::Argon2
Argon2 now starts to work with Apple Metal
Fix context to thread management
Fix missing code in selftest.c, autotune.c, hashes.c, dispatch.c and backend.c
Use IS_HIP-specific code paths instead of IS_CUDA || IS_HIP; this makes future optimizations involving inline assembly calls easier
See TODO markers for more optimizations / next steps
Reverted d343e2c4a0 and ee26805138
Adds a test to decide which conversion technique to use. If all UTF-8 characters are 7-bit, there is no need for the regular conversion and we can stick to the naive conversion.
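The check described above can be sketched roughly as follows. This is an
illustration only: it assumes the conversion target is UTF-16LE and that
"naive conversion" means zero-extending each byte to a 16-bit code unit. The
function names all_7bit_sketch and naive_utf8_to_utf16le_sketch are
hypothetical and do not come from the source tree.

```c
#include <stddef.h>
#include <stdint.h>

// Returns 1 if every input byte has its top bit clear (pure 7-bit ASCII),
// otherwise 0. A set top bit marks a multi-byte UTF-8 sequence.
static int all_7bit_sketch (const uint8_t *src, size_t len)
{
  for (size_t i = 0; i < len; i++)
  {
    if (src[i] & 0x80) return 0;
  }

  return 1;
}

// Naive conversion (assumed target: UTF-16LE): each byte becomes one 16-bit
// code unit with a zero high byte. Only correct when all_7bit_sketch()
// returned 1 for the same input. dst must hold at least len * 2 bytes.
static size_t naive_utf8_to_utf16le_sketch (const uint8_t *src, size_t len, uint8_t *dst)
{
  for (size_t i = 0; i < len; i++)
  {
    dst[i * 2 + 0] = src[i]; // low byte first (little-endian)
    dst[i * 2 + 1] = 0;      // high byte is always zero for 7-bit input
  }

  return len * 2; // number of bytes written
}
```

A caller would run the 7-bit test first and fall back to the regular,
sequence-aware conversion whenever it returns 0.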