Update default hash settings to 64MiB:3:4 for Argon2 in -m 70000, following RFC 9106 recommendations.
Add option OPTS_TYPE_THREAD_MULTI_DISABLE: allows plugin developers to disable scaling the password candidate batch size based on device thread count. This can be useful for super slow hash algorithms that utilize threads differently, e.g., when the algorithm allows parallelization. Note: thread count for the device can still be set normally.
Add options OPTI_TYPE_SLOW_HASH_DIMY_INIT/LOOP/COMP: enable 2D launches for slow hash init/loop/comp kernel with dimensions X and Y. The Y value must be set via salt->salt_dimy attribute.
Change autotune kernel-loops start value to the lowest multiple of the target hash iteration count, if kernel_loops_min permits.
Fixed a bug in autotune where kernel_threads_max was not respected during initial init and loop-prepare kernel runs.
Switched HIP version check from driverVersion to runtimeVersion
Fixed syntax check of HAS_VPERM macro in several kernel includes causing invalid error message for AMD GPUs on Windows
Updated AMD driver requirements
Updated docs/changes.txt with missing changes from previous commits
Fixed invalid vector data type in Murmur Hash in -a 3 mode
Fixed uninitialized variable warning in src/hashes.c
Fixed broken support for --generate-rules-func-min
Fix double free() for hip_event1/hip_event2 and hip_stream causes segfault
Replace hc_cuCtxSetCurrent() with hc_cuCtxPushCurrent() in order to align changes with HIP
Add vector datatype operators (if we decide to use them - currently unused)
Fix context to thread management
Fix missing code in selftest.c, autotune.c, hashes.c, dispatch.c and backend.c
Use IS_HIP depending code makes it easier for future optimization related to inline assembly calls - instead of using IS_CUDA || IS_HIP
See TODO markers for more optimizations / next steps