- Replaced inline asm in hc_byte_perm() with __builtin_amdgcn_perm()
- Replaced inline asm in hc_bytealign() with __builtin_amdgcn_alignbyte()
- Defined HC_INLINE as default for HIP, significantly boosting kernel performance of pure kernels
- Removed IS_ROCM from inc_vendor.h as it's no longer needed
- Removed backend-specific code from several hash-modes and inc_rp_optimized.cl, as hc_bytealign_S() is now available on all backends
Fixed parameter types in inc_hash_blake2b.cl and inc_hash_blake2s.cl for FINAL value.
Added kernel code for -m 15400 to s04/s08/m04/m08, even if not needed, to help autotune find optimal workitem settings.
Fixed a rare autotune case (e.g. in mode 18600) where threads_min was not a multiple of kernel_preferred_wgs_multiple, and changes it so that as long as it only threads_min is affected and not threads_max, we now ensure at least kernel_preferred_wgs_multiple.
Improved autotune logic for best thread count: double thread count until reaching the device's preferred multiple, then increase in steps of that multiple while comparing efficiency vs. runtime, and select the configuration with best efficiency, not highest thread count.
Always set funnelshift support to true for HIP devices, as it always reports false.
Set minimum loop count to 250 for all VeraCrypt modes with PIM brute-force support.
Switched HIP version check from driverVersion to runtimeVersion
Fixed syntax check of HAS_VPERM macro in several kernel includes causing invalid error message for AMD GPUs on Windows
Updated AMD driver requirements
Updated docs/changes.txt with missing changes from previous commits
Fixed invalid vector data type in Murmur Hash in -a 3 mode
Fixed uninitialized variable warning in src/hashes.c
Fixed broken support for --generate-rules-func-min