Fixed parameter types in inc_hash_blake2b.cl and inc_hash_blake2s.cl for the FINAL value.
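The entry does not say which types were wrong. As general background from RFC 7693, BLAKE2 marks the last block by setting the finalization flag f0 to all ones. That flag is 64 bits wide in BLAKE2b and 32 bits wide in BLAKE2s. The C sketch below uses hypothetical names that are not taken from inc_hash_blake2b.cl or inc_hash_blake2s.cl. It shows why the parameter carrying that value must match the word width: a 32-bit parameter silently drops the upper half of the BLAKE2b flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names. Per RFC 7693 the finalization flag f0 is all ones for
   the last block; BLAKE2b works on 64-bit words, BLAKE2s on 32-bit words. */
#define BLAKE2B_FINAL_FLAG UINT64_C(0xFFFFFFFFFFFFFFFF)
#define BLAKE2S_FINAL_FLAG UINT32_C(0xFFFFFFFF)

static_assert (sizeof (uint64_t) == 8, "BLAKE2b state words are 64 bits");

/* Correct widths: during the last compression, v[14] ^= f0. */
static uint64_t blake2b_apply_final (uint64_t v14, uint64_t f0) { return v14 ^ f0; }
static uint32_t blake2s_apply_final (uint32_t v14, uint32_t f0) { return v14 ^ f0; }

/* Bug class: a 32-bit parameter truncates the BLAKE2b flag to 0xFFFFFFFF,
   so only the low half of v[14] is inverted. */
static uint64_t blake2b_apply_final_narrow (uint64_t v14, uint32_t f0) { return v14 ^ f0; }
```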
Added kernel code for -m 15400 to s04/s08/m04/m08, even though it is not strictly needed, to help autotune find optimal work-item settings.
Fixed a rare autotune case (e.g. in mode 18600) where threads_min was not a multiple of kernel_preferred_wgs_multiple. As long as only threads_min is affected and not threads_max, autotune now ensures that threads_min is at least kernel_preferred_wgs_multiple.
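One reading of this entry, sketched in C: if threads_min is below kernel_preferred_wgs_multiple and threads_max is large enough to accommodate that multiple, raise threads_min to it; otherwise leave it untouched. The function name and the exact condition are assumptions, not hashcat's implementation.

```c
#include <assert.h>

/* Hypothetical helper: raise threads_min to the preferred work-group size
   multiple, but only when threads_max does not have to change. */
static unsigned int raise_threads_min (unsigned int threads_min,
                                       unsigned int threads_max,
                                       unsigned int wgs_multiple)
{
  assert (wgs_multiple > 0);
  assert (threads_min <= threads_max);

  if (threads_min < wgs_multiple && wgs_multiple <= threads_max)
  {
    return wgs_multiple; /* only threads_min is affected */
  }

  return threads_min; /* raising it would also require touching threads_max */
}
```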
Improved autotune logic for finding the best thread count: double the thread count until reaching the device's preferred multiple, then increase it in steps of that multiple while comparing efficiency against runtime, and select the configuration with the best efficiency rather than the highest thread count.
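The search strategy in this entry can be sketched as follows. This is a simplified model, not hashcat's autotune code. The timing callback, the efficiency metric (threads processed per millisecond) and example_runtime_ms() are hypothetical stand-ins for real kernel measurements.

```c
#include <assert.h>

/* Hypothetical timing callback: runtime in ms of one kernel launch. */
typedef double (*runtime_fn) (unsigned int threads);

/* Hypothetical efficiency metric: threads processed per millisecond. */
static double efficiency (unsigned int threads, runtime_fn measure)
{
  return (double) threads / measure (threads);
}

static unsigned int find_best_threads (unsigned int threads_min,
                                       unsigned int threads_max,
                                       unsigned int multiple,
                                       runtime_fn measure)
{
  assert (threads_min > 0 && threads_min <= threads_max && multiple > 0);

  unsigned int best_threads = threads_min;
  double       best_eff     = efficiency (threads_min, measure);

  /* Phase 1: double the thread count while still below the preferred multiple. */
  for (unsigned int t = threads_min * 2; t < multiple && t <= threads_max; t *= 2)
  {
    const double eff = efficiency (t, measure);

    if (eff > best_eff) { best_eff = eff; best_threads = t; }
  }

  /* Phase 2: step by the multiple, starting at the first multiple >= threads_min. */
  const unsigned int start = ((threads_min + multiple - 1) / multiple) * multiple;

  for (unsigned int t = start; t <= threads_max; t += multiple)
  {
    const double eff = efficiency (t, measure);

    /* Strict '>' keeps the smaller count on ties: best efficiency wins, not size. */
    if (eff > best_eff) { best_eff = eff; best_threads = t; }
  }

  return best_threads;
}

/* Hypothetical runtime model for demonstration: flat up to 256 threads,
   then linear in the thread count, so efficiency peaks at 256. */
static double example_runtime_ms (unsigned int threads)
{
  return (threads <= 256) ? 1.0 : (double) threads / 200.0;
}
```

With this model, find_best_threads (1, 1024, 64, example_runtime_ms) settles on 256 threads even though 1024 are allowed, which is the "best efficiency, not highest thread count" behavior the entry describes.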
Always set funnelshift support to true for HIP devices, as HIP always reports it as false.
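As background, a common use of funnel shifts in hash kernels is 32-bit rotation: a rotate is a funnel shift whose two input words are the same. A portable C emulation of a wrap-mode right funnel shift (comparable to CUDA's __funnelshift_r) looks like this. The function names are hypothetical, and this is not hashcat's HIP code path.

```c
#include <assert.h>
#include <stdint.h>

/* Concatenate hi:lo into 64 bits, shift right by (n & 31), keep the low 32 bits. */
static uint32_t funnelshift_r32 (uint32_t lo, uint32_t hi, uint32_t n)
{
  const uint64_t wide = ((uint64_t) hi << 32) | lo;

  return (uint32_t) (wide >> (n & 31));
}

/* A rotate right is a funnel shift with both halves equal. */
static uint32_t rotr32 (uint32_t x, uint32_t n)
{
  return funnelshift_r32 (x, x, n);
}
```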
Set minimum loop count to 250 for all VeraCrypt modes with PIM brute-force support.
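For context, VeraCrypt derives the PBKDF2 iteration count from the PIM. It uses 15000 + PIM × 1000 for non-system volumes and for system encryption with SHA-512 or Whirlpool, and PIM × 2048 for system encryption that uses neither. The sketch below computes that count. It also computes how many loop-kernel invocations a given PIM needs at a loop count of 250, assuming the loop count is the number of iterations handled per invocation. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VeraCrypt PIM -> PBKDF2 iteration count (PIM must be > 0). */
static uint32_t veracrypt_iterations (uint32_t pim, bool system_not_sha512_whirlpool)
{
  assert (pim > 0);

  return system_not_sha512_whirlpool ? pim * 2048u : 15000u + pim * 1000u;
}

/* Hypothetical helper: loop-kernel invocations when each covers `loops` iterations. */
static uint32_t loop_invocations (uint32_t iterations, uint32_t loops)
{
  assert (loops > 0);

  return (iterations + loops - 1) / loops; /* ceiling division */
}
```

At a loop count of 250, each PIM step in the 15000 + PIM × 1000 formula adds exactly four invocations.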
shuffle() present in some OpenCL runtimes
- Updated autotune logic: if the best kernel-loops value has not been
found yet and the current kernel-loops setting already results in a
kernel runtime above a certain threshold, do not skip to the
kernel-threads or kernel-accel section if no variance is possible
- Revised all plugin module_unstable_warning() checks for the
AMD Radeon Pro W5700X GPU on Metal: rechecked with the latest
Metal version and removed those that are now fixed
- Inform the user on startup when backend runtimes and devices are
initialized
- Fixed some file permissions in the tools/ folder
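The updated kernel-loops rule in the first entry of the list above can be written as a single predicate. This C sketch is illustrative only. The function and parameter names are assumptions rather than hashcat's actual code, as is treating "a certain threshold" as a plain millisecond value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: true when the updated rule forbids skipping ahead
   to the kernel-threads or kernel-accel section. */
static bool block_skip_to_threads_or_accel (bool   best_loops_found,
                                            double runtime_ms,
                                            double threshold_ms,
                                            bool   variance_possible)
{
  assert (threshold_ms >= 0.0);

  return !best_loops_found && runtime_ms > threshold_ms && !variance_possible;
}
```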