- Replaced inline asm in hc_byte_perm() with __builtin_amdgcn_perm()
- Replaced inline asm in hc_bytealign() with __builtin_amdgcn_alignbyte()
- Defined HC_INLINE as default for HIP, significantly boosting kernel performance of pure kernels
- Removed IS_ROCM from inc_vendor.h as it's no longer needed
- Removed backend-specific code from several hash-modes and inc_rp_optimized.cl, as hc_bytealign_S() is now available on all backends
Improved shared memory handling for -m 10700. Removed the hard-coded limit of 256 threads and now dynamically check the device's shared memory pool to adapt threads accordingly.
Implemented a feature request to display non-default session names early during startup.
Added a check for the number of registers required by a kernel (CUDA and HIP only). This allows us to estimate the max threads per block before entering the auto-tune engine and make pre-adjustments.
Fixed Metal command encoder argument to work with the new auto-tuner's extra kernel invocation.
Fixed incorrect host memory calculation logic during automatic kernel-accel reduction for scrypt-based algorithms. This ensures memory constraints are respected.
Improved several plugins by setting maximum loop counts and others using the OPTS_TYPE_NATIVE_THREADS option.
Fixed compilation on Apple platforms by excluding '#include <sys/sysinfo.h>'.
Updated kernel declarations from "KERNEL_FQ void HC_ATTR_SEQ" to "KERNEL_FQ KERNEL_FA void". Please update your custom plugin kernels accordingly.
Added spilling size as a factor in calculating usable memory per device. This is based on undocumented variables and may not be 100% accurate, but it works well in practice.
Added a compiler hint to scrypt-based kernels indicating the guaranteed maximum thread count per kernel invocation.
Removed redundant kernel code 29800, as it is identical to 27700, and updated the plugin.
Substitute long parameter lists in ~2900 kernel function declarations
with macros. This cleans up the code, reduces probability of copy-paste
errors and highlights the differences between kernel functions. Also
reduces the size of the OpenCL folder by ~3 MB.
In preparation for the abstraction of long repetitive kernel function
declarations, rename the salt buffer pointers to *esalt_bufs. Also
declare them const where they are not.
Renamed pure kernels to default kernels
Replaced long option --length-limit-disable with --optimized-kernel-enable
Replaced short option -L with -O
Set --optimized-kernel-enable to unset by default