This affects the inner core of nearly all kernels and thus impacts
almost all hash modes. The only functional change is that we now
manually unroll the individual steps of the transform() functions,
saving a small amount of constant memory.
In most cases, JIT compilers would likely detect the unused constant
buffer and remove it automatically, but this makes it explicit.
Tested on newer NVIDIA devices: no speed change observed.
Tested on older NVIDIA devices: visible speed increase.
Tested on AMD devices: visible speed increase across all tested GPUs.
Not yet tested: CPUs, Intel iGPUs, Intel dGPUs.
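
The manual unrolling described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the actual kernel code: the step function, the constants, and the names K_TABLE, TOY_STEP, transform_looped, transform_unrolled and transform_selfcheck are all hypothetical. The looped variant reads its round constants from a table, while the unrolled variant passes each constant as a literal, so the table is no longer needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical round constants; not taken from any real hash function. */
static const uint32_t K_TABLE[4] =
{
  0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u
};

/* Rotate left by n bits, valid for 0 < n < 32. */
static uint32_t rotl32 (uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32u - n));
}

/* One toy step: mix a message word and a round constant into the state. */
#define TOY_STEP(s, w, k) ((s) = rotl32 ((s) + (w) + (k), 7))

/* Looped variant: the constants are fetched from a table on every step. */
uint32_t transform_looped (uint32_t state, const uint32_t w[4])
{
  for (int i = 0; i < 4; i++)
  {
    TOY_STEP (state, w[i], K_TABLE[i]);
  }

  return state;
}

/* Manually unrolled variant: each constant is a literal operand,
   so this function does not reference the table at all. */
uint32_t transform_unrolled (uint32_t state, const uint32_t w[4])
{
  TOY_STEP (state, w[0], 0x11111111u);
  TOY_STEP (state, w[1], 0x22222222u);
  TOY_STEP (state, w[2], 0x33333333u);
  TOY_STEP (state, w[3], 0x44444444u);

  return state;
}

/* Sanity check: both variants must agree on the same input. */
void transform_selfcheck (void)
{
  const uint32_t w[4] = { 1u, 2u, 3u, 4u };

  assert (transform_looped (0x67452301u, w) == transform_unrolled (0x67452301u, w));
}
```

Both variants compute the same result; only the way the constants reach the step differs.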
Updated kernel declarations from "KERNEL_FQ void HC_ATTR_SEQ" to "KERNEL_FQ KERNEL_FA void". Please update your custom plugin kernels accordingly.
Added spilling size as a factor in calculating usable memory per device. This is based on undocumented variables and may not be 100% accurate, but it works well in practice.
Added a compiler hint to scrypt-based kernels indicating the guaranteed maximum thread count per kernel invocation.
Removed redundant kernel code 29800, as it is identical to 27700, and updated the plugin.
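
For custom plugin kernels, the declaration change and the thread-count hint might look roughly like the following sketch. The macro definitions here are assumptions for illustration only; the real definitions live in the project's own include files and may differ. FIXED_LOCAL_SIZE, GLOBAL_AS as defined here, and the kernel name m99999_demo are hypothetical. __launch_bounds__ (CUDA) and work_group_size_hint (OpenCL) are standard attributes, but whether the project uses exactly these is not confirmed by the notes above. With a plain host C compiler all macros expand to nothing, so the sketch compiles anywhere.

```c
/* Hypothetical upper bound on threads per kernel invocation. */
#define FIXED_LOCAL_SIZE 32

/* Assumed definitions, for illustration only. */
#if defined (__OPENCL_VERSION__)
#define KERNEL_FQ __kernel
#define KERNEL_FA __attribute__ ((work_group_size_hint (FIXED_LOCAL_SIZE, 1, 1))) /* advisory */
#define GLOBAL_AS __global
#elif defined (__CUDACC__)
#define KERNEL_FQ extern "C" __global__
#define KERNEL_FA __launch_bounds__ (FIXED_LOCAL_SIZE) /* maximum threads per block */
#define GLOBAL_AS
#else
/* Plain host C: no qualifiers or attributes. */
#define KERNEL_FQ
#define KERNEL_FA
#define GLOBAL_AS
#endif

/* Old style: KERNEL_FQ void HC_ATTR_SEQ m99999_demo (...)           */
/* New style: the attribute macro sits between KERNEL_FQ and "void". */
KERNEL_FQ KERNEL_FA void m99999_demo (GLOBAL_AS unsigned int *out, const unsigned int in)
{
  *out = in ^ 0xdeadbeefu;
}
```

Keeping the attribute in its own macro means the hint can be changed per backend without touching every kernel declaration again.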
Substituted the long parameter lists in ~2900 kernel function
declarations with macros. This cleans up the code, reduces the
probability of copy-paste errors, and highlights the differences
between kernel functions. It also reduces the size of the OpenCL
folder by ~3 MB.
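
A minimal sketch of the idea, in plain C. The macro names DEMO_KERN_ATTR and DEMO_KERN_ATTR_ESALT, the parameter names, and the two demo functions are hypothetical and much shorter than a real kernel signature; they only show how a shared parameter list can be written once and how a variant with one extra argument stands out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared parameter list; real kernels take many more arguments. */
#define DEMO_KERN_ATTR         \
  const uint32_t *pws,         \
  const uint32_t *salts,       \
  uint32_t       *digests_out, \
  const size_t    gid_max

/* Variant with one extra argument: the difference is visible at a glance. */
#define DEMO_KERN_ATTR_ESALT(type) \
  DEMO_KERN_ATTR,                  \
  const type *esalt

/* Both declarations stay one line long and cannot drift apart by copy-paste. */
void demo_m00001 (DEMO_KERN_ATTR)
{
  for (size_t gid = 0; gid < gid_max; gid++)
  {
    digests_out[gid] = pws[gid] ^ salts[0];
  }
}

void demo_m00002 (DEMO_KERN_ATTR_ESALT (uint32_t))
{
  for (size_t gid = 0; gid < gid_max; gid++)
  {
    digests_out[gid] = (pws[gid] ^ salts[0]) + esalt[0];
  }
}
```

A change to the shared list then needs to be made in one place only, instead of in every declaration.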
Renamed pure kernels to default kernels.
Replaced long option --length-limit-disable with --optimized-kernel-enable.
Replaced short option -L with -O.
Changed --optimized-kernel-enable to be unset by default.