Fixed the automatic kernel acceleration adjustment routine to account for the per-buffer size limit that some OpenCL runtimes impose (1/4 of the device's global memory).
Added a missing license header to scrypt_commit.c (MIT).
Updated kernel declarations from "KERNEL_FQ void HC_ATTR_SEQ" to "KERNEL_FQ KERNEL_FA void". Please update your custom plugin kernels accordingly.
Added spilling size as a factor in calculating usable memory per device. This is based on undocumented variables and may not be 100% accurate, but it works well in practice.
Added a compiler hint to scrypt-based kernels indicating the guaranteed maximum thread count per kernel invocation.
Removed redundant kernel code 29800, as it is identical to 27700, and updated the plugin.
Refactored inc_hash_scrypt.cl almost completely and improved macro names in inc_hash_scrypt.h. Adapted all existing SCRYPT-based plugins to the new standard. If you have custom SCRYPT-based plugins, use hash-mode 8900 as a reference.
Fixed some compiler warnings in inc_platform.cl.
Cleaned up code paths in inc_vendor.h for finding values for HC_ATTR_SEQ and DECLSPEC.
Removed the --device-as-default-execution-space option from nvrtc for hiprtc compatibility. As a result, added __device__ back to DECLSPEC.
Removed the --restrict option from the nvrtc compile options, since we actually alias some buffers.
Added --gpu-max-threads-per-block to hiprtc options.
Added -D MAX_THREADS_PER_BLOCK to OpenCL options (currently unused).
Removed all OPTS_TYPE_MP_MULTI_DISABLE entries for SNMPv3-based plugins.
These plugins consume large amounts of memory, so their maximum kernel_accel is now limited to 256. This may still be high, but hashcat automatically tunes down kernel_accel if it detects insufficient memory.
Removed command `rocm-smi --resetprofile --resetclocks --resetfans` from benchmark_deep.pl, since some AMD GPUs become artificially slow for a while after running these commands.
Replaced load_source() with file_to_buffer() from shared.c, which performs exactly the same operations.
Moved suppress_stderr() and restore_stderr() to shared.c and reused them in both Python bridges and opencl_test_instruction(), where the same type of code existed.
Stopped adjusting kernel-accel and scrypt-tmto on GPUs with unified memory, typically GPUs integrated into CPUs (APUs).
Redesigned the "4-buffer" strategy to avoid the overallocation caused by naively dividing by four, which can significantly increase memory usage for high scrypt configurations (e.g., 256k:8:1).
Updated the scrypt B[] access pattern to match the new "4-buffer" design.
Allowed users to specify kernel-accel, scrypt-tmto, or both via the command line or the tuning database. Any parameter that is not specified is adjusted automatically.
Permitted user-defined combinations of scrypt-tmto and kernel-accel even if they may exceed the available memory.
Added support for fewer iterations per kernel invocation than the default, enabling TMTO for low scrypt configurations such as N=1024.
Switched to TMTO 2 if it reaches 4 times the device processor count, instead of always using TMTO 1.
Improved performance for low scrypt configurations (hash-mode 9300).
Fixed the unit test for hash-mode 15700 to use the correct scrypt configurations.
Disabled CPU oversubscription for SCRYPT-based algorithms.
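To put the 256k:8:1 example in perspective: scrypt's ROMix step keeps a scratch array V of N blocks of 128 * r bytes each, so one instance needs 128 * r * N bytes before any TMTO is applied. The Python sketch below assumes the N:r:p notation where a "k" suffix means x1024. The helper names are hypothetical and not part of hashcat.

```python
def parse_scrypt_config(spec: str) -> tuple[int, int, int]:
    """Parse an 'N:r:p' string such as '256k:8:1' (assumption: 'k' means x1024)."""
    n_str, r_str, p_str = spec.split(":")
    n = int(n_str[:-1]) * 1024 if n_str.lower().endswith("k") else int(n_str)
    return n, int(r_str), int(p_str)


def scrypt_v_bytes(n: int, r: int) -> int:
    """Bytes in scrypt's ROMix array V for one instance: N blocks of 128 * r bytes."""
    return 128 * r * n


n, r, p = parse_scrypt_config("256k:8:1")
v = scrypt_v_bytes(n, r)
print(f"N={n} r={r} p={p}: {v} bytes = {v // 2**20} MiB per instance")
```

At 256 MiB per instance, even a few dozen parallel instances need several GiB, so any overallocation in how that memory is split across buffers is costly at this size.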
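The interaction between a per-allocation limit of 1/4 of global memory and the automatic kernel_accel tuning can be sketched as a search: start from the maximum kernel_accel (for example the 256 cap used for the SNMPv3 plugins) and step down until the scrypt scratch memory fits. This is a simplified, hypothetical model, not hashcat's actual code. It assumes all instances (kernel_accel x device processors) are spread evenly over a number of buffers, that each buffer must stay within the runtime's allocation limit, and it ignores TMTO, spilling, and every other device allocation.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b


def max_fitting_accel(global_mem: int, max_alloc: int, bytes_per_instance: int,
                      processors: int, accel_max: int, num_buffers: int = 4) -> int:
    """Largest kernel_accel in [1, accel_max] whose buffers fit, or 0 if none does."""
    for accel in range(accel_max, 0, -1):
        instances = accel * processors
        # Hypothetical split: each buffer holds an equal share of instances, rounded up.
        per_buffer = ceil_div(instances, num_buffers) * bytes_per_instance
        if per_buffer > max_alloc:
            continue  # a single buffer would exceed the runtime's allocation limit
        if per_buffer * num_buffers > global_mem:
            continue  # all buffers together would exceed device memory
        return accel
    return 0


global_mem = 8 * 2**30           # hypothetical 8 GiB device
max_alloc = global_mem // 4      # per-allocation cap reported as 1/4 of global memory
per_instance = 128 * 8 * 262144  # scrypt 256k:8:1, 256 MiB per instance

print(max_fitting_accel(global_mem, max_alloc, per_instance, processors=2, accel_max=256, num_buffers=1))  # 4
print(max_fitting_accel(global_mem, max_alloc, per_instance, processors=2, accel_max=256))  # 16
```

In this model a single buffer caps kernel_accel at 4 because it can never grow past 1/4 of memory, while four buffers reach 16, the same limit that total device memory alone would impose.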