Improved shared memory handling for -m 10700. Removed the hard-coded limit of 256 threads and now dynamically check the device's shared memory pool to adapt threads accordingly.
Implemented a feature request to display non-default session names early during startup.
Added a check for the number of registers required by a kernel (CUDA and HIP only). This allows us to estimate the max threads per block before entering the auto-tune engine and make pre-adjustments.
Fixed Metal command encoder argument to work with the new auto-tuner's extra kernel invocation.
Fixed incorrect host memory calculation logic during automatic kernel-accel reduction for scrypt-based algorithms. This ensures memory constraints are respected.
Improved several plugins by setting maximum loop counts and others using the OPTS_TYPE_NATIVE_THREADS option.
Fixed compilation on Apple platforms by excluding '#include <sys/sysinfo.h>'.
- Integrated occupancy hints from vendor APIs (CUDA, HIP) to set a
dynamic threads-per-block limit per kernel instead of using static
values.
- Added `find_tuning_function()` to identify the relevant kernel.
- Autotuner now runs in three stages: threads -> loops -> accel. The
first two stages now stop increasing when the tested kernel runtime
gets too close to the target runtime (96ms for `-w 3`), leaving
headroom for the next stage to adjust in a finer sense.
- Accel tuning now uses a capped floating-point multiplier instead of
powers of two.
- Removed workarounds for missing thread autotuning in plugins.
- Removed the hardcoded 4GiB host memory limit for accel. Added a
cross-platform `get_free_memory()` to check actual free RAM during GPU
initialization, preventing underutilization of high-end GPUs like the
4090. If needed, users can still cap memory usage with `-T` or `-n`.
- Updated enums for ROCm 6.4.x and CUDA 12.9.
- Added code to detect kernel register spilling. That's relevant so we
can keep free enough global memory on the runtime for the runtime to
handle spills efficiently.
Refactored inc_hash_scrypt.cl almost completely and improved macro names in inc_hash_scrypt.h. Adapted all existing SCRYPT-based plugins to the new standard. If you have custom SCRYPT based plugins use hash-mode 8900 as reference.
Fixed some compiler warnings in inc_platform.cl.
Cleaned up code paths in inc_vendor.h for finding values for HC_ATTR_SEQ and DECLSPEC.
Removed option --device-as-default-execution-space from nvrtc for hiprtc compatibility. As a result, added __device__ back to DECLSPEC.
Removed option --restrict from nvrtc compile options since we actually alias some buffers.
Added --gpu-max-threads-per-block to hiprtc options.
Added -D MAX_THREADS_PER_BLOCK to OpenCL options (currently unused).
Removed all OPTS_TYPE_MP_MULTI_DISABLE entries for SNMPv3-based plugins.
These plugins consume large amounts of memory and for this reason,limited kernel_accel max to 256. This may still be high, but hashcat will automatically tune down kernel_accel if insufficient memory is detected.
Removed command `rocm-smi --resetprofile --resetclocks --resetfans` from benchmark_deep.pl, since some AMD GPUs become artificially slow for a while after running these commands.
Replaced load_source() with file_to_buffer() from shared.c, which does the exact same operations.
Moved suppress_stderr() and restore_stderr() to shared.c and reused them in both Python bridges and opencl_test_instruction(), where the same type of code existed.