This commit introduces initial support for mixed-mode multi-hash cracking
in Argon2, i.e. a single hash list that mixes Argon2 variants and
parameters. Although I was skeptical at first, the final solution turned
out better than expected, with only a minimal speed loss (1711 H/s ->
1702 H/s).
Unit tests have been updated to generate random combinations of
Argon2-I/D/ID with randomized m, t, and p values. So far, results look
solid.
Note: This is a complex change and may have undiscovered edge cases.
Some optimization opportunities remain. JIT-based optimizations are not
fully removed. We could also detect single-hash scenarios at runtime
and disable self-tests to re-enable JIT. Currently, the kernel workload
is sized based on the largest hash to avoid out-of-bounds memory access.
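Sizing the workload by the largest hash can be sketched as follows, assuming a hypothetical per-hash parameter struct (argon2_params_sketch_t and max_memory_per_workitem() are illustrative names, not hashcat's actual Argon2 types). Argon2's m_cost is given in KiB and the effective block count is rounded down to a multiple of 4*p, so the raw maximum m_cost is a safe upper bound for every hash in the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hash Argon2 parameters; not hashcat's real structs. */
typedef struct
{
  uint32_t type;    /* 0 = Argon2d, 1 = Argon2i, 2 = Argon2id */
  uint32_t m_cost;  /* memory cost in KiB */
  uint32_t t_cost;  /* number of passes */
  uint32_t p;       /* number of lanes */
} argon2_params_sketch_t;

/* Return the scratch memory (in bytes) one work-item needs so that any hash
   in the list, whatever its own m_cost, stays within bounds. */
static uint64_t max_memory_per_workitem (const argon2_params_sketch_t *hashes, size_t count)
{
  assert (hashes != NULL && count > 0);

  uint32_t max_m = 0;

  for (size_t i = 0; i < count; i++)
  {
    if (hashes[i].m_cost > max_m) max_m = hashes[i].m_cost;
  }

  return (uint64_t) max_m * 1024; /* KiB -> bytes */
}
```

The cost of this approach is that a single hash with a very large m_cost limits the number of parallel work-items for the whole list, which is presumably why it is listed as a remaining optimization opportunity.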
Fixed compiler warnings in inc_hash_argon2.cl.
Moved argon2_tmp_t and argon2_extra_t typedefs from argon2_common.c back to the module to allow plugin developers to modify them when using Argon2 as a primitive.
Slightly improved autotune behavior for edge cases such as modes 8700 and 18600, where some algorithms started with an excessively high theoretical value, leaving no room for proper tuning.
Removed argon2_module_kernel_threads_min() and argon2_module_kernel_threads_max() from argon2_common.c. Switched to using OPTS_TYPE_NATIVE_THREADS instead. Plugin developers can still use it. This simplifies CPU integration, as CPUs typically run with a single thread.
Updated plugins 15500 and 20510. Added a thread limit to prevent autotune from selecting an excessively high thread count. The issue originated from the runtime returning an unrealistically high ideal thread count.
- Replace hardcoded 'N/A' values with actual Windows system information
- Add GetSystemInfo() for processor architecture detection
- Add GetVersionEx() for Windows version information
- Support both machine-readable and human-readable output formats
- Follow existing Linux uname() implementation pattern
- Maintain cross-platform compatibility
Resolves TODO comment in src/terminal.c line 1257
As of now, especially in benchmark mode, hashcat no longer creates and destroys the context and command queue for each enabled device every time it switches from one hash mode to the next.
Specifically, when using OpenCL with an NVIDIA device, it was not possible to complete the benchmark because clCreateContext leaks memory, slowly consuming all available GPU memory until hashcat can no longer create a new context and disables the device.
Avoid deprecated HIP functions
All hipCtx* functions have been deprecated, so we replaced them with their newer equivalents, which also fixes a critical bug when handling multiple AMD devices in the same system.
Updated autotune to use the iteration count of the first user-defined hash instead of the self-test hash for slow hash tuning, assuming iteration counts are consistent across the hash list.
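The selection rule can be sketched as below. The struct and function names are hypothetical (salt_sketch_t is a reduced stand-in, not hashcat's salt_t), and falling back to the self-test hash when no user hash is available is an assumption of this sketch. The motivation is presumably that self-test hashes may use a very different iteration count than the hashes actually being cracked, which would skew slow-hash tuning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a salt: only the iteration count matters here. */
typedef struct
{
  uint32_t salt_iter;
} salt_sketch_t;

/* Pick the iteration count autotune should tune against: the first
   user-defined hash if there is one, otherwise the self-test hash
   (the fallback is an assumption of this sketch). */
static uint32_t autotune_salt_iter (const salt_sketch_t *selftest_salt,
                                    const salt_sketch_t *user_salts, size_t user_count)
{
  assert (selftest_salt != NULL);

  if (user_salts != NULL && user_count > 0) return user_salts[0].salt_iter;

  return selftest_salt->salt_iter;
}
```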
Updated autotune to prefer the best-efficiency thread count only if it is at least 6% better than the max thread count, improving consistency in thread and accel values while allowing exceptions for special modes like 18600.
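The 6% threshold can be illustrated with a minimal sketch, assuming efficiency is a throughput-like score where higher is better (pick_threads() is a hypothetical name, not hashcat's autotune code). A candidate that is only marginally better than the max thread count is ignored, so small measurement noise does not make the chosen thread count jump between runs.

```c
#include <assert.h>
#include <stdint.h>

/* Choose between the max thread count and the most efficient candidate.
   Efficiency is assumed to be "higher is better" (e.g. hashes per ms). */
static uint32_t pick_threads (uint32_t threads_max, double eff_at_max,
                              uint32_t threads_best, double eff_at_best)
{
  assert (eff_at_max > 0.0);

  /* Switch away from threads_max only if the candidate is at least 6% better. */
  if (eff_at_best >= eff_at_max * 1.06) return threads_best;

  return threads_max;
}
```

For example, with an efficiency of 100 at 1024 threads, a 256-thread candidate scoring 105 is rejected, while one scoring 107 is selected.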
Changed the default theoretical free memory: the reduction applied to max memory was increased from 20% to 34%. This applies only when the runtime/OS cannot provide low-level free memory data.
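As a worked example of the fallback estimate, a minimal sketch (theoretical_free_memory() is a hypothetical helper name) that treats max memory minus a fixed reduction percentage as usable:

```c
#include <assert.h>
#include <stdint.h>

/* Fallback used only when the runtime/OS cannot report free memory:
   treat max_mem minus a fixed reduction percentage as usable.
   max_mem * 100 cannot overflow uint64_t for any realistic device size. */
static uint64_t theoretical_free_memory (uint64_t max_mem, uint32_t reduction_pct)
{
  assert (reduction_pct <= 100);

  return max_mem * (100 - reduction_pct) / 100;
}
```

On a 16 GiB device, the estimate drops from about 12.8 GiB (20% reduction) to about 10.56 GiB (34% reduction).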
Applied the same logic, using the --backend-keep-free percentage, to host memory during early setup, when hashcat automatically reduces thread and accel counts per compute device to stay within limits.
Changed terminal output from "Host memory required for this attack: ..." to "Host memory allocated for this attack: ...", and added free host memory as reference.
Fixed parameter types in inc_hash_blake2b.cl and inc_hash_blake2s.cl for FINAL value.
Added kernel code for -m 15400 to s04/s08/m04/m08, even if not needed, to help autotune find optimal workitem settings.
Fixed a rare autotune case (e.g. in mode 18600) where threads_min was not a multiple of kernel_preferred_wgs_multiple. As long as only threads_min is affected and not threads_max, we now ensure that threads_min is at least kernel_preferred_wgs_multiple.
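One possible reading of this rule, as a minimal sketch: "threads_max is not affected" is interpreted here as threads_max being large enough to hold at least one full preferred multiple. The function name adjust_threads_min() and this interpretation are assumptions, not hashcat's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Raise threads_min to the preferred work-group multiple when only
   threads_min falls short of it; if threads_max is also below the multiple,
   leave threads_min untouched so that threads_min <= threads_max still holds. */
static uint32_t adjust_threads_min (uint32_t threads_min, uint32_t threads_max,
                                    uint32_t wgs_multiple)
{
  assert (wgs_multiple > 0 && threads_min <= threads_max);

  if (threads_min < wgs_multiple && threads_max >= wgs_multiple)
  {
    threads_min = wgs_multiple;
  }

  return threads_min;
}
```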
Improved autotune logic for best thread count: double thread count until reaching the device's preferred multiple, then increase in steps of that multiple while comparing efficiency vs. runtime, and select the configuration with best efficiency, not highest thread count.
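The search order described above can be sketched as follows. The efficiency callback is hypothetical (a real tuner would derive it from timed kernel runs), and example_efficiency() is a toy curve peaking at 256 threads, used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Efficiency score for a given thread count, higher is better (hypothetical). */
typedef double (*measure_fn) (uint32_t threads);

/* Toy efficiency curve that peaks at 256 threads, for illustration only. */
static double example_efficiency (uint32_t threads)
{
  const double d = (double) threads - 256.0;

  return 1000.0 - (d * d) / 1024.0;
}

static uint32_t find_best_threads (uint32_t threads_min, uint32_t threads_max,
                                   uint32_t preferred_multiple, measure_fn measure)
{
  assert (threads_min > 0 && threads_min <= threads_max && preferred_multiple > 0);

  uint32_t best_threads = threads_min;
  double   best_eff     = measure (threads_min);

  uint32_t t = threads_min;

  while (t < threads_max)
  {
    uint32_t next;

    if (t < preferred_multiple)
    {
      /* Phase 1: double, without overshooting the preferred multiple. */
      next = t * 2;

      if (next > preferred_multiple) next = preferred_multiple;
    }
    else
    {
      /* Phase 2: grow in steps of the preferred multiple. */
      next = t + preferred_multiple;
    }

    if (next > threads_max) break;

    t = next;

    const double eff = measure (t);

    /* Keep the most efficient configuration, not simply the largest one. */
    if (eff > best_eff)
    {
      best_eff     = eff;
      best_threads = t;
    }
  }

  return best_threads;
}
```

With threads_min = 1, threads_max = 1024 and a preferred multiple of 32, the candidates are 1, 2, 4, 8, 16, 32, 64, 96, ..., 1024, and the toy curve selects 256 rather than 1024.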
Always set funnelshift support to true for HIP devices, as HIP devices always report it as false.
Set minimum loop count to 250 for all VeraCrypt modes with PIM brute-force support.
There are cases where we fix the thread count in a kernel using
FIXED_LOCAL_SIZE, but when the runtime loads the kernel binary, it
reports that it can only execute it with a different thread count.
According to the OpenCL specification, this can happen due to register
pressure.
However, we fix the thread count for a specific reason, and we choose to
accept potential register spilling to global memory. A warning is now
issued to inform the user about the runtime's suggested thread count,
allowing them to override it via the command line if they encounter
issues.
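The keep-and-warn behavior can be sketched as below. In OpenCL, the runtime's limit for a loaded kernel can be queried, for example, with clGetKernelWorkGroupInfo() and CL_KERNEL_WORK_GROUP_SIZE; the sketch leaves that query out and takes the reported value as a parameter so it runs without a device. The function name and warning text are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Keep the thread count fixed via FIXED_LOCAL_SIZE, accepting possible
   register spilling, but warn when the runtime suggests a different value. */
static uint32_t check_fixed_local_size (uint32_t fixed_local_size,
                                        uint32_t runtime_wgs, FILE *log)
{
  assert (fixed_local_size > 0 && log != NULL);

  if (runtime_wgs != fixed_local_size)
  {
    fprintf (log, "WARNING: kernel uses FIXED_LOCAL_SIZE=%" PRIu32
                  ", but the runtime suggests %" PRIu32 " threads.\n"
                  "If you encounter issues, override the thread count on the command line.\n",
                  fixed_local_size, runtime_wgs);
  }

  return fixed_local_size;
}
```

Returning the fixed value regardless of the warning mirrors the choice described above: the fixed thread count is kept on purpose, and the user decides whether to override it.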
Also fixed the thread count for -m 10700 on NVIDIA's OpenCL, where 4
bytes are always lost for an unknown reason (similar to the issue seen
in bcrypt).