- Replaced inline asm in hc_byte_perm() with __builtin_amdgcn_perm()
- Replaced inline asm in hc_bytealign() with __builtin_amdgcn_alignbyte()
- Defined HC_INLINE as the default for HIP, significantly boosting the performance of pure kernels
- Removed IS_ROCM from inc_vendor.h as it's no longer needed
- Removed backend-specific code from several hash-modes and inc_rp_optimized.cl, as hc_bytealign_S() is now available on all backends
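
For readers unfamiliar with the byte-align operation, the following is a minimal portable C sketch of its semantics, not the builtin itself. The name bytealign_ref is hypothetical, and the sketch assumes that only the low two bits of the shift argument are used.

```c
#include <stdint.h>

/* Portable reference for a byte-align operation (illustrative only):
   treat a as the high word and b as the low word of a 64-bit value,
   shift right by (c mod 4) bytes, and keep the low 32 bits. */
static uint32_t bytealign_ref (const uint32_t a, const uint32_t b, const int c)
{
  const uint64_t tmp = (((uint64_t) a) << 32) | ((uint64_t) b);

  return (uint32_t) (tmp >> ((c & 3) * 8));
}
```

With a = 0x11223344 and b = 0x55667788, a shift of 1 yields 0x44556677: the low byte of b drops out and the low byte of a moves in at the top.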
Hashcat is evolving, both in its core and in the supported algorithms.
To uncover bugs in the code, I implemented edge case testing to verify the settings defined in the specific algorithm test modules (e.g., m00000.pm), as well as the behavior of the kernels (pure and optimized) in relation to the different attack modes (-a0, -a1, etc.).
This commit introduces initial support for mixed mode multihash cracking
in Argon2. Although I was skeptical at first, the final solution turned
out better than expected with only a minimal speed loss (1711H/s ->
1702H/s).
Unit tests have been updated to generate random combinations of
Argon2-I/D/ID with randomized m, t, and p values. So far, results look
solid.
Note: This is a complex change and may have undiscovered edge cases.
Some optimization opportunities remain. JIT-based optimizations are not
fully removed. We could also detect single-hash scenarios at runtime
and disable self-tests to re-enable JIT. Currently, the kernel workload
is sized based on the largest hash to avoid out-of-bounds memory access.
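
The sizing rule can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: the struct, its field names, and max_scratch_bytes() are hypothetical, and the memory cost is assumed to be expressed in 1 KiB blocks as in the Argon2 specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-hash parameters; field names are illustrative only. */
typedef struct
{
  uint32_t m_cost_kib;  /* memory cost in 1 KiB blocks (assumption) */
  uint32_t t_cost;      /* number of passes */
  uint32_t parallelism; /* number of lanes */

} argon2_params_sketch_t;

/* Return the scratch size in bytes needed by the most demanding hash,
   so that a buffer of this size is large enough for every hash in a
   mixed list. Returns 0 for an empty list. */
static uint64_t max_scratch_bytes (const argon2_params_sketch_t *hashes, const size_t count)
{
  uint64_t max_kib = 0;

  for (size_t i = 0; i < count; i++)
  {
    if (hashes[i].m_cost_kib > max_kib) max_kib = hashes[i].m_cost_kib;
  }

  return max_kib * 1024;
}
```

Sizing by the maximum wastes memory for the smaller hashes in the list, which is one of the remaining optimization opportunities mentioned above.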
Fixed compiler warnings in inc_hash_argon2.cl.
Moved argon2_tmp_t and argon2_extra_t typedefs from argon2_common.c back to the module to allow plugin developers to modify them when using Argon2 as a primitive.
Slightly improved autotune behavior for edge cases such as 8700 and 18600, where some algorithms started with a theoretically possible but excessively high value, leaving no room for proper tuning.
Removed argon2_module_kernel_threads_min() and argon2_module_kernel_threads_max() from argon2_common.c. Switched to using OPTS_TYPE_NATIVE_THREADS instead. Plugin developers can still use it. This simplifies CPU integration, as CPUs typically run with a single thread.
Updated plugins 15500 and 20510. Added a thread limit to prevent autotune from selecting an excessively high thread count. The issue originated from the runtime returning an unrealistically high ideal thread count.
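
The idea behind the thread limit is a simple clamp. The sketch below is illustrative only: clamp_threads() and its parameters are hypothetical names, not the functions used in the plugins.

```c
#include <stdint.h>

/* Clamp a runtime-reported ideal thread count to a plugin-chosen ceiling.
   Both the function and its parameters are hypothetical. */
static uint32_t clamp_threads (const uint32_t reported, const uint32_t limit)
{
  return (reported < limit) ? reported : limit;
}
```

For example, a reported count of 1024 with a limit of 64 gives 64, while a reported count of 32 passes through unchanged.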
- Replace hardcoded 'N/A' values with actual Windows system information
- Add GetSystemInfo() for processor architecture detection
- Add GetVersionEx() for Windows version information
- Support both machine-readable and human-readable output formats
- Follow existing Linux uname() implementation pattern
- Maintain cross-platform compatibility
Resolves TODO comment in src/terminal.c line 1257
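
A minimal sketch of the architecture detection follows. It is not the code added to src/terminal.c: arch_name(), print_machine(), the SK_ARCH_* names, and the returned labels are assumptions made for illustration. The numeric values mirror the documented PROCESSOR_ARCHITECTURE_* constants for SYSTEM_INFO.wProcessorArchitecture and are redefined locally so that the mapping compiles on any platform.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Local copies of the documented wProcessorArchitecture values. */
#define SK_ARCH_INTEL 0
#define SK_ARCH_ARM   5
#define SK_ARCH_IA64  6
#define SK_ARCH_AMD64 9
#define SK_ARCH_ARM64 12

/* Map an architecture code to a short label (labels are illustrative). */
static const char *arch_name (const unsigned int arch)
{
  switch (arch)
  {
    case SK_ARCH_INTEL: return "x86";
    case SK_ARCH_ARM:   return "arm";
    case SK_ARCH_IA64:  return "ia64";
    case SK_ARCH_AMD64: return "x86_64";
    case SK_ARCH_ARM64: return "arm64";
    default:            return "unknown";
  }
}

#ifdef _WIN32
/* Query the processor architecture via GetSystemInfo() and print it. */
static void print_machine (void)
{
  SYSTEM_INFO si;

  GetSystemInfo (&si);

  printf ("%s\n", arch_name (si.wProcessorArchitecture));
}
#endif
```

Keeping the mapping separate from the Windows call makes it testable on non-Windows hosts, which fits the cross-platform goal stated above.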