Fixed old/critical bug on Apple Intel with Metal by patching inc_rp_optimized.cl.
Tested on Apple Intel and Silicon with Metal/OpenCL and on Linux with CUDA, HIP, OpenCL GPU/CPU
Metal Backend: parallelize pipeline state object (PSO) compilation internally
Set unexported setting, setShouldMaximizeConcurrentCompilation, to boost kernel build process on Apple Metal (only >= 3)
While HIP doesn't have funnelshift emulation, it's better to use the
native __builtin_amdgcn_alignbit() on AMD. For NVIDIA, we need to make
sure the target matches a supported SM version and fall back to
byte_perm() otherwise.
Fix hash-mode 6900 in optimized mode and attack mode 3
This hash-mode doesn't use any stopbit, and if the password length is
exactly 16 or 32, then hashcat selects the next higher kernel, e.g., s16
for password length 32. For such corner cases, we must copy s08 code to
s16. It doesn't seem other algorithms are affected. Some have the s16
body left empty, but they have a password length limit < 32.
Add test_edge* to .gitignore
This works because CPUs support hardware 64-bit rotate.
Added hc_umullo() and rewrote trunc_mul() for Argon2. No performance
impact, but trunc_mul() is now easier to read.
Re-enabled USE_BITSELECT, USE_ROTATE, and USE_SWIZZLE for OpenCL. We
have a new unit test script; let's see if OpenCL runtimes have
improved.
Previous fix for -m 21800 in multihash mode was incomplete. Now
shows the correct cracked hash.
Re-enabled --hwmon-disable for users. While it's important for SCRYPT
and Argon2 performance, a warning is now shown when it affects
speed.
Updated hash modes with OPTS_TYPE_NATIVE_THREADS:
1376x, 1377x, 1378x, 14800, 19500 and 2300x.
only the first hash in a multihash list was marked as cracked, regardless
of which hash was actually cracked. For example, if the second hash was
cracked, it incorrectly marked the first as cracked and left the second
uncracked. This issue only affected beta versions and only in multihash
cracking mode.
Added deep-comp kernel support for Kerberos modes 28800 and 28900,
enabling multihash cracking for the same user in the same domain, even if
the password was changed or the recording was bad.
Added a rule ensuring that device buffer sizes for password candidates,
hooks, and transport (tmps) must be smaller than 1/4 of the maximum
allocatable memory. If not, hashcat now automatically reduces kernel-accel
down to 1, then halves the number of threads and restores kernel-accel up
to its maximum, repeating until the size requirement is met.
Fixed salt length limit verification for -m 20712.
Fixed password length limit for -m 14400.
Fixed unit test salt generator for -m 21100, which could produce duplicate
hashes under certain conditions.
Added the OPTS_TYPE_NATIVE_THREADS flag to the following hash modes
(after benchmarking): 7700, 7701, 9000, 1375x, 1376x, 14800, 19500, 23900.
- Used blake2b_transform() instead of blake2b_update() to avoid compiler problems
on Intel OpenCL and segfaults on POCL (still unsure of exact cause but possibly
related to the shuffle functions in combination with these OpenCL drivers).
- Remove 'bug' comments (these are resolved now).
- Added implementation of 'argon2_hash_block()' for non-warped (CPU) case.
- Introduced 'LBLOCKSIZE' for the size of an argon2 block per thread in u64.
Most of the code should now be able to support any warp/wavefront size.
- Replaced inline asm in hc_byte_perm() with __builtin_amdgcn_perm()
- Replaced inline asm in hc_bytealign() with __builtin_amdgcn_alignbyte()
- Defined HC_INLINE as default for HIP, significantly boosting kernel performance of pure kernels
- Removed IS_ROCM from inc_vendor.h as it's no longer needed
- Removed backend-specific code from several hash-modes and inc_rp_optimized.cl, as hc_bytealign_S() is now available on all backends
This commit introduces initial support for mixed mode multihash cracking
in Argon2. Although I was skeptical at first, the final solution turned
out better than expected with only a minimal speed loss (1711H/s ->
1702H/s).
Unit tests have been updated to generate random combinations of
Argon2-I/D/ID with randomized m, t, and p values. So far, results look
solid.
Note: This is a complex change and may have undiscovered edge cases.
Some optimization opportunities remain. JIT-based optimizations are not
fully removed. We could also detect single-hash scenarios at runtime
and disable self-tests to re-enable JIT. Currently, the kernel workload
is sized based on the largest hash to avoid out-of-bound memory access.
Fixed compiler warnings in inc_hash_argon2.cl.
Moved argon2_tmp_t and argon2_extra_t typedefs from argon2_common.c back to the module to allow plugin developers to modify them when using Argon2 as a primitive.
Slightly improved autotune behavior for edge cases such as 8700 and 18600, where some algorithms started with theoretical excessively high value, leaving no room for proper tuning.
Removed argon2_module_kernel_threads_min() and argon2_module_kernel_threads_max() from argon2_common.c. Switched to using OPTS_TYPE_NATIVE_THREADS instead. Plugin developers can still use it. This simplifies CPU integration, as CPUs typically run with a single thread.
Updated plugins 15500 and 20510. Added a thread limit to prevent autotune from selecting an excessively high thread count. The issue originated from the runtime returning an unrealistically high ideal thread count.
Fixed parameter types in inc_hash_blake2b.cl and inc_hash_blake2s.cl for FINAL value.
Added kernel code for -m 15400 to s04/s08/m04/m08, even if not needed, to help autotune find optimal workitem settings.
Fixed a rare autotune case (e.g. in mode 18600) where threads_min was not a multiple of kernel_preferred_wgs_multiple, and changes it so that as long as it only threads_min is affected and not threads_max, we now ensure at least kernel_preferred_wgs_multiple.
Improved autotune logic for best thread count: double thread count until reaching the device's preferred multiple, then increase in steps of that multiple while comparing efficiency vs. runtime, and select the configuration with best efficiency, not highest thread count.
Always set funnelshift support to true for HIP devices, as it always reports false.
Set minimum loop count to 250 for all VeraCrypt modes with PIM brute-force support.
- added support to 2D/3D Compute
- improved compute workloads calculation
Makefile:
- updated MACOSX_DEPLOYMENT_TARGET to 15.0
Unit tests:
- updated install_modules.sh with Crypt::Argon2
Argon2 start works with Apple Metal