Add hc_uint4_t to SCRYPT to work around an Intel OpenCL alignment bug.
Align large buffers (V1-V4) manually to 1024-byte boundaries.
Replace uint4 xor operator with xor_uint4() function.
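A minimal C sketch of the two techniques named above: rounding an offset up to a 1024-byte boundary, and XORing four 32-bit lanes through a plain function instead of a vector operator. The type u4_sketch_t and the helpers align_up() and xor_u4() are hypothetical stand-ins, not hashcat's actual hc_uint4_t or xor_uint4() definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-lane type; hashcat's real hc_uint4_t may be defined differently. */
typedef struct { uint32_t x, y, z, w; } u4_sketch_t;

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up (size_t offset, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  return (offset + align - 1) & ~(align - 1);
}

/* Lane-wise XOR as an ordinary function rather than an overloaded operator. */
static u4_sketch_t xor_u4 (u4_sketch_t a, u4_sketch_t b)
{
  u4_sketch_t r = { a.x ^ b.x, a.y ^ b.y, a.z ^ b.z, a.w ^ b.w };

  return r;
}
```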
In the automatic downtune routine, hashcat reserves a fixed 512MiB host
buffer for memory that is known to be allocated by the compute runtimes
(CUDA, HIP, OpenCL, Metal) and over which hashcat has no control.
However, hashcat still divides the maximum available host memory by the
active device count in preparation for automatically downtuning the -n
and -T parameters later when memory is limited.
Hashcat reserves 512MiB per active device. With bridges, the active
devices become bridge units, whose number equals the CPU core count for
modes 70000, 70100, and 70200. On a 32-core CPU, this adds up to 16GiB
(32 x 512MiB), even though the memory is actually shared between threads.
This leads to an overestimation of memory usage.
A simple fix is to divide the 512MiB buffer by the active device count.
This keeps the full 512MiB for a single GPU but avoids overestimating
memory usage with many virtual devices.
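The arithmetic behind the fix, as a minimal sketch. The names RUNTIME_RESERVE, reserve_total_old() and reserve_per_device_new() are hypothetical, and integer division is assumed; this is not hashcat's actual downtune code.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed host reserve for runtime-internal allocations: 512 MiB. */
#define RUNTIME_RESERVE (512ULL * 1024 * 1024)

/* Old behaviour: the full 512 MiB is counted once per active device. */
static uint64_t reserve_total_old (uint32_t active_devices)
{
  return RUNTIME_RESERVE * active_devices;
}

/* New behaviour: the 512 MiB reserve is split across the active devices. */
static uint64_t reserve_per_device_new (uint32_t active_devices)
{
  assert (active_devices > 0);

  return RUNTIME_RESERVE / active_devices;
}
```

With 32 bridge units, the old scheme reserves 16GiB in total, while the new one reserves 16MiB per unit, i.e., 512MiB overall; a single GPU still gets the full 512MiB.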
Fix the code that handles the kernel-accel value in argon2_common.c for
CPUs, which was accidentally removed during a previous refactoring.
Set the thread count to 1 for hash-mode 70000. Oversubscribing the CPU
isn't useful here. This keeps the wordlist count low, which is very
welcome for slow hashes like Argon2id.
Fix unit test for 20011/20012/20013 (DiskCryptor) by adding setuptools
to install_modules.sh and replacing AES.MODE_XTS with python_AES.MODE_XTS.
Fix a false negative for kernel 32800 that only occurred when all of the
following conditions were true: multihash, -a 3, optimized mode, and a
password length between 16 and 31.
Fix Python package name in BUILD_WSL.md command line example.
Fixed an old, critical bug on Apple Intel with Metal by patching inc_rp_optimized.cl.
Tested on Apple Intel and Apple Silicon with Metal/OpenCL, and on Linux with CUDA, HIP, and OpenCL (GPU/CPU).
Metal Backend: parallelize pipeline state object (PSO) compilation internally
Set the unexported setting setShouldMaximizeConcurrentCompilation to speed up the kernel build process on Apple Metal (only >= 3)
Code originally taken from module 28300 conflicted with
OPTS_TYPE_DYNAMIC_SHARED and thus corrupted the local memory when
REAL_SHM was set.
While HIP doesn't have funnelshift emulation, it's better to use the
native __builtin_amdgcn_alignbit() on AMD. For NVIDIA, we need to make
sure the target matches a supported SM version and fall back to
byte_perm() otherwise.
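For reference, both intrinsics implement a 32-bit funnel shift. A portable C sketch of the semantics follows; the helper names are hypothetical. The operand order follows AMD's v_alignbit_b32 (high word first), whereas CUDA's __funnelshift_r() takes the low word first.

```c
#include <assert.h>
#include <stdint.h>

/* Funnel shift right: concatenate hi:lo into 64 bits, shift right by
 * (shift & 31), and keep the low 32 bits. Same result as
 * __builtin_amdgcn_alignbit (hi, lo, shift) or __funnelshift_r (lo, hi, shift). */
static uint32_t funnelshift_r_ref (uint32_t hi, uint32_t lo, uint32_t shift)
{
  const uint64_t cat = ((uint64_t) hi << 32) | lo;

  return (uint32_t) (cat >> (shift & 31));
}

/* Feeding the same word into both halves turns the funnel shift into a rotate right. */
static uint32_t rotr32_ref (uint32_t x, uint32_t n)
{
  return funnelshift_r_ref (x, x, n);
}
```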
Fix hash-mode 6900 in optimized mode and attack mode 3
This hash-mode doesn't use a stopbit, so if the password length is
exactly 16 or 32, hashcat selects the next larger kernel, e.g., s16 for
password length 32. For such corner cases, we must copy the s08 code to
s16. Other algorithms don't seem to be affected: some leave the s16
body empty, but they have a password length limit < 32.
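A hypothetical sketch of the length-to-kernel mapping that produces this corner case. The thresholds are inferred from the description above (assuming s04 holds up to 16 bytes, s08 up to 32, and s16 up to 64) and are not copied from hashcat's kernel selection code.

```c
#include <assert.h>

/* Hypothetical mapping: the thresholds leave room for a stopbit byte after
 * the password, so a mode without a stopbit still gets the larger kernel
 * when the length is exactly 16 or 32. */
static int kernel_suffix_for_len (unsigned pw_len)
{
  if (pw_len < 16) return 4;   /* s04 */
  if (pw_len < 32) return 8;   /* s08 */

  return 16;                   /* s16 */
}
```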
Add test_edge* to .gitignore
This works because CPUs support hardware 64-bit rotates.
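A minimal C sketch of the usual portable 64-bit rotate idiom, which compilers typically lower to a single hardware rotate instruction on CPUs. rotl64() is a hypothetical helper name, not a hashcat function.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free rotate left. Masking both shift counts keeps them in 0..63,
 * so n == 0 (or n == 64) does not trigger an undefined 64-bit shift. */
static inline uint64_t rotl64 (uint64_t x, unsigned n)
{
  n &= 63;

  return (x << n) | (x >> ((64 - n) & 63));
}
```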
Added hc_umullo() and rewrote trunc_mul() for Argon2. No performance
impact, but trunc_mul() is now easier to read.
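For context, Argon2's BlaMka mixing step (RFC 9106) computes a + b + 2 * trunc(a) * trunc(b) mod 2^64, where trunc() keeps the low 32 bits. A minimal C sketch of that multiply follows; the names are hypothetical, and it does not reproduce hashcat's hc_umullo() or trunc_mul().

```c
#include <assert.h>
#include <stdint.h>

/* Full 32x32 -> 64-bit product of the low 32 bits of x and y. */
static inline uint64_t trunc_mul_ref (uint64_t x, uint64_t y)
{
  return (uint64_t) (uint32_t) x * (uint32_t) y;
}

/* BlaMka: x + y + 2 * trunc(x) * trunc(y); unsigned overflow wraps mod 2^64. */
static inline uint64_t fBlaMka_ref (uint64_t x, uint64_t y)
{
  return x + y + 2 * trunc_mul_ref (x, y);
}
```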
Re-enabled USE_BITSELECT, USE_ROTATE, and USE_SWIZZLE for OpenCL. We
have a new unit test script; let's see if OpenCL runtimes have
improved.
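As a reminder of what the built-in computes, here is a scalar C sketch of OpenCL's bitselect() and the common XOR-based equivalent used when a native instruction isn't trusted. The helper names are hypothetical, and this is not hashcat's macro code.

```c
#include <assert.h>
#include <stdint.h>

/* OpenCL bitselect(a, b, c): each result bit comes from b where c is 1,
 * and from a where c is 0. */
static inline uint32_t bitselect_ref (uint32_t a, uint32_t b, uint32_t c)
{
  return (a & ~c) | (b & c);
}

/* Equivalent form without the NOT: where c is 1, a ^ (a ^ b) == b. */
static inline uint32_t bitselect_xor_ref (uint32_t a, uint32_t b, uint32_t c)
{
  return a ^ ((a ^ b) & c);
}
```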
The previous fix for -m 21800 in multihash mode was incomplete. Hashcat
now shows the correct cracked hash.
Re-enabled --hwmon-disable for users. Since hwmon is important for SCRYPT
and Argon2 performance, a warning is now shown when disabling it affects
speed.
Updated hash modes with OPTS_TYPE_NATIVE_THREADS:
1376x, 1377x, 1378x, 14800, 19500 and 2300x.
Only the first hash in a multihash list was marked as cracked, regardless
of which hash was actually cracked. For example, if the second hash was
cracked, it incorrectly marked the first as cracked and left the second
uncracked. This issue only affected beta versions and only in multihash
cracking mode.
Added deep-comp kernel support for Kerberos modes 28800 and 28900,
enabling multihash cracking for the same user in the same domain, even if
the password was changed or the recording was bad.
Added a rule ensuring that the device buffers for password candidates,
hooks, and transport (tmps) are smaller than 1/4 of the maximum
allocatable memory. If they are not, hashcat now automatically reduces
kernel-accel down to 1, then halves the number of threads and restores
kernel-accel to its maximum, repeating until the size requirement is met.
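A minimal sketch of the loop described above, modelling a single buffer whose size is accel * threads * bytes_per_item. The model and all names are assumptions for illustration; hashcat checks several buffers, each with its own size formula.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each of the accel * threads work items needs
 * bytes_per_item bytes in the buffer being checked. */
static uint64_t buffer_size (uint32_t accel, uint32_t threads, uint64_t bytes_per_item)
{
  return (uint64_t) accel * threads * bytes_per_item;
}

/* Lower accel step by step down to 1; if the buffer still doesn't fit,
 * halve threads, restore accel to accel_max and try again.
 * Returns 0 once the buffer is smaller than max_alloc / 4,
 * or -1 if even accel = 1 with threads = 1 doesn't fit. */
static int downtune (uint32_t *accel, uint32_t *threads, uint32_t accel_max,
                     uint64_t bytes_per_item, uint64_t max_alloc)
{
  const uint64_t limit = max_alloc / 4;

  assert (*accel >= 1 && *threads >= 1);

  for (;;)
  {
    for (uint32_t a = *accel; a >= 1; a--)
    {
      if (buffer_size (a, *threads, bytes_per_item) < limit)
      {
        *accel = a;

        return 0;
      }
    }

    if (*threads == 1) return -1;

    *threads /= 2;
    *accel = accel_max;
  }
}
```

For example, with 1MiB per item and 256MiB of maximum allocatable memory, starting from kernel-accel 1024 and 256 threads, the loop ends at kernel-accel 1 with 32 threads.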
Fixed salt length limit verification for -m 20712.
Fixed password length limit for -m 14400.
Fixed unit test salt generator for -m 21100, which could produce duplicate
hashes under certain conditions.
Added the OPTS_TYPE_NATIVE_THREADS flag to the following hash modes
(after benchmarking): 7700, 7701, 9000, 1375x, 1376x, 14800, 19500, 23900.
- Used blake2b_transform() instead of blake2b_update() to avoid compiler problems
on Intel OpenCL and segfaults on POCL (the exact cause is still unclear, but it is
possibly related to the shuffle functions in combination with these OpenCL drivers).
- Removed 'bug' comments (these are now resolved).
- Added an implementation of 'argon2_hash_block()' for the non-warped (CPU) case.
- Introduced 'LBLOCKSIZE' for the size of an Argon2 block per thread, in u64 words.
Most of the code should now support any warp/wavefront size.
- Replaced inline asm in hc_byte_perm() with __builtin_amdgcn_perm()
- Replaced inline asm in hc_bytealign() with __builtin_amdgcn_alignbyte()
- Defined HC_INLINE as default for HIP, significantly boosting kernel performance of pure kernels
- Removed IS_ROCM from inc_vendor.h as it's no longer needed
- Removed backend-specific code from several hash-modes and inc_rp_optimized.cl, as hc_bytealign_S() is now available on all backends
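For reference, __builtin_amdgcn_alignbyte() exposes AMD's v_alignbyte_b32, a funnel shift in whole bytes. A portable C sketch of that operation follows, under a hypothetical name; hashcat's own hc_bytealign() wrappers may order their arguments differently.

```c
#include <assert.h>
#include <stdint.h>

/* Concatenate hi:lo into 64 bits, shift right by (sel & 3) bytes,
 * and keep the low 32 bits. */
static inline uint32_t alignbyte_ref (uint32_t hi, uint32_t lo, uint32_t sel)
{
  const uint64_t cat = ((uint64_t) hi << 32) | lo;

  return (uint32_t) (cat >> ((sel & 3) * 8));
}
```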