In the automatic downtune routine, hashcat prepares a fixed 512MiB host
buffer that is known to be allocated by the compute runtimes (CUDA, HIP,
OpenCL, Metal), and over which hashcat has no control.
However, hashcat still divides the maximum available host memory by the
active device count to automatically as a preparation to later downtune
the -n and -T parameters when memory is limited.
Hashcat reserves 512MiB per active device. With bridges, the active
devices become bridge units, which for modes 70000, 70100, and 70200
equals the CPU core count. On a 32-core CPU, this multiplies to 16GiB,
even though the memory is actually shared because of threading.
This leads to an overestimation of memory usage.
A simple fix is to divide the 512MiB buffer by the active device count.
This keeps the full 512MiB for a single GPU but avoids overestimating
memory usage with many virtual devices.
Fix code handling kernel-accel value in argon2_common.c for CPU,
which was accidentally removed during previous refactoring.
Set thread count to 1 for hash-mode 70000. Oversubscribing the CPU isn't
useful here. This allows to keep the wordlist count low, which is very
welcome for slow hashes like Argon2id.
Fix unit test for 20011/20012/20013 (DiskCryptor) by adding setuptools
to install_modules.sh and replacing AES.MODE_XTS with python_AES.MODE_XTS.
Fix false negative for kernel 32800, which only occurred if all
conditions were true: multihash, -a 3, optimized mode, and password
length between 16 and 31.
Fix Python package name in BUILD_WSL.md command line example.