mirror of https://github.com/hashcat/hashcat.git synced 2025-07-07 15:18:15 +00:00

52 Commits

Author SHA1 Message Date
Jens Steube
ed10e6a913 Autotune and Benchmark refactoring
This change affects three key areas, each improving autotuning:

- Autotune refactoring itself

The main autotune algorithm had become too complex to maintain and has
now been rewritten from scratch. The engine is now closer to the old
v6.0.0 version, using a much more straightforward approach.

Additionally, the backend is now informed when the autotune engine runs
its operations, and an extra, invisible kernel invocation is performed.
This significantly improves runtime accuracy because the same caching
mechanisms that kick in during normal cracking sessions now also apply
during autotuning. This leads to more consistent and reliable automatic
workload tuning.

- Benchmarking and '--speed-only' accuracy bugs fixed

Benchmark runtimes had become too short, especially since the default
benchmark mask changed from '?b?b?b?b?b?b?b' to '?a?a?a?a?a?a?a?a'. For
very fast hashes like NTLM, benchmarks often stopped immediately when
base words needed to be regenerated, producing highly inaccurate
results.

This issue also misled users tuning '-n' values, as manually
oversubscribing kernels could mask the problem, creating the impression
that increasing '-n' had a larger impact on performance than it truly
does. While '-n' still has an effect, it’s not as significant. With this
fix, users achieve the same speed without needing to tune '-n' manually.

The bug was fixed by enforcing a minimum benchmark runtime of 4 seconds,
regardless of kernel runtime or kernel type. This ensures more stable
and realistic benchmark results, but typically increases the benchmark
duration by up to 4 seconds.

- Kernel-Threads set to 32 and plugin configuration cleanup

Some plugin configurations existed solely to work around the old
benchmarking bug and can now be removed. For example,
'OPTS_TYPE_MAXIMUM_THREADS' is no longer required and has been removed
from all plugins, although the parameter itself remains to avoid
breaking custom plugins.

Because increasing threads beyond 32 no longer offers meaningful
performance gains, the default is now capped at 32 (unless overridden
with '-T'). This simplifies GPU memory management. Currently, work-item
counts are indirectly limited by buffer sizes (e.g., 'pws_buf[]'), which
must not exceed 4 GiB (a hard-coded limit). This buffer size depends on
the product of 'kernel-accel', 'kernel-threads', and the device’s
compute units. By reducing the default threads from 1024 to 32, there is
now more space available for base words.
2025-06-22 20:17:52 +02:00
Jens Steube
b02b1b5033 - Add code to recognize Microsoft's OpenCL D3D12 platform
- Skip memory-free detection on MS OpenCL platform to avoid crashes
- Improve salt usage of 70100/70200, use decoder/kernels from 8900
- Add REPLACE bridge type support (e.g. BRIDGE_TYPE_REPLACE_LOOP)
- Switch 70000, 70100 and 70200 to BRIDGE_TYPE_REPLACE_LOOP
- Add synchronization barriers on d2h copy when using bridges
- Improve speed status display updates when using bridges
- Set AMD_DIRECT_DISPATCH=0 to reduce CPU burning loop on AMD backends
- Set benchmark/selftest hash on 70100/70200 to 16:8:1
2025-06-02 06:59:36 +02:00
Jens Steube
ceb5ff5641 The Assimilation Bridge (Framework) 2025-05-29 15:38:13 +02:00
Rosen Penev
419b693f9d fix stdcall warnings under clang32/64
Signed-off-by: Rosen Penev <rosenp@gmail.com>
2023-08-09 17:26:34 -07:00
jsteube
6785a32e7a Added new feature (-Y) that creates N virtual instances for each device in your system at the cost of N times the device memory consumption 2023-04-16 12:04:34 +00:00
Jens Steube
b1ca2ca539 Only try to allocate memory on an OpenCL device if it actually has memory 2023-04-11 10:17:32 +00:00
jsteube
6ee2658104 Prefix more macros to avoid collisions in other existing libraries 2023-01-30 14:41:12 +00:00
jsteube
f1ff925b6e Prepare rename macros in header files from _MACRO to MACRO 2023-01-17 19:25:40 +00:00
Will Crozier
edf7365cda stdout mode: transfer indexes and compressed pw buffer from device in blocks, reducing overhead 2022-02-25 16:35:55 +00:00
Gabriele Gristina
490702fcfa Backends: added Metal host-code 2022-02-05 22:48:16 +01:00
Gabriele Gristina
f8ceb8785e CUDA Backend: moved functions to ext_cuda.c/ext_nvrtc.c and includes to ext_cuda.h/ext_nvrtc.h 2022-01-03 16:29:15 +01:00
Gabriele Gristina
78c7ee2af6 HIP Backend: moved functions to ext_hip.c/ext_hiprtc.c and includes to ext_hip.h/ext_hiprtc.h 2022-01-02 19:12:41 +01:00
Gabriele Gristina
26b6054cab OpenCL Backend: moved functions to ext_OpenCL.c and includes to ext_OpenCL.h 2021-12-28 21:55:05 +01:00
Jukka Ojanen
f3cd32bffd Avoid memory copy in rebuild_pws_compressed_append() and assign instead 2021-08-18 17:46:39 +03:00
Jens Steube
cb69e2d413 Added some HIP version checks, fall back to OpenCL automatically
Switched HIP version check from driverVersion to runtimeVersion
Fixed syntax check of HAS_VPERM macro in several kernel includes causing invalid error message for AMD GPUs on Windows
Updated AMD driver requirements
Updated docs/changes.txt with missing changes from previous commits
Fixed invalid vector data type in Murmur Hash in -a 3 mode
Fixed uninitialized variable warning in src/hashes.c
Fixed broken support for --generate-rules-func-min
2021-08-04 20:49:22 +02:00
Jukka Ojanen
c3195d0603 Merge branch 'master' of https://github.com/hashcat/hashcat 2021-07-30 11:34:25 +03:00
Jukka Ojanen
cdf27a1cb3 Implement async run_cuda_kernel_memset() and run_cuda_kernel_memset32() 2021-07-27 18:56:59 +03:00
Jukka Ojanen
a642f7b233 Remove synchronous GPU memory copy functions 2021-07-26 15:36:42 +03:00
Jukka Ojanen
4bd7363674 Add missing HIP declarations 2021-07-25 22:18:16 +03:00
Jens Steube
84a4058edf Merge pull request #2900 from hashcat/master
Backport changes
2021-07-25 10:34:05 +02:00
Jens Steube
640d95a00f Vendor Detection: Add "Intel" as a valid vendor name for GPU on macOS 2021-07-24 16:01:30 +02:00
Jens Steube
5ffcaa980d HIP Backend: Added support for HIP 4.4 and later, but added a check to rule out older versions because they are incompatible 2021-07-23 16:04:34 +02:00
Jukka Ojanen
4263cafdcf Add async CUDA memcpy functions: hc_cuMemcpyDtoDAsync(), hc_cuMemcpyDtoHAsync() and hc_cuMemcpyHtoDAsync(). Implement partially async CUDA memset and bzero kernels. 2021-07-20 12:23:39 +03:00
Jens Steube
ca3beacd93 Disable dynamic shared memory on HIP, because hipFuncSetAttribute() maps to cudaFuncSetAttribute() and not to cuFuncSetAttribute() 2021-07-11 14:30:49 +02:00
Jens Steube
1b84a9e53b Add missing backports from code base v6.2.2
Fix context to thread management
Fix missing code in selftest.c, autotune.c, hashes.c, dispatch.c and backend.c
Use IS_HIP-dependent code instead of IS_CUDA || IS_HIP, which makes future optimizations related to inline assembly calls easier
See TODO markers for more optimizations / next steps
2021-07-11 12:38:59 +02:00
Jens Steube
a22f8149fc Merge branch 'HIP' into hip 2021-07-10 21:34:09 +02:00
reger-men
ea7b74389f First draft HIP Version 2021-07-09 03:50:40 +00:00
nycex
470e844e5d use XDG_CACHE_HOME for kernels 2021-06-05 00:38:43 +02:00
Jens Steube
5c6501444a Kernels: Add standalone true UTF8 to UTF16 converter kernel that runs after amplifier. Use OPTS_TYPE_POST_AMP_UTF16LE from plugin 2021-05-20 14:34:24 +02:00
Jens Steube
0c2afde83b Add support for clUnloadPlatformCompiler() 2021-05-02 08:15:25 +00:00
Jens Steube
a0eae9050c OpenCL Runtime: Workaround JiT compiler deadlock on NVIDIA driver >= 465.89 2021-04-11 13:35:40 +02:00
Jens Steube
04d5e5a119 New Attack-Mode: Association Attack. Like JtR's single mode. Very early
stage. See hashcat Forum for detailed writeup.
2020-09-29 15:56:32 +02:00
Jens Steube
1fc37c25f9 OpenCL Kernels: Moved "gpu_decompress", "gpu_memset" and "gpu_atinit" into new OpenCL/shared.cl in order to reduce compile time 2020-02-01 09:00:48 +01:00
Jens Steube
346637ec43 Improve cujit logging 2020-01-30 11:44:57 +01:00
Jens Steube
66ae5125ce Cache cubin instead of PTX to decrease startup time 2020-01-29 15:56:36 +01:00
Jens Steube
a8555fa048 Support use of all available CPU cores for hash-mode specific hooks 2019-11-03 12:05:52 +01:00
Rosen Penev
6ecb42b3ea Run through Clang's readability-inconsistent-declaration-parameter-name 2019-08-03 22:37:38 -07:00
Jens Steube
5e0eb288c9 Use __launch_bounds__ in CUDA as replacement for reqd_work_group_size() in OpenCL 2019-06-16 18:01:26 +02:00
Jens Steube
0568c0746a Emulate effect of reqd_work_group_size() in CUDA 2019-06-06 17:49:41 +02:00
Jens Steube
44ecc83d82 Do some CUDA and NVRTC version checks on startup 2019-06-05 10:53:48 +02:00
Jens Steube
1f6c82b6d1 Add hc_cuModuleLoadDataExLog wrapper function for more detailed error logging from CUDA 2019-06-01 07:47:30 +02:00
Jens Steube
ce8a6fde0a Fix status screen current password query 2019-05-14 15:25:36 +02:00
Jens Steube
33028314f0 Add hc_cuCtxSetCacheConfig() 2019-05-09 00:04:05 +02:00
Jens Steube
4df00033d7 Prepare CUDA events 2019-05-04 10:44:03 +02:00
Jens Steube
f2948460c9 Some first kernel invocations 2019-05-04 10:13:43 +02:00
Jens Steube
5ee033673c Disable name mangling in NVRTC's PTX output and more 2019-05-03 15:50:07 +02:00
Jens Steube
af8e317cf4 Begin renaming some OpenCL only variables 2019-05-02 17:12:59 +02:00
Jens Steube
a6fa7a2749 Add support for some first CUDA module loader 2019-05-02 14:58:52 +02:00
Jens Steube
d73c0ac8a9 More CUDA attribute queries 2019-04-28 18:54:26 +02:00
Jens Steube
a415422123 Initialize CUDA devices and some first attribute queries 2019-04-28 14:45:50 +02:00