hashcat/autotune.h at c275c35cedd9817e237652c06af48cdab46a9a8f

arno/hashcat

mirror of https://github.com/hashcat/hashcat.git synced 2025-07-07 15:18:15 +00:00

Jens Steube 69a585fa4a Autotune refactoring II: dynamic threads-per-block

- Integrated occupancy hints from vendor APIs (CUDA, HIP) to set a
  dynamic threads-per-block limit per kernel instead of using static
  values.
- Added `find_tuning_function()` to identify the relevant kernel.
- Autotuner now runs in three stages: threads -> loops -> accel. The
  first two stages stop increasing once the tested kernel runtime gets
  too close to the target runtime (96ms for `-w 3`), leaving headroom
  for the next stage to make finer adjustments.
- Accel tuning now uses a capped floating-point multiplier instead of
  powers of two.
- Removed workarounds for missing thread autotuning in plugins.
- Removed the hardcoded 4GiB host memory limit for accel. Added a
  cross-platform `get_free_memory()` to check actual free RAM during GPU
  initialization, preventing underutilization of high-end GPUs like the
  4090. If needed, users can still cap memory usage with `-T` or `-n`.
- Updated enums for ROCm 6.4.x and CUDA 12.9.
- Added code to detect kernel register spilling. This matters because
  enough global memory must be kept free for the runtime to handle
  spills efficiently.
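
The three-stage flow described above can be sketched roughly as follows. This is an illustration only, not hashcat's actual implementation: `tuning_t`, `model_runtime_ms()`, `tune_coarse()`, `tune_accel()` and `autotune_sketch()` are hypothetical names, the linear cost model stands in for real kernel timing, and the 1/8 headroom factor, the 4.0 multiplier cap and the limit of 1024 per parameter are assumed values chosen for the example.

```c
#include <assert.h>

/* Hypothetical tuning state; the real hashcat structures differ. */
typedef struct
{
  unsigned threads;
  unsigned loops;
  unsigned accel;

} tuning_t;

/* Toy cost model (assumption): runtime grows linearly with the work per
   launch. A real autotuner would launch the kernel and time it instead. */
static double model_runtime_ms (const tuning_t *t)
{
  return 0.0005 * t->threads * t->loops * t->accel;
}

/* Stages 1 and 2: double one parameter and undo the last step as soon as
   the runtime exceeds limit_ms, which sits well below the target. */
static void tune_coarse (tuning_t *t, unsigned *field, unsigned max, double limit_ms)
{
  while (*field * 2 <= max)
  {
    *field *= 2;

    if (model_runtime_ms (t) > limit_ms)
    {
      *field /= 2;

      break;
    }
  }
}

/* Stage 3: scale accel by target/runtime, capped at mult_cap per step,
   instead of stepping through powers of two. */
static void tune_accel (tuning_t *t, unsigned accel_max, double target_ms, double mult_cap)
{
  for (int i = 0; i < 8; i++)
  {
    double mult = target_ms / model_runtime_ms (t);

    if (mult > mult_cap) mult = mult_cap;

    unsigned next = (unsigned) (t->accel * mult); /* truncates toward zero */

    if (next > accel_max) next = accel_max;

    if (next <= t->accel) break; /* no further gain possible */

    t->accel = next;
  }
}

/* Runs the stages in order: threads -> loops -> accel. */
static tuning_t autotune_sketch (double target_ms)
{
  assert (target_ms > 0.0);

  tuning_t t = { 32, 1, 1 };

  const double coarse_limit_ms = target_ms / 8.0; /* assumed headroom */

  tune_coarse (&t, &t.threads, 1024, coarse_limit_ms);
  tune_coarse (&t, &t.loops,   1024, coarse_limit_ms);
  tune_accel  (&t, 1024, target_ms, 4.0);

  return t;
}
```

Calling `autotune_sketch (96.0)` with this toy model settles on threads = 1024, loops = 16 and accel = 11. The point of the non-power-of-two accel value is that the final runtime can land much closer to the target than doubling alone would allow.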

2025-06-24 20:19:42 +02:00

14 lines

272 B

C

/**
 * Author......: See docs/credits.txt
 * License.....: MIT
 */

#ifndef HC_AUTOTUNE_H
#define HC_AUTOTUNE_H

int find_tuning_function (hashcat_ctx_t *hashcat_ctx, hc_device_param_t *device_param);

HC_API_CALL void *thread_autotune (void *p);

#endif // HC_AUTOTUNE_H