mirror of
https://github.com/hashcat/hashcat.git
synced 2025-07-07 15:18:15 +00:00

- Integrated occupancy hints from vendor APIs (CUDA, HIP) to set a dynamic threads-per-block limit per kernel instead of using static values.
- Added `find_tuning_function()` to identify the relevant kernel.
- Autotuner now runs in three stages: threads -> loops -> accel. The first two stages stop increasing once the tested kernel runtime gets too close to the target runtime (96ms for `-w 3`), leaving headroom for the next stage to make finer adjustments.
- Accel tuning now uses a capped floating-point multiplier instead of powers of two.
- Removed workarounds for missing thread autotuning in plugins.
- Removed the hardcoded 4GiB host memory limit for accel. Added a cross-platform `get_free_memory()` to check actual free RAM during GPU initialization, preventing underutilization of high-end GPUs like the 4090. If needed, users can still cap memory usage with `-T` or `-n`.
- Updated enums for ROCm 6.4.x and CUDA 12.9.
- Added code to detect kernel register spilling. This matters because enough global memory must be kept free for the runtime to handle spills efficiently.
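The three-stage order described above (threads -> loops -> accel) can be illustrated with a small, self-contained sketch. This is not hashcat's implementation: `tuning_t`, `model_runtime_ms()`, `grow_pow2()`, `tune_accel()` and `autotune_sketch()` are hypothetical names, the linear runtime model stands in for a real kernel benchmark, and the headroom fraction (0.25), the multiplier cap (8.0), the upper limits (1024) and the doubling step used in the first two stages are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical tuning state; not hashcat's hc_device_param_t.
typedef struct
{
  uint32_t threads;
  uint32_t loops;
  uint32_t accel;

} tuning_t;

// Stand-in for timing a real kernel launch.
typedef double (*measure_fn) (const tuning_t *t);

// Assumed toy model: runtime grows linearly with the amount of work.
double model_runtime_ms (const tuning_t *t)
{
  return 0.0005 * t->threads * t->loops * t->accel;
}

// Stages 1 and 2: double one field until the measured runtime would
// exceed limit_ms, then step back so headroom remains for later stages.
void grow_pow2 (tuning_t *t, uint32_t *field, uint32_t max, double limit_ms, measure_fn measure)
{
  assert (*field >= 1);

  while (*field <= max / 2)
  {
    *field *= 2;

    if (measure (t) > limit_ms)
    {
      *field /= 2; // too close to the target: undo the last step and stop

      break;
    }
  }
}

// Stage 3: scale accel by a floating-point multiplier, capped at mult_cap.
void tune_accel (tuning_t *t, uint32_t accel_max, double target_ms, double mult_cap, measure_fn measure)
{
  const double ms = measure (t);

  if (ms <= 0.0) return;

  double mult = target_ms / ms;

  if (mult > mult_cap) mult = mult_cap;
  if (mult < 1.0)      return; // already at or above the target runtime

  uint64_t accel = (uint64_t) ((double) t->accel * mult); // rounds down

  if (accel > accel_max) accel = accel_max;

  t->accel = (uint32_t) accel;
}

// Runs the three stages in order: threads -> loops -> accel.
tuning_t autotune_sketch (double target_ms, measure_fn measure)
{
  tuning_t t = { 1, 1, 1 };

  const double headroom = 0.25; // assumed fraction of the target runtime

  grow_pow2  (&t, &t.threads, 1024, target_ms * headroom, measure);
  grow_pow2  (&t, &t.loops,   1024, target_ms * headroom, measure);
  tune_accel (&t, 1024, target_ms, 8.0, measure);

  return t;
}
```

Calling `autotune_sketch (96.0, model_runtime_ms)` under this toy model lets threads reach its assumed limit, stops loops early because the next doubling would cross the headroom limit, and then picks a non-power-of-two accel value that keeps the modelled runtime below the 96ms target.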
14 lines
272 B
C
/**
 * Author......: See docs/credits.txt
 * License.....: MIT
 */

#ifndef HC_AUTOTUNE_H
#define HC_AUTOTUNE_H

int find_tuning_function (hashcat_ctx_t *hashcat_ctx, hc_device_param_t *device_param);

HC_API_CALL void *thread_autotune (void *p);

#endif // HC_AUTOTUNE_H