## OpenCL Backend
- added hc_clCreateBuffer wrapper, hc_clCreateBuffer_pre
- updated HC_OCL_CREATEBUFFER macro
- switched the other two hc_clEnqueueWriteBuffer calls from CL_FALSE to CL_TRUE (blocking writes)
## Metal Backend
- added hc_mtlFinish
- updated hc_mtlCreateBuffer
## Memory
- added hc_alloc_aligned and hc_free_aligned
- renamed hcmalloc_aligned and hcfree_aligned to hcmalloc_bridge_aligned and hcfree_bridge_aligned
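
As a rough illustration of what an aligned alloc/free pair like the one added here has to do, the sketch below over-allocates, rounds the address up, and stores the raw pointer just before the aligned block. The names `sketch_alloc_aligned` and `sketch_free_aligned` are hypothetical and this is not the code from the patch; the actual hc_alloc_aligned and hc_free_aligned may work differently (for example by calling a platform allocator).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical sketch, NOT hashcat's hc_alloc_aligned / hc_free_aligned.
// Over-allocate, round the address up to a multiple of `align`, and keep the
// raw malloc pointer in the slot right before the aligned block.
void *sketch_alloc_aligned (size_t size, size_t align)
{
  // align must be a non-zero power of two
  if (align == 0 || (align & (align - 1)) != 0) return NULL;

  // room for the worst-case padding plus the stored raw pointer
  const size_t extra = align - 1 + sizeof (void *);

  if (size > SIZE_MAX - extra) return NULL; // size + extra would overflow

  void *raw = malloc (size + extra);

  if (raw == NULL) return NULL;

  const uintptr_t start   = (uintptr_t) raw + sizeof (void *);
  const uintptr_t aligned = (start + align - 1) & ~((uintptr_t) align - 1);

  assert (aligned % align == 0);

  ((void **) aligned)[-1] = raw; // remembered for sketch_free_aligned

  return (void *) aligned;
}

void sketch_free_aligned (void *ptr)
{
  if (ptr == NULL) return;

  free (((void **) ptr)[-1]); // release the original malloc block
}
```

A caller would pair the two, e.g. `void *p = sketch_alloc_aligned (4096, 64); ... sketch_free_aligned (p);`. Passing a pointer obtained this way to plain `free` would be a bug, which is one reason to keep the alloc and free routines together under matching names.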
## Backend & Bridge
- updated references to hcmalloc_aligned and hcfree_aligned to use the new functions defined in Memory
- Added OS detection to allow conditional execution of platform-specific initialization steps
- Implemented macOS-specific cache cleaning to ensure consistent benchmark results
- Updated script logic to align with other test suites, including improved handling of workdir setup and execution timing measurements
- solved TODOs in hc_fstat()
- fix memory leaks in the Metal Backend
- use the HC_OCL_CREATEBUFFER macro for buffer allocation and the openclMemoryFlags array to configure the OpenCL memory flags
- convert the last CL_FALSE to CL_TRUE in hc_clEnqueueWriteBuffer() calls
- hide pyenv stderr in test_edge.sh
- do not allow --slow-candidates (-S) in benchmark mode
Until now, Metal support has been developed on an Apple M1 and a 10-year-old Apple Intel machine, which also served to verify that the changes remain compatible with very old devices.
Recently, the code has been tested on an Apple M4 Pro, with a performance increase of about 3.7 times compared to the M1.
The code has also been sporadically tested on an Apple device with a discrete AMD GPU, but the performance was very low.
With this patch, I revisited memory management on Metal, starting with an easily configurable array mapped one-to-one to the buffers allocated by hashcat.
Each entry configures the Storage Mode of its buffer, and an ad hoc rule converts buffers
from SHARED to MANAGED Storage Mode when the device is a discrete GPU rather than an M* (Apple Silicon) or integrated (Intel) one.
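
The table-plus-override idea described above can be sketched in plain C as follows. All names (`storage_mode_t`, `buffer_id_t`, `buffer_storage_modes`, `resolve_storage_mode`) and the per-buffer mode choices are hypothetical and only illustrate the mechanism; the real array in the patch covers hashcat's actual buffers and uses Metal's own storage mode constants.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical names throughout; this only mirrors the idea in the text.
typedef enum { SM_SHARED, SM_MANAGED, SM_PRIVATE } storage_mode_t;

typedef enum { BUF_PWS, BUF_TMPS, BUF_RESULT, BUF_COUNT } buffer_id_t;

// One entry per buffer: the array index is the buffer id (1:1 mapping),
// so changing a buffer's mode means editing a single table entry.
static const storage_mode_t buffer_storage_modes[BUF_COUNT] =
{
  [BUF_PWS]    = SM_SHARED,   // assumed choice, for illustration
  [BUF_TMPS]   = SM_PRIVATE,  // assumed choice, for illustration
  [BUF_RESULT] = SM_SHARED,   // assumed choice, for illustration
};

// Resolve the mode actually requested from the driver: SHARED is turned
// into MANAGED on a discrete GPU, everything else is passed through.
storage_mode_t resolve_storage_mode (buffer_id_t id, bool is_discrete_gpu)
{
  assert ((int) id >= 0 && (int) id < BUF_COUNT);

  const storage_mode_t mode = buffer_storage_modes[id];

  if (mode == SM_SHARED && is_discrete_gpu) return SM_MANAGED;

  return mode;
}
```

With this layout, `resolve_storage_mode (BUF_PWS, true)` yields `SM_MANAGED` while the same call with `false` keeps `SM_SHARED`, so Apple Silicon and integrated Intel devices are unaffected by the override.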
The result was excellent: in some very quick tests, argon2, for example, went from 10 H/s (1330.58ms) to 465 H/s (57.26ms)!
That is a 46.5x gain in throughput (a 4550% increase) and a GPU execution time roughly 23x shorter (about 95.7% less)!
In addition to the array for configuring the buffer storage modes, a macro, HC_MTL_CREATEBUFFER, has also been created.
This is used to generate the code that calls hc_mtlCreateBuffer, making the code much more readable than before.
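
To show why such a macro helps readability, here is a minimal sketch of the pattern. It does not reproduce HC_MTL_CREATEBUFFER or the signature of hc_mtlCreateBuffer; `SKETCH_CREATEBUFFER`, `sketch_create_buffer`, `device_t`, and the `MODE_*` constants are hypothetical stand-ins, and `malloc` replaces the real Metal allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-ins; the real macro and wrapper in hashcat differ.
typedef struct { void *d_pws; void *d_tmps; } device_t;

// Per-buffer storage mode, looked up by buffer name (values are arbitrary).
enum { MODE_d_pws = 0, MODE_d_tmps = 2 };

// Stub for the backend call: a real one would hand `mode` to the driver.
int sketch_create_buffer (size_t size, int mode, void **out)
{
  assert (out != NULL);

  (void) mode;

  *out = malloc (size);

  return (*out == NULL) ? -1 : 0;
}

// Expands to: pick the mode for `name`, create the buffer into dev->name,
// and return -1 from the enclosing function on failure.
#define SKETCH_CREATEBUFFER(dev, name, size)                              \
  do                                                                      \
  {                                                                       \
    if (sketch_create_buffer ((size), MODE_##name, &(dev)->name) == -1)   \
      return -1;                                                          \
  } while (0)

// Each allocation is one line instead of a repeated call-and-check block.
int sketch_alloc_all (device_t *dev, size_t size_pws, size_t size_tmps)
{
  SKETCH_CREATEBUFFER (dev, d_pws,  size_pws);
  SKETCH_CREATEBUFFER (dev, d_tmps, size_tmps);

  return 0;
}
```

The early `return -1` inside the macro keeps call sites short, but it also means cleanup of already created buffers has to happen in the caller's error path; this sketch leaves that out.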
In summary, this patch lays the groundwork for further improvements to the hashcat core, both on Metal itself and also for other runtimes, particularly OpenCL.
Fix deprecation warning on m30906.pm
Fix pipeline error with -m 32600 on Apple
Update edge_test.sh
Fix edge test vector generation for hash-types 28501, 28502, 28503, 28504, 28505, 28506
Fixed an old, critical bug on Apple Intel with Metal by patching inc_rp_optimized.cl.
Tested on Apple Intel and Silicon with Metal/OpenCL and on Linux with CUDA, HIP, OpenCL GPU/CPU
Metal Backend: parallelize pipeline state object (PSO) compilation internally
Set the unexported setting setShouldMaximizeConcurrentCompilation to speed up the kernel build process on Apple Metal (Metal 3 or later only)
Hashcat is evolving, both in its core and in the supported algorithms.
To uncover bugs in the code, I implemented edge case testing to verify the settings defined in the specific algorithm test modules (e.g., m00000.pm), as well as the behavior of the kernels (pure and optimized) in relation to the different attack modes (-a0, -a1, etc.).