## OpenCL Backend
- added hc_clCreateBuffer wrapper, hc_clCreateBuffer_pre
- updated HC_OCL_CREATEBUFFER macro
- switched the other two hc_clEnqueueWriteBuffer calls from CL_FALSE to CL_TRUE (blocking writes)
## Metal Backend
- added hc_mtlFinish
- updated hc_mtlCreateBuffer
## Memory
- added hc_alloc_aligned and hc_free_aligned
- renamed hcmalloc_aligned and hcfree_aligned to hcmalloc_bridge_aligned and hcfree_bridge_aligned
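
To illustrate what an aligned allocate/free pair typically looks like, here is a minimal sketch. It is not the code from this patch: the `sketch_` names are hypothetical, and it assumes the alignment is a power of two and at least `sizeof (void *)`.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical sketch, not hashcat's implementation.
// Assumes align is a power of two and at least sizeof (void *).
void *sketch_alloc_aligned (const size_t sz, const size_t align)
{
  // Room for the payload, the worst-case padding, and the saved base pointer.
  void *base = malloc (sz + align - 1 + sizeof (void *));

  if (base == NULL) return NULL;

  // Skip the slot reserved for the base pointer, then round up to the alignment.
  uintptr_t addr = (uintptr_t) base + sizeof (void *);

  addr = (addr + align - 1) & ~((uintptr_t) align - 1);

  // Save the base pointer just below the aligned block so free can find it.
  ((void **) addr)[-1] = base;

  return (void *) addr;
}

void sketch_free_aligned (void *ptr)
{
  if (ptr == NULL) return;

  // Release the original allocation, not the aligned address.
  free (((void **) ptr)[-1]);
}
```

A pointer returned by `sketch_alloc_aligned` must only be released with `sketch_free_aligned`, since the aligned address is generally not the address `malloc` returned.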
## Backend & Bridge
- updated references to hcmalloc_aligned and hcfree_aligned to use the new functions listed under Memory
- resolved TODOs in hc_fstat()
- fixed memory leaks in the Metal backend
- used the HC_OCL_CREATEBUFFER macro for buffer allocation and the openclMemoryFlags array to configure the OpenCL memory flags
- converted the last CL_FALSE to CL_TRUE in hc_clEnqueueWriteBuffer() calls
- hid pyenv stderr in test_edge.sh
- disallowed --slow-candidates (-S) in benchmark mode
Until now, Metal support has been developed on an Apple M1 and a 10-year-old Intel-based Mac, the latter to verify that the changes also work on very old devices.
Recently, the code was tested on an Apple M4 Pro, which ran about 3.7 times faster than the M1.
The code has also been tested sporadically on an Apple device with a discrete AMD GPU, but performance was very low.
With this patch, I revisited memory management on Metal, starting with an easily configurable array mapped one-to-one to the buffers allocated by hashcat.
Each entry sets the Storage Mode of its buffer. An additional ad hoc rule changes buffers
with a SHARED Storage Mode to MANAGED if the device is a discrete GPU rather than an Apple Silicon (M*) or integrated (Intel) one.
The result was excellent: in some very quick tests, for example, argon2 went from 10 H/s (1330.58ms) to 465 H/s (57.26ms)!
That is a 4550% increase in hash rate and a GPU execution time about 23 times shorter!
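
The storage mode selection described above can be sketched roughly as follows. This is an illustration only: the enum values stand in for Metal's storage modes, and the buffer names, their assigned modes, and the `sketch_` identifiers are hypothetical rather than taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-ins for Metal's storage modes.
typedef enum
{
  SKETCH_STORAGE_SHARED,
  SKETCH_STORAGE_MANAGED,
  SKETCH_STORAGE_PRIVATE

} sketch_storage_mode_t;

// Hypothetical buffer indices; the real list follows hashcat's own buffers.
enum
{
  SKETCH_BUF_PWS = 0,
  SKETCH_BUF_TMPS,
  SKETCH_BUF_RESULT,
  SKETCH_BUF_COUNT
};

// One entry per buffer, so a buffer's storage mode is changed in one place.
static const sketch_storage_mode_t sketch_storage_modes[SKETCH_BUF_COUNT] =
{
  [SKETCH_BUF_PWS]    = SKETCH_STORAGE_SHARED,
  [SKETCH_BUF_TMPS]   = SKETCH_STORAGE_PRIVATE,
  [SKETCH_BUF_RESULT] = SKETCH_STORAGE_SHARED,
};

// Look up the configured mode; SHARED becomes MANAGED only on a discrete GPU.
sketch_storage_mode_t sketch_resolve_storage_mode (const size_t buf_idx, const bool is_discrete_gpu)
{
  const sketch_storage_mode_t mode = sketch_storage_modes[buf_idx];

  if (mode == SKETCH_STORAGE_SHARED && is_discrete_gpu == true) return SKETCH_STORAGE_MANAGED;

  return mode;
}
```

Keeping the override in one resolver function means the table itself stays device-independent, and the discrete GPU rule can be adjusted without touching any buffer entry.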
In addition to the array for configuring the buffer storage modes, a macro, HC_MTL_CREATEBUFFER, has also been created.
This is used to generate the code that calls hc_mtlCreateBuffer, making the code much more readable than before.
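
As an illustration of how such a macro can shorten call sites, here is a minimal sketch. It does not reproduce HC_MTL_CREATEBUFFER: the macro name, its arguments, the device struct, and the `calloc`-based stand-in for hc_mtlCreateBuffer are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for hc_mtlCreateBuffer: returns 0 on success, -1 on failure.
static int sketch_create_buffer (const size_t size, void **buf)
{
  *buf = calloc (1, size);

  return (*buf == NULL) ? -1 : 0;
}

// Hypothetical device context holding named buffers.
typedef struct
{
  void *d_pws;
  void *d_tmps;

} sketch_device_t;

// Generates the create call plus its error check for one named buffer.
// Must be used inside a function that returns int.
#define SKETCH_CREATEBUFFER(dev, name, size)                          \
  do                                                                  \
  {                                                                   \
    if (sketch_create_buffer ((size), &(dev)->name) == -1) return -1; \
  } while (0)

int sketch_alloc_buffers (sketch_device_t *dev)
{
  // Each line replaces a call, an error check, and an early return.
  SKETCH_CREATEBUFFER (dev, d_pws,  1024);
  SKETCH_CREATEBUFFER (dev, d_tmps, 4096);

  return 0;
}
```

The `do { ... } while (0)` wrapper lets the macro be used like a single statement, including inside an unbraced `if`.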
In summary, this patch lays the groundwork for further improvements to the hashcat core, both for Metal itself and for other runtimes, particularly OpenCL.
In the following functions, I changed the type of the parameter that specifies the target of the operation:
- hc_clReleaseMemObject
- hc_clReleaseKernel
- hc_clReleaseProgram
- hc_cuModuleUnload
- hc_cuMemFree
- hc_cuStreamDestroy
- hc_cuEventDestroy
- hc_hipEventDestroy
- hc_hipMemFree
- hc_hipModuleUnload
- hc_hipStreamDestroy
- hc_mtlReleaseMemObject
- hc_mtlReleaseFunction
- hc_mtlReleaseLibrary
With this change, it was possible to remove several lines of code from backend.c, making it more readable.
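
The description above does not state the new parameter type. One common way such a change removes call-site code is to pass a pointer to the handle instead of the handle itself, so the wrapper can do the NULL check and the reset. The sketch below assumes that pattern; the handle type and function names are hypothetical, and `free` stands in for the runtime's release call.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical handle type standing in for cl_mem, CUdeviceptr, and similar.
typedef void *sketch_mem_t;

// Before (assumed): the wrapper takes the handle by value, so every call site
// has to check for NULL and clear its own copy afterwards.
int sketch_release_by_value (sketch_mem_t mem)
{
  free (mem);

  return 0;
}

// After (assumed): the wrapper takes a pointer to the handle, skips empty
// handles, and clears the caller's copy once it has been released.
int sketch_release_by_ref (sketch_mem_t *mem)
{
  if (mem == NULL || *mem == NULL) return 0;

  free (*mem);

  *mem = NULL;

  return 0;
}
```

With the second form, a call site shrinks from an `if` block with a release and a reset to a single call such as `sketch_release_by_ref (&buf);`, and calling it twice on the same handle is harmless.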
- added support for 2D/3D compute
- improved the compute workload calculation
Makefile:
- updated MACOSX_DEPLOYMENT_TARGET to 15.0
Unit tests:
- updated install_modules.sh with Crypt::Argon2
Argon2 is starting to work with Apple Metal.