- resolved the TODOs in hc_fstat()
- fix memory leaks in the Metal backend
- use the HC_OCL_CREATEBUFFER macro for buffer allocation and the openclMemoryFlags array to configure the OpenCL memory flags
- convert the last remaining CL_FALSE arguments to CL_TRUE in hc_clEnqueueWriteBuffer() calls
- hide pyenv stderr output in test_edge.sh
- do not allow --slow-candidates (-S) in benchmark mode
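The flags-array approach can be sketched roughly as below. This is a minimal, self-contained illustration under stated assumptions, not hashcat's code: the buffer indices, the `mem_flags` table, the `CREATE_BUFFER` macro and the `create_buffer()` stub are hypothetical stand-ins for `HC_OCL_CREATEBUFFER`, `openclMemoryFlags` and the real OpenCL allocation call, whose exact definitions are not shown in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag values standing in for the OpenCL CL_MEM_* flags. */
#define MEM_READ_WRITE 0x1u
#define MEM_READ_ONLY  0x4u

/* Hypothetical buffer identifiers used to index the flags table. */
enum { BUF_PWS, BUF_RULES, BUF_RESULT, BUF_COUNT };

/* One entry per buffer: the flags live in a single table instead of
 * being repeated at every allocation site. */
static const unsigned mem_flags[BUF_COUNT] = {
  [BUF_PWS]    = MEM_READ_ONLY,
  [BUF_RULES]  = MEM_READ_ONLY,
  [BUF_RESULT] = MEM_READ_WRITE,
};

typedef struct { unsigned flags; size_t size; void *host; } buffer_t;

/* Stub for the real allocation call: returns 0 on success, -1 on failure. */
static int create_buffer (unsigned flags, size_t size, buffer_t **out)
{
  assert (out != NULL);
  buffer_t *buf = malloc (sizeof (buffer_t));
  if (buf == NULL) return -1;
  buf->host = malloc (size);
  if (buf->host == NULL) { free (buf); return -1; }
  buf->flags = flags;
  buf->size  = size;
  *out = buf;
  return 0;
}

/* Looks up the flags for `idx` and returns -1 from the calling function on error. */
#define CREATE_BUFFER(idx, size, out) \
  do { if (create_buffer (mem_flags[(idx)], (size), (out)) == -1) return -1; } while (0)

/* Releases every allocated buffer and resets its slot to NULL. */
static void free_all (buffer_t *bufs[BUF_COUNT])
{
  for (int i = 0; i < BUF_COUNT; i++)
  {
    if (bufs[i] == NULL) continue;
    free (bufs[i]->host);
    free (bufs[i]);
    bufs[i] = NULL;
  }
}

/* On failure, the caller cleans up whatever was allocated with free_all(). */
static int alloc_all (buffer_t *bufs[BUF_COUNT])
{
  for (int i = 0; i < BUF_COUNT; i++) bufs[i] = NULL;

  CREATE_BUFFER (BUF_PWS,    1024, &bufs[BUF_PWS]);
  CREATE_BUFFER (BUF_RULES,  256,  &bufs[BUF_RULES]);
  CREATE_BUFFER (BUF_RESULT, 64,   &bufs[BUF_RESULT]);

  return 0;
}
```

Keeping the flags in one indexed table means changing a buffer's access mode is a one-line edit, and the macro keeps each allocation site to a single line with uniform error handling.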
In the following functions, I changed the type of the parameter that specifies the target of the operation:
- hc_clReleaseMemObject
- hc_clReleaseKernel
- hc_clReleaseProgram
- hc_cuModuleUnload
- hc_cuMemFree
- hc_cuStreamDestroy
- hc_cuEventDestroy
- hc_hipEventDestroy
- hc_hipMemFree
- hc_hipModuleUnload
- hc_hipStreamDestroy
- hc_mtlReleaseMemObject
- hc_mtlReleaseFunction
- hc_mtlReleaseLibrary
With this change, it was possible to remove several lines of code from backend.c, making it more readable.
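These notes do not say which type the parameter was changed to. One common pattern that produces exactly this kind of cleanup, assumed here, is passing a pointer to the handle, so the helper can skip NULL handles and reset the caller's handle after releasing it. The `handle_t` type, `release_object()` and `release_handle()` below are hypothetical and only mimic a `hc_clReleaseMemObject`-style wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int id; } object_t;
typedef object_t *handle_t;   /* stand-in for an opaque handle such as cl_mem */

static int release_count = 0; /* counts actual releases, for illustration */

/* Stand-in for the backend API call that frees the object. */
static int release_object (handle_t h)
{
  free (h);
  release_count++;
  return 0;
}

/* Wrapper taking a pointer to the handle: NULL handles are skipped and the
 * caller's handle is reset, so call sites no longer need their own
 * "if (h) { release (h); h = NULL; }" boilerplate. */
static int release_handle (handle_t *h)
{
  assert (h != NULL);
  if (*h == NULL) return 0;
  int rc = release_object (*h);
  *h = NULL;
  return rc;
}
```

At each call site, a single `release_handle (&buf);` then replaces the check, the release and the reset, and calling it twice on the same handle is harmless.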
Fixed a typedef issue for clEnqueueReadBuffer().
Updated Python/hcshared.py with the missing entry for the new salt_dimy attribute in the salt_t struct.
Fixed a bug in the autotuner when determining the starting value for kernel loops, in cases where the iteration count is N-1 and not a multiple of 1024.
Updated additional plugins to use OPTI_TYPE_REGISTER_LIMIT.
- Performance on some low-end GPUs may drop as a result, but only for a few hash-modes
- Dropped the scalar code (aka warp), since we no longer have any vector datatypes
- Renamed C++ overloaded functions: memcat32_9 -> memcat_c32_w4x4_a3x4
- Kernels still need to be updated to the new function names; this has to be done manually
- Temperature management needs to be partially rewritten because of conflicting datatype names
- Added code to create different codepaths for NV and AMD at runtime in the host (see data.vendor_id)
- Added code to create different codepaths for NV and AMD at runtime in the kernels (see IS_NV and IS_AMD)
- First tests are working, for example with -m 0
- Large performance increases for NV in general so far
- Tested the amp_* and markov_* kernels
- Migrated special NV optimizations for rule processor
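A rough sketch of how a runtime vendor check on the host can drive the IS_NV / IS_AMD codepaths in the kernels: the host maps the detected vendor to a preprocessor define and passes it to the kernel compiler as a `-D` build option. The `vendor_id_t` values, the `IS_GENERIC` fallback and both helper functions are hypothetical; only the names `data.vendor_id`, `IS_NV` and `IS_AMD` come from the notes above, and whether the real code passes them this way is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical vendor identifiers; the real values behind data.vendor_id may differ. */
typedef enum { VENDOR_ID_GENERIC = 0, VENDOR_ID_AMD = 1, VENDOR_ID_NV = 2 } vendor_id_t;

/* Host side: branch at runtime on the detected vendor ... */
static const char *vendor_define (vendor_id_t vendor_id)
{
  switch (vendor_id)
  {
    case VENDOR_ID_NV:  return "IS_NV";
    case VENDOR_ID_AMD: return "IS_AMD";
    default:            return "IS_GENERIC";  /* hypothetical fallback */
  }
}

/* ... and emit a matching -D option for the kernel compiler, so the kernel
 * source can select its code at compile time, e.g.
 *   #if defined IS_NV
 *   ... NV-specific path ...
 *   #elif defined IS_AMD
 *   ... AMD-specific path ...
 *   #endif
 * Returns 0 on success, -1 if the options do not fit into buf. */
static int build_options (vendor_id_t vendor_id, char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  int n = snprintf (buf, len, "-D %s", vendor_define (vendor_id));
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Splitting the decision this way keeps a single kernel source: the host decides once per device, and the compiler strips the unused vendor path entirely.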