Commit Graph

131 Commits (v6.1.0)

Author SHA1 Message Date
Jens Steube 0ff2f8c5e1 OpenCL Devices: Utilize PCI domain to improve alias device detection
4 years ago
philsmd 3e822e97b9 fixes #2460: better alias detection esp. for macOS
4 years ago
Jens Steube 5628317de8 OpenCL Runtime: Reinterpret return code CL_DEVICE_NOT_FOUND from clGetDeviceIDs() as non-fatal
4 years ago
philsmd e59f61e8cf cosmetic: minor code style fixes
4 years ago
Jens Steube a6a6bb200a Mark NV 441.x as fixed
4 years ago
Jens Steube 1e469a96a4 Add missing branch in automatic alias device selection
4 years ago
Jens Steube 34f71aaea3 Re-enable POCL if version detected is >= 1.5 and LLVM is >= 9.x and also remove performance warning. Still prefers native OpenCL runtime in alias detection, but this default can be overridden using -d parameter.
4 years ago
Matt Palmer 240d35976a Fix build warning in DEBUG mode
4 years ago
Jens Steube 008072eb65 OpenCL Runtime: Added a warning if OpenCL runtime NEO, Beignet, POCL or MESA is detected and skip associated devices (override with --force)
4 years ago
Jens Steube 434ad76381 Improve alias device detection to distinguish between Intel CPU and embedded GPU
4 years ago
Jens Steube ba7163062d Do not set -cl-std=XXX to work around NEO driver bug causing a hang while compiling -m 22000
4 years ago
Jens Steube 2b2a7ede66 OpenCL Options: Set --spin-damp to 0 (disabled) by default. With the CUDA backend this workaround became deprecated
4 years ago
Jens Steube 8c3808bad5 Fix NUL filename on windows
4 years ago
Jens Steube 3e4d110fd2 Add stderr redirection the regular way
4 years ago
Jens Steube 125e9ec863 Do not redirect stderr to /dev/null to prevent rocm 3.1 from crashing on debian
4 years ago
Jens Steube f381e1bbf8 Remove force_recompile functionality, doesn't work with cubin anymore
4 years ago
Jens Steube f96e35649d Change bitsliced kernels from 3d to 2d invocation mode for slightly better performance
4 years ago
Jens Steube d9473358ef Add support for OPTS_TYPE_LOOP_EXTENDED kernel for special cases like VeraCrypt
4 years ago
Jens Steube c90d83c3eb Prepare for UNROLL whitelisting
4 years ago
Jens Steube 4788c61dd2 Add OPTI_TYPE_REGISTER_LIMIT flag to enable register limiting in CUDA
4 years ago
Jens Steube 17a64f5019 Set a fixed register count maximum for CUDA kernel. This prevents kernels from going out of control and having negative effects on other kernels from the same source code (for instance 16600)
4 years ago
Jens Steube c40f474c2e Add special module option to indicate the kernel is using dynamic shared memory
4 years ago
Jens Steube fb7bb04587 Do not use dynamic shared memory if dynamic_local_mem_size is a multiple of local_mem_size
4 years ago
Jens Steube 96a2c36f53 Reduce CUDA Toolkit minimum version to 9.0 (even 8.0 should be sufficient)
4 years ago
Jens Steube aef53f7e10 OpenCL Runtime: Allow the kernel to access post-48k shared memory region on CUDA. Requires both module and kernel preparation
4 years ago
Jens Steube 1fc37c25f9 OpenCL Kernels: Moved "gpu_decompress", "gpu_memset" and "gpu_atinit" into new OpenCL/shared.cl in order to reduce compile time
4 years ago
Jens Steube 08163501cf Add option to disable cubin cache binaries and moved some redundant kernel load code into specific function
4 years ago
Jens Steube 01085cdab2 Move cujit_opts allocation closer to the calling functions because CUDA library needs it reinitialized after each use
4 years ago
Jens Steube 346637ec43 Improve cujit logging
4 years ago
Jens Steube 66ae5125ce Cache cubin instead of PTX to decrease startup time
4 years ago
Jens Steube cc4fd48ace Optimize hook buffer size to be copied
4 years ago
Jens Steube 041a777025 OpenCL Runtime: Unlocked maximum thread count for NVIDIA GPU
4 years ago
Jens Steube ccacc508cb Reenabled support for Intel GPU OpenCL runtime (Beignet and NEO) because a workaround was found (force -cl-std=CL2.0)
4 years ago
Jens Steube fe372dffb7 Add RDNA ISA instructions test for ADD/ADDC/SUB/SUBB
4 years ago
Jens Steube df5e2361d3 Disable inline assembly instruction tests for CUDA and refer to documented requirements
4 years ago
Jens Steube d0fb171da9 Added new options --backend-ignore-cuda and --backend-ignore-opencl, to ignore CUDA and/or OpenCL interface from being loaded on startup
4 years ago
Jens Steube b3690fcd05 Backport instruction test cache from CUDA to OpenCL
4 years ago
Jens Steube 2b4d0656d5 Cache inline assembly instruction check results for same devices types
4 years ago
Jens Steube 5d1d48f5d7 Do not check for COPY_PW limits in outside kernels
4 years ago
Jens Steube 53254b45aa Backport inc_ecc_secp256k1 inline assembly code for AMD ISA
5 years ago
Jens Steube bfd95d42f6 - OpenCL Runtime: Reenabled support for Intel GPU OpenCL runtime
5 years ago
Jens Steube 2884bded32 Initialize some variable to make scan-build happy
5 years ago
Jens Steube 00b9f4c557 Add kernel accel minimum limit check
5 years ago
Jens Steube 424777ae28 Add kernel accel limiter based on kernel threads to reduce host memory requirements
5 years ago
Jens Steube f7c3ced548 Fix use of calloc() in backend.c
5 years ago
Jens Steube c4dd020685 Add support for NVIDIA Jetson AGX Xavier developer kit
5 years ago
Jens Steube 53e96a12a0 Improve automatic calculation of hook threads value
5 years ago
Jens Steube fe8c17f4c7 Support pause/abort in hooks
5 years ago
Jens Steube 9c2c73c6cc Clear hook buffers after full kernel chain is finished
5 years ago
Jens Steube 7458e4f487 Add per-device available memory test of static data (hashlist, ruleset) before test of dynamic data (-n based)
5 years ago