16 Commits (master)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Rosen Penev | a55d4aa3c9 | fix prototypes and old declarations | 9 months ago |
| justpretending | b2f14f2f5d | Fix some typos | 10 months ago |
| jsteube | 6ee2658104 | Prefix more macros to avoid collisions in other existing libraries | 1 year ago |
| jsteube | f1ff925b6e | Prepare rename macros in header files from _MACRO to MACRO | 1 year ago |
| Gabriele Gristina | f8ceb8785e | CUDA Backend: moved functions to ext_cuda.c/ext_nvrtc.c and includes to ext_cuda.h/ext_nvrtc.h | 2 years ago |
| Jukka Ojanen | cdf27a1cb3 | Implement async run_cuda_kernel_memset() and run_cuda_kernel_memset32() | 3 years ago |
| Jukka Ojanen | a642f7b233 | Remove synchronous GPU memory copy functions | 3 years ago |
| Jukka Ojanen | 4263cafdcf | Add async CUDA memcpy functions: hc_cuMemcpyDtoDAsync(), hc_cuMemcpyDtoHAsync() and hc_cuMemcpyHtoDAsync(). Implement partially async CUDA memset and bzero kernels. | 3 years ago |
| Jens Steube | 66ae5125ce | Cache cubin instead of PTX to decrease startup time | 4 years ago |
| Jens Steube | 33028314f0 | Add hc_cuCtxSetCacheConfig() | 5 years ago |
| Jens Steube | ec9925f3b1 | Warnings self-check and autotune with CUDA | 5 years ago |
| Jens Steube | a6fa7a2749 | Add support for some first CUDA module loader | 5 years ago |
| Jens Steube | 4b986de5fb | Prepare native CUDA hybrid integration | 5 years ago |
| jsteube | 378258d789 | Fix caching system for use with AMD and NV, drop BINARY_KERNEL define | 9 years ago |
| jsteube | 968265fffb | - Prepared for JIT use of hash-mode 1500, 8900 and 9300, works already on OpenCL (AMD) | 9 years ago |
| Jens Steube | 5065474b4e | Initial commit | 9 years ago |