Commit Graph

16 Commits (6716447dfce969ddde42a9abe0681500bee0df48)

Author | SHA1 | Message | Date
Rosen Penev | a55d4aa3c9 | fix prototypes and old declarations | 10 months ago
justpretending | b2f14f2f5d | Fix some typos | 11 months ago
jsteube | 6ee2658104 | Prefix more macros to avoid collisions in other existing libraries | 1 year ago
jsteube | f1ff925b6e | Prepare rename macros in header files from _MACRO to MACRO | 1 year ago
Gabriele Gristina | f8ceb8785e | CUDA Backend: moved functions to ext_cuda.c/ext_nvrtc.c and includes to ext_cuda.h/ext_nvrtc.h | 2 years ago
Jukka Ojanen | cdf27a1cb3 | Implement async run_cuda_kernel_memset() and run_cuda_kernel_memset32() | 3 years ago
Jukka Ojanen | a642f7b233 | Remove synchronous GPU memory copy functions | 3 years ago
Jukka Ojanen | 4263cafdcf | Add async CUDA memcpy functions: hc_cuMemcpyDtoDAsync(), hc_cuMemcpyDtoHAsync() and hc_cuMemcpyHtoDAsync(). Implement partially async CUDA memset and bzero kernels. | 3 years ago
Jens Steube | 66ae5125ce | Cache cubin instead of PTX to decrease startup time | 4 years ago
Jens Steube | 33028314f0 | Add hc_cuCtxSetCacheConfig() | 5 years ago
Jens Steube | ec9925f3b1 | Warnings self-check and autotune with CUDA | 5 years ago
Jens Steube | a6fa7a2749 | Add support for some first CUDA module loader | 5 years ago
Jens Steube | 4b986de5fb | Prepare native CUDA hybrid integration | 5 years ago
jsteube | 378258d789 | Fix caching system for use with AMD and NV, drop BINARY_KERNEL define | 9 years ago
jsteube | 968265fffb | - Prepared for JIT use of hash-mode 1500, 8900 and 9300, works already on OpenCL (AMD) | 9 years ago
Jens Steube | 5065474b4e | Initial commit | 9 years ago