Rosen Penev | a55d4aa3c9 | fix prototypes and old declarations; Signed-off-by: Rosen Penev <rosenp@gmail.com> | 9 months ago
justpretending | b2f14f2f5d | Fix some typos | 10 months ago
jsteube | 6ee2658104 | Prefix more macros to avoid collisions in other existing libraries | 1 year ago
jsteube | f1ff925b6e | Prepare rename macros in header files from _MACRO to MACRO | 1 year ago
Gabriele Gristina | f8ceb8785e | CUDA Backend: moved functions to ext_cuda.c/ext_nvrtc.c and includes to ext_cuda.h/ext_nvrtc.h | 2 years ago
Jukka Ojanen | cdf27a1cb3 | Implement async run_cuda_kernel_memset() and run_cuda_kernel_memset32() | 3 years ago
Jukka Ojanen | a642f7b233 | Remove synchronous GPU memory copy functions | 3 years ago
Jukka Ojanen | 4263cafdcf | Add async CUDA memcpy functions: hc_cuMemcpyDtoDAsync(), hc_cuMemcpyDtoHAsync() and hc_cuMemcpyHtoDAsync(). Implement partially async CUDA memset and bzero kernels. | 3 years ago
Jens Steube | 66ae5125ce | Cache cubin instead of PTX to decrease startup time | 4 years ago
Jens Steube | 33028314f0 | Add hc_cuCtxSetCacheConfig() | 5 years ago
Jens Steube | ec9925f3b1 | Warnings self-check and autotune with CUDA | 5 years ago
Jens Steube | a6fa7a2749 | Add support for some first CUDA module loader | 5 years ago
Jens Steube | 4b986de5fb | Prepare native CUDA hybrid integration | 5 years ago
jsteube | 378258d789 | Fix caching system for use with AMD and NV, drop BINARY_KERNEL define | 9 years ago
jsteube | 968265fffb | Prepared for JIT use of hash-mode 1500, 8900 and 9300, works already on OpenCL (AMD); Changed PROMPT | 9 years ago
Jens Steube | 5065474b4e | Initial commit | 9 years ago