Jukka Ojanen | 4263cafdcf | Add async CUDA memcpy functions: hc_cuMemcpyDtoDAsync(), hc_cuMemcpyDtoHAsync() and hc_cuMemcpyHtoDAsync(). Implement partially async CUDA memset and bzero kernels. | 2021-07-20 12:23:39 +03:00

Jens Steube | 66ae5125ce | Cache cubin instead of PTX to decrease startup time | 2020-01-29 15:56:36 +01:00

Jens Steube | 33028314f0 | Add hc_cuCtxSetCacheConfig() | 2019-05-09 00:04:05 +02:00

Jens Steube | ec9925f3b1 | Warnings self-check and autotune with CUDA | 2019-05-04 21:52:00 +02:00

Jens Steube | a6fa7a2749 | Add support for some first CUDA module loader | 2019-05-02 14:58:52 +02:00

Jens Steube | 4b986de5fb | Prepare native CUDA hybrid integration | 2019-04-25 14:45:17 +02:00

jsteube | 378258d789 | Fix caching system for use with AMD and NV, drop BINARY_KERNEL define | 2015-12-21 12:01:38 +01:00

jsteube | 968265fffb | Prepared for JIT use of hash-mode 1500, 8900 and 9300, works already on OpenCL (AMD); Changed PROMPT | 2015-12-07 21:37:12 +01:00

Jens Steube | 5065474b4e | Initial commit | 2015-12-04 15:47:52 +01:00