Jens Steube
1b84a9e53b
Add missing backports from code base v6.2.2
Fix context to thread management
Fix missing code in selftest.c, autotune.c, hashes.c, dispatch.c and backend.c
Use IS_HIP-dependent code instead of IS_CUDA || IS_HIP; this makes future optimization related to inline assembly calls easier
See TODO markers for more optimizations / next steps
2021-07-11 12:38:59 +02:00
Jens Steube
a22f8149fc
Merge branch 'HIP' into hip
2021-07-10 21:34:09 +02:00
reger-men
ea7b74389f
First draft HIP Version
2021-07-09 03:50:40 +00:00
Jens Steube
bb402b784a
Update module_unstable_warning for benchmark short selection on macOS for CPU and GPU; allow use of GPU without --force on a trial basis
2021-05-10 14:36:41 +02:00
Jens Steube
62fc3601bb
Wrap atomic functions with hc_ prefix to have better platform control
2021-04-20 17:47:44 +02:00
Jens Steube
316095c151
Some more ROCm performance tuning
2019-06-20 10:04:31 +02:00
Jens Steube
027af75a39
Fix rotate function names
2019-05-08 20:42:46 +02:00
Jens Steube
d0bd33c9d1
Rename CONSTANT_AS to CONSTANT_VK
2019-05-06 14:34:16 +02:00
Jens Steube
9faba41848
Use nvrtc to compile PTX (resulting PTX not yet used)
2019-04-26 13:28:44 +02:00
Jens Steube
4045e60021
Add nvrtc wrapper for later use
2019-04-26 10:03:16 +02:00
Jens Steube
4b986de5fb
Prepare native CUDA hybrid integration
2019-04-25 14:45:17 +02:00