Gabriele Gristina
b3d3b31c3e
Metal: added support for vectors up to 4
2022-02-10 21:53:08 +01:00
Gabriele Gristina
9d36245d51
Kernels: Set the default Address Space Qualifier for any pointer, refactored / updated KERN_ATTR macros and rc4 cipher functions, in order to support Apple Metal runtime
2022-02-04 19:54:00 +01:00
Jens Steube
3f6c5a0042
Update module_unstable_warning() for -m 172xx on HIP
2021-07-23 21:09:55 +02:00
Jens Steube
5ffcaa980d
HIP Backend: Added support for HIP 4.4 and later, and added a check to rule out older versions because they are incompatible
2021-07-23 16:04:34 +02:00
Jens Steube
bdb7999f07
Switch HIP vector datatypes to OpenCL like ext_vector_type()
2021-07-19 20:24:30 +02:00
Jens Steube
0d8b4b74ad
More CUDA special backports to HIP
2021-07-18 22:56:22 +02:00
Jens Steube
257098a301
Get rid of hip/hip_runtime.h dependency
2021-07-18 21:14:45 +02:00
Jens Steube
45e65dd05a
Backport more ROCm based optimizations to HIP
2021-07-15 23:34:27 +02:00
Jens Steube
d130cc66b3
Optimize ISA code on HIP for V_ALIGNBIT_B32 using a different template for inline assembly
2021-07-15 09:57:41 +02:00
Jens Steube
674ca7d88f
Add GPU threads to kernel cache checksum because it has an influence on HIP offline compile options
...
Add V_ALIGNBIT_B32 inline assembly wrapper because HIP does not provide amd_bitalign()
2021-07-12 11:27:05 +02:00
Jens Steube
20f7febd4c
Work around overly aggressive optimization in -m 2000 using HIPRTC
2021-07-11 15:54:13 +02:00
Jens Steube
1b84a9e53b
Add missing backports from code base v6.2.2
...
Fix context to thread management
Fix missing code in selftest.c, autotune.c, hashes.c, dispatch.c and backend.c
Use IS_HIP-dependent code instead of IS_CUDA || IS_HIP; this makes future optimization related to inline assembly calls easier
See TODO markers for more optimizations / next steps
2021-07-11 12:38:59 +02:00
Jens Steube
a22f8149fc
Merge branch 'HIP' into hip
2021-07-10 21:34:09 +02:00
reger-men
ea7b74389f
First draft HIP Version
2021-07-09 03:50:40 +00:00
Jens Steube
62fc3601bb
Wrap atomic functions with hc_ prefix to have better platform control
2021-04-20 17:47:44 +02:00
Jens Steube
73cc3170f4
Fixed both false negative and false positive results in -m 3000 in -a 3 (affected only NVIDIA GPUs)
2021-04-20 17:14:13 +02:00
Jens Steube
316095c151
Some more ROCm performance tuning
2019-06-20 10:04:31 +02:00
Jens Steube
5e0eb288c9
Use __launch_bounds__ in CUDA as replacement for reqd_work_group_size() in OpenCL
2019-06-16 18:01:26 +02:00
Jens Steube
7832c54452
Fix constant memory use of bfs_buf
2019-05-11 09:32:16 +02:00
Jens Steube
46f737c5af
Use real constant memory on CUDA
2019-05-10 13:22:26 +02:00
Jens Steube
d0bd33c9d1
Rename CONSTANT_AS to CONSTANT_VK
2019-05-06 14:34:16 +02:00
Jens Steube
ec9925f3b1
Warnings self-check and autotune with CUDA
2019-05-04 21:52:00 +02:00
Jens Steube
3b7304c9d8
Fix recursion in inc_platform.cl
2019-04-26 14:01:14 +02:00
Jens Steube
89119bf24a
Add missing inc_platform.h include
2019-04-26 13:59:43 +02:00
Jens Steube
9faba41848
Use nvrtc to compile PTX (resulting PTX not yet used)
2019-04-26 13:28:44 +02:00
Jens Steube
4b986de5fb
Prepare native CUDA hybrid integration
2019-04-25 14:45:17 +02:00