mirror of https://github.com/hashcat/hashcat.git synced 2025-02-23 21:02:27 +00:00
Commit Graph

345 Commits

Author SHA1 Message Date
Jens Steube
23c3c178bf Limit max threads per block to 64 to enable offline compiler to make better use of available registers
Fix double free() for hip_event1/hip_event2 and hip_stream that caused a segfault
Replace hc_cuCtxSetCurrent() with hc_cuCtxPushCurrent() in order to align changes with HIP
Add vector datatype operators (if we decide to use them - currently unused)
2021-07-12 09:28:26 +02:00
Jens Steube
ca3beacd93 Disable dynamic shared memory on HIP, because hipFuncSetAttribute() maps to cudaFuncSetAttribute() and not to cuFuncSetAttribute() 2021-07-11 14:30:49 +02:00
Jens Steube
1b84a9e53b Add missing backports from code base v6.2.2
Fix context to thread management
Fix missing code in selftest.c, autotune.c, hashes.c, dispatch.c and backend.c
Use IS_HIP-dependent code instead of IS_CUDA || IS_HIP; this makes future optimization related to inline assembly calls easier
See TODO markers for more optimizations / next steps
2021-07-11 12:38:59 +02:00
Jens Steube
bfe83ec138 Added temperature watchdog for CPU on linux using sysfs 2021-07-10 08:43:15 +02:00
Jens Steube
979f9e9868 Rename hardware monitor interface sysfs to sysfs_amdgpu 2021-07-09 20:48:10 +02:00
Jukka Ojanen
a0eaefa0c2 Missing whitespaces 2021-07-05 20:20:51 +03:00
Jukka Ojanen
e133bd4ec4 Change rc_board_name_amd initial value to CL_INVALID_VALUE. If CHECK_BOARD_NAME_AMD is changed to 0, there is a problem with CL_SUCCESS, which equals 0: the device will be skipped. 2021-07-05 19:05:10 +03:00
Jukka Ojanen
b3d18f86e2 Fix early return leaks in backend_session_begin 2021-07-05 19:03:56 +03:00
Jukka Ojanen
b976e52bc7 Fix early return leaks in load_kernel; nvrtc_options, nvrtc_options_string, build_log. Ensure build log NULL termination. 2021-07-05 19:00:35 +03:00
Jukka Ojanen
2f7eec2fd7 Fix early return leaks in backend_ctx_init and backend_ctx_devices_init 2021-07-05 15:52:48 +03:00
Jukka Ojanen
bcbb9b0d2c Fix skipped device param leak in backend_ctx_devices_destroy 2021-07-05 15:38:07 +03:00
Jukka Ojanen
5f109b5862 Fix iconv_ctx and iconv_tmp leaks in backend.c 2021-07-03 12:51:37 +03:00
Jens Steube
8a3eee3fe5 OpenCL Runtime: Workaround JiT crash (SC failed. No reason given.) on macOS by limiting local memory allocations to 32k 2021-06-27 10:18:38 +02:00
Jens Steube
909d5e64a5 Added hash-mode: AES-128/192/256-ECB NOKDF
This mode is probably very rare in real-life scenarios,
but it is a nice template for kernels which do
not use a KDF,
or use AES,
or simple fast hashes with lookup tables
or simple optimized kernels in general
2021-06-26 17:12:10 +02:00
Jens Steube
28c08de9ef Folders: Do not escape the variable cpath_real, to prevent certain OpenCL runtimes that do not support escape characters from running into an error 2021-06-24 09:24:02 +02:00
Jens Steube
90f0e78b5b Add event about autodetection process start
Remove "..." substring from final messages
Add patient message on "..." startup messages
Add missing docs/changes.txt entry
2021-06-22 11:37:59 +02:00
Jens Steube
7e267b9b37
Merge pull request #2825 from matrix/hwmon_osx_v2
Add OSX HW Monitor initial support (2021)
2021-06-20 11:09:35 +02:00
Jens Steube
2d7f67fd23 Alias Devices: Show a warning in case the user specifically listed a device to use which in a later step is skipped because it is an alias of another active device 2021-06-15 14:06:24 +02:00
Gabriele Gristina
bc4ce4cbeb Add support for CPU/GPU device temperature and fanspeed using iokit (Apple) 2021-06-12 20:13:31 +02:00
Jens Steube
ee7fca82f5 Scrypt Kernels: Re-enable scrypt based kernels to use kernel cache 2021-06-12 11:42:19 +02:00
Jens Steube
7fc0ac4ef1 Do not initialize backend devices in case --id is used and some bug fixes 2021-06-12 10:47:48 +02:00
Jens Steube
65d81c0f7b Hashrate: Innerloop hashrate prediction requires update because of the new salt_repeats feature and also respect _loop2 kernel runtime 2021-06-10 22:43:06 +02:00
Jens Steube
fdccc8287d Fixed free memory size output for skipped GPU (both automatic and manual) of --backend-info information screen 2021-06-07 12:12:01 +02:00
Royce Williams
7bea7ca177 tighten output to be physical-terminal friendly 2021-06-06 10:47:18 -08:00
nycex
470e844e5d
use XDG_CACHE_HOME for kernels 2021-06-05 00:38:43 +02:00
Jens Steube
ce8c121b50 BCRYPT Kernels: Improved bcrypt performance by 6.5% for high-end NVIDIA GPU devices using CUDA backend 2021-06-01 22:52:07 +02:00
Jens Steube
9f5a22a3ab OpenCL Backend: Use CL_DEVICE_BOARD_NAME_AMD instead of CL_DEVICE_NAME for device name in case OpenCL runtime supports this query 2021-05-26 19:09:50 +02:00
Jens Steube
5c6501444a Kernels: Add standalone true UTF8 to UTF16 converter kernel that runs after amplifier. Use OPTS_TYPE_POST_AMP_UTF16LE from plugin 2021-05-20 14:34:24 +02:00
Jens Steube
fe91f6276d CUDA Backend: Do not warn about missing CUDA SDK installation if --stdout is used 2021-05-16 18:31:43 +02:00
Jens Steube
a2fcb03fe3 Update AMD GPU check on macOS warning message 2021-05-11 10:54:39 +02:00
Jens Steube
282eb75fe9 Update module_unstable_warning for benchmark long selection on macOS for CPU and GPU; Allow use of GPU without --force testwise 2021-05-10 19:57:56 +02:00
Jens Steube
bbd6e55968 Add missing null pointer to cuda_module_shared and opencl_program_shared in backend cleanup function 2021-05-09 14:55:52 +02:00
Jens Steube
0c2afde83b Add support for clUnloadPlatformCompiler() 2021-05-02 08:15:25 +00:00
Jens Steube
59459d0e5b Fixed memory leak causing problems in sessions with many iterations, for instance --benchmark-all or large mask files 2021-05-01 17:27:33 +02:00
Jens Steube
9a87d5aa01 Fixed out-of-boundary reads in case user activates -S for fast but pure hashes in -a 1 or -a 3 mode 2021-04-27 19:55:30 +02:00
Jens Steube
0ba77fe761 Kernel Development: Kernel cache is disabled automatically in case hashcat is compiled with DEBUG=1
See https://github.com/hashcat/hashcat/issues/2750
2021-04-26 09:51:50 +02:00
Jens Steube
1dac869cb7 Removed unnecessary swaps in SCRYPT based algorithms 2021-04-23 20:55:13 +02:00
Jens Steube
15f35fa68c Scrypt Kernels: Reduced kernel wait times by making it a true split kernel where iteration count = N value 2021-04-21 15:59:14 +02:00
Jens Steube
8e47fdf8f5 Add 4 times single workitem extra buffer size to total extra buffer size to workaround single workitem buffer overflows 2021-04-19 10:27:51 +02:00
Jens Steube
57a8923b81 Update complete SCRYPT workload tuning logic.
A detailed description will follow.
Set -m 8900 defaults to 16k:8:1 (default scrypt settings).
2021-04-16 20:17:53 +02:00
Jens Steube
ff96015f53 Add OPTS_TYPE_NATIVE_THREADS for use by plugin developer to enforce native thread count (useful for scrypt) 2021-04-14 15:22:30 +02:00
Jens Steube
67d189e10a Update calculation of EXTRA_SPACE in backend.c and add upper and lower hard limit 2021-04-13 12:02:52 +02:00
Jens Steube
51e8661070 Update calculation of EXTRA_SPACE in backend.c to make it depend on kernel-accel 2021-04-13 11:47:37 +02:00
Jens Steube
9033975efd Allow plugins to disable the multiplication of the kernel-accel value with the multiprocessor count of the compute device. Will be used later. 2021-04-13 11:26:17 +02:00
Jens Steube
a0eae9050c OpenCL Runtime: Workaround JiT compiler deadlock on NVIDIA driver >= 465.89 2021-04-11 13:35:40 +02:00
Jens Steube
3c199bfa1b
Merge pull request #2693 from matrix/out_of_host_memory
[backend.c] skipping devices instead of stop with error
2021-04-05 12:58:47 +02:00
Jens Steube
d53913f444
Merge pull request #2672 from matrix/example2info
Replaced --example-hashes with --hash-info
2021-04-03 13:25:51 +02:00
Chick3nman
9b6235a5fc
Downgrade Kernel Exec Timeout Warning
Kernel times of >450ms are very uncommon and this warning is not blocking, so downgrade it to advice to allow it to be hidden along with other advice messages.
2021-04-01 20:01:44 -05:00
Gabriele Gristina
fda0d668e5 use skip also with first checks of backend_session_begin() 2021-01-23 18:51:25 +01:00
Gabriele Gristina
4c2605f7f2 switch to skip instead of return -1 for all checks, moved cuda counter update to the end of loop 2021-01-23 18:37:47 +01:00
Gabriele Gristina
f4dbd46b71 try skipping devices instead of returning -1 2021-01-23 13:54:46 +01:00
Gabriele Gristina
77e328d659 Removed option --example-hashes; it is now an alias of --hash-info 2020-12-29 07:56:20 +01:00
Gabriele Gristina
3ed1f0d840 Added new option: --hash-info 2020-12-29 04:58:58 +01:00
Jens Steube
04d5e5a119 New Attack-Mode: Association Attack. Like JtR's single mode. Very early
stage. See hashcat Forum for detailed writeup.
2020-09-29 15:56:32 +02:00
Jens Steube
57bef8abc9 Display possible NVIDIA CUDA/RTC library loading error message only in case a NVIDIA device was found using OpenCL 2020-09-06 13:29:32 +02:00
Jens Steube
111f39eeb2 OpenCL Runtime: Switched default OpenCL device type on macOS from GPU to CPU. Use -D 2 to enable GPU devices. 2020-09-05 23:20:59 +02:00
Jens Steube
343d3bc0aa CUDA Backend: Give detailed warning if either the NVIDIA CUDA or the NVIDIA RTC library cannot be initialized
CUDA Backend: Do not warn about missing CUDA SDK installation if --backend-ignore-cuda is used
2020-09-05 15:46:28 +02:00
Jens Steube
6a419d068c CUDA Backend: Use blocking events to avoid 100% CPU core usage (per GPU) 2020-08-31 13:35:57 +02:00
Jens Steube
62a7ae4075 Increase EXTRA_SPACE to leave some room for free device memory 2020-08-29 20:01:39 +02:00
Jens Steube
3ebf4c5f9f Merge branch 'master' of https://github.com/hashcat/hashcat 2020-08-29 16:12:37 +02:00
Jens Steube
98aef2ae92 Module Structure: Add 3rd party library hook management functions. This also requires an update to all existing module_init() 2020-08-29 16:12:15 +02:00
philsmd
ee5bce1c3e
fixes #2518: call clear_prompt () more often to avoid misaligned prompt 2020-08-26 12:14:26 +02:00
Jens Steube
a72ba6faab Add OPTI_TYPE_SLOW_HASH_SIMD_INIT2 and OPTI_TYPE_SLOW_HASH_SIMD_LOOP2 2020-08-14 15:52:36 +02:00
Jens Steube
e21463da4b Fixed race condition resulting in out of memory error on startup if multiple hashcat instances are started at the same time 2020-08-14 09:04:52 +02:00
Jens Steube
0ff2f8c5e1 OpenCL Devices: Utilize PCI domain to improve alias device detection 2020-07-27 15:21:56 +02:00
philsmd
3e822e97b9
fixes #2460: better alias detection esp. for macOS 2020-07-02 17:39:22 +02:00
Jens Steube
5628317de8 OpenCL Runtime: Reinterpret return code CL_DEVICE_NOT_FOUND from clGetDeviceIDs() as non-fatal 2020-06-19 13:15:31 +02:00
philsmd
e59f61e8cf
cosmetic: minor code style fixes 2020-06-13 11:19:00 +02:00
Jens Steube
a6a6bb200a Mark NV 441.x as fixed 2020-06-03 14:38:04 +02:00
Jens Steube
1e469a96a4 Add missing branch in automatic alias device selection 2020-06-03 12:31:17 +02:00
Jens Steube
34f71aaea3 Re-enable POCL if version detected is >= 1.5 and LLVM is >= 9.x and also remove performance warning. Still prefers native OpenCL runtime in alias detection, but this default can be overridden using -d parameter. 2020-06-03 09:29:20 +02:00
Matt Palmer
240d35976a Fix build warning in DEBUG mode
Just a tiny cleanup to avoid an 'unused variable' warning when building
with DEBUG=1.
2020-04-18 12:18:18 +10:00
Jens Steube
008072eb65 OpenCL Runtime: Added a warning if OpenCL runtime NEO, Beignet, POCL or MESA is detected and skip associated devices (override with --force) 2020-04-01 10:16:49 +02:00
Jens Steube
434ad76381 Improve alias device detection to distinguish between Intel CPU and embedded GPU 2020-03-13 10:01:57 +01:00
Jens Steube
ba7163062d Do not set -cl-std=XXX to workaround NEO driver bug causing a hang while compiling -m 22000 2020-03-13 09:43:41 +01:00
Jens Steube
2b2a7ede66 OpenCL Options: Set --spin-damp to 0 (disabled) by default. With the CUDA backend this workaround became deprecated 2020-03-12 10:51:10 +01:00
Jens Steube
8c3808bad5 Fix NUL filename on windows 2020-03-09 20:12:36 +01:00
Jens Steube
3e4d110fd2 Add stderr redirection the regular way 2020-03-09 20:05:23 +01:00
Jens Steube
125e9ec863 Do not redirect stderr to /dev/null to prevent rocm 3.1 from crashing on debian 2020-03-09 11:13:43 +01:00
Jens Steube
f381e1bbf8 Remove force_recompile functionality, doesn't work with cubin anymore 2020-02-29 10:38:20 +01:00
Jens Steube
f96e35649d Change bitsliced kernels from 3d to 2d invocation mode for slightly better performance 2020-02-22 07:59:58 +01:00
Jens Steube
d9473358ef Add support for OPTS_TYPE_LOOP_EXTENDED kernel for special cases like VeraCrypt 2020-02-20 16:00:21 +01:00
Jens Steube
c90d83c3eb Prepare for UNROLL whitelisting 2020-02-15 12:44:12 +01:00
Jens Steube
4788c61dd2 Add OPTI_TYPE_REGISTER_LIMIT flag to enable register limiting in CUDA 2020-02-04 21:53:27 +01:00
Jens Steube
17a64f5019 Set a fixed register count maximum for CUDA kernel. This prevents kernels from going out of control and having negative effects on other kernels from the same source code (for instance 16600) 2020-02-04 18:31:23 +01:00
Jens Steube
c40f474c2e Add special module option to indicate the kernel is using dynamic shared memory 2020-02-02 11:24:38 +01:00
Jens Steube
fb7bb04587 Do not use dynamic shared memory if dynamic_local_mem_size is a multiple of local_mem_size 2020-02-02 11:15:37 +01:00
Jens Steube
96a2c36f53 Reduce CUDA Toolkit minimum version to 9.0 (even 8.0 should be sufficient) 2020-02-01 19:32:03 +01:00
Jens Steube
aef53f7e10 OpenCL Runtime: Allow the kernel to access post-48k shared memory region on CUDA. Requires both module and kernel preparation 2020-02-01 14:27:42 +01:00
Jens Steube
1fc37c25f9 OpenCL Kernels: Moved "gpu_decompress", "gpu_memset" and "gpu_atinit" into new OpenCL/shared.cl in order to reduce compile time 2020-02-01 09:00:48 +01:00
Jens Steube
08163501cf Add option to disable cubin cache binaries and moved some redundant kernel load code into specific function 2020-01-31 17:50:53 +01:00
Jens Steube
01085cdab2 Move cujit_opts allocation closer to the calling functions because CUDA library needs it reinitialized after each use 2020-01-31 11:59:59 +01:00
Jens Steube
346637ec43 Improve cujit logging 2020-01-30 11:44:57 +01:00
Jens Steube
66ae5125ce Cache cubin instead of PTX to decrease startup time 2020-01-29 15:56:36 +01:00
Jens Steube
cc4fd48ace Optimize hook buffer size to be copied 2020-01-26 20:31:38 +01:00
Jens Steube
041a777025 OpenCL Runtime: Unlocked maximum thread count for NVIDIA GPU 2020-01-24 13:24:19 +01:00
Jens Steube
ccacc508cb Reenabled support for Intel GPU OpenCL runtime (Beignet and NEO) because a workaround was found (force -cl-std=CL2.0) 2020-01-24 10:52:12 +01:00
Jens Steube
fe372dffb7 Add RDNA ISA instructions test for ADD/ADDC/SUB/SUBB 2020-01-06 12:49:57 +01:00
Jens Steube
df5e2361d3 Disable inline assembly instruction tests for CUDA and refer to documented requirements 2020-01-03 12:27:27 +01:00
Jens Steube
d0fb171da9 Added new options --backend-ignore-cuda and --backend-ignore-opencl, to prevent the CUDA and/or OpenCL interface from being loaded on startup 2020-01-03 11:51:24 +01:00
Jens Steube
b3690fcd05 Backport instruction test cache from CUDA to OpenCL 2020-01-03 11:06:55 +01:00
Jens Steube
2b4d0656d5 Cache inline assembly instruction check results for same devices types 2020-01-03 10:44:10 +01:00
Jens Steube
5d1d48f5d7 Do not check for COPY_PW limits in outside kernels 2019-12-31 21:25:37 +01:00
Jens Steube
53254b45aa Backport inc_ecc_secp256k1 inline assembly code for AMD ISA 2019-12-05 15:43:01 +01:00
Jens Steube
bfd95d42f6 - OpenCL Runtime: Reenabled support for Intel GPU OpenCL runtime 2019-11-27 10:28:12 +01:00
Jens Steube
2884bded32 Initialize some variables to make scan-build happy 2019-11-26 10:55:57 +01:00
Jens Steube
00b9f4c557 Add kernel accel minimum limit check 2019-11-19 20:38:31 +01:00
Jens Steube
424777ae28 Add kernel accel limiter based on kernel threads to reduce host memory requirements 2019-11-19 17:59:50 +01:00
Jens Steube
f7c3ced548 Fix use of calloc() in backend.c 2019-11-17 19:59:23 +01:00
Jens Steube
c4dd020685 Add support for NVIDIA Jetson AGX Xavier developer kit 2019-11-16 17:27:35 +01:00
Jens Steube
53e96a12a0 Improve automatic calculation of hook threads value 2019-11-16 11:48:25 +01:00
Jens Steube
fe8c17f4c7 Support pause/abort in hooks 2019-11-15 14:42:34 +01:00
Jens Steube
9c2c73c6cc Clear hook buffers after full kernel chain is finished 2019-11-15 10:12:33 +01:00
Jens Steube
7458e4f487 Add per-device available memory test of static data (hashlist, ruleset) before test of dynamic data (-n based) 2019-11-14 11:31:00 +01:00
Rosen Penev
a6edb84157
Fix extra semicolon warnings
These macros don't need a ; but since ; is used, make the macros more
robust by enclosing them in a do while loop.
2019-11-09 16:42:50 -08:00
Jens Steube
c12470b978
Merge pull request #2188 from neheb/cast
Add casts where needed in C++ mode
2019-11-05 12:28:21 +01:00
Jens Steube
a8555fa048 Support use of all available CPU cores for hash-mode specific hooks 2019-11-03 12:05:52 +01:00
Rosen Penev
fd8150769d Add casts where needed in C++ mode
Otherwise, -fpermissive must be passed.
2019-09-11 18:05:01 -07:00
Jens Steube
57a149276c Do alias check only in case both CUDA and OpenCL devices were detected 2019-08-06 12:44:39 +02:00
Jens Steube
97c9e86d15 Filehandling: Print a truncation warning in case an oversized line was detected 2019-08-06 12:22:24 +02:00
Rosen Penev
dca1a86315
Run through Clang's bugprone-macro-parentheses 2019-08-03 22:37:38 -07:00
Rosen Penev
6dc72ebcc5
Run through Clang's readability-else-after-return
There's no need for a return statement in an else path. Just take it out.
Simplifies the code slightly.
2019-08-03 22:37:38 -07:00
Rosen Penev
fb75164126
Run through Clang's google-readability-casting
Removes casts where the type is identical.
2019-08-03 22:37:38 -07:00
Rosen Penev
2f76326c37
Run through Clang's android-cloexec checkers
This is mainly useful with SELinux.
2019-08-03 22:37:37 -07:00
Rosen Penev
98e17d5774
Run through clang-tidy's readability-uppercase-literal-suffix
1 and l are visually similar and can be confused. This also changes u to U
for consistency.
2019-08-03 19:59:17 -07:00
Gabriele Gristina
ae62e597ce (backend) remove unused *rc* vars and cleanup 2019-07-10 16:13:11 +02:00
Jens Steube
a7fd1e40f8
Merge pull request #2075 from matrix/zlib_support_2
Add zlib support for loading hashlist/wordlist (v2)
2019-07-10 10:56:06 +02:00
Gabriele Gristina
2db6dfcd4e fix HCFILE bug with potfile and other issues related to wrong HCFILE usage 2019-07-02 18:27:36 +02:00
Gabriele Gristina
ea786f715f avoid logical negation operator 2019-07-02 15:52:17 +02:00
Gabriele Gristina
3161aec3da fix the comments :) 2019-07-01 17:27:08 +02:00
Gabriele Gristina
5679ca3344 Rewrite hc_fopen to better handle file descriptor locking/unlocking functions; save kernel binaries in gzip format instead of plain 2019-07-01 01:30:24 +02:00
Gabriele Gristina
caf34e0e83 Fix some *print* format arguments 2019-06-29 17:49:57 +02:00
Gabriele Gristina
5d3ed3e754 Remove union from HCFILE, using std file ops in ocl_check_dri, remove debug comments 2019-06-28 17:58:08 +02:00
Gabriele Gristina
c2e634c426 switch is_gzip from short to bool 2019-06-27 23:51:54 +02:00
Gabriele Gristina
481c752456 No more compress functions, update example.dict.gz, remove some comments 2019-06-27 20:18:47 +02:00
Gabriele Gristina
398c89c75c switch almost all FILE ops, potfile is the only missing 2019-06-26 19:06:46 +02:00
Jens Steube
2cda236a18 OpenCL Runtime: Do not run a shared- and constant-memory size check if their memory type is of type global memory (typically CPU) 2019-06-22 16:01:38 +02:00
Jens Steube
6dfb474adf OpenCL Runtime: Do not run a shared- and constant-memory size check if their memory type is of type global memory (typically CPU) 2019-06-22 16:00:48 +02:00
Gabriele Gristina
b2529af172 remove original commented code 2019-06-22 15:00:50 +02:00
Gabriele Gristina
6cb4abd526 Add zlib support v2 2019-06-21 21:56:38 +02:00
Jens Steube
955bfeaa14 Improve performance of bitsliced algorithms on ROCm 2019-06-19 16:35:52 +02:00
Jens Steube
5e0eb288c9 Use __launch_bounds__ in CUDA as replacement for reqd_work_group_size() in OpenCL 2019-06-16 18:01:26 +02:00
Jens Steube
c2fc849e2c Fix minimum threads_per_block check 2019-06-06 20:46:20 +02:00
Jens Steube
0568c0746a Emulate effect of reqd_work_group_size() in CUDA 2019-06-06 17:49:41 +02:00
Jens Steube
44ecc83d82 Do some CUDA and NVRTC version checks on startup 2019-06-05 10:53:48 +02:00
Jens Steube
03ed89684e Use --restrict nvrtc option by default 2019-06-04 17:35:10 +02:00
Jens Steube
87c336e822 Fix format warning in backend.c 2019-06-03 13:41:52 +02:00
Jens Steube
1f6c82b6d1 Add hc_cuModuleLoadDataExLog wrapper function for more detailed error logging from CUDA 2019-06-01 07:47:30 +02:00
Jens Steube
ce8a6fde0a Fix status screen current password query 2019-05-14 15:25:36 +02:00
Jens Steube
f84eaa2e4d Fix bitsliced algorithm brute-force with CUDA 2019-05-14 14:08:27 +02:00
Jens Steube
523e0f7151 Fix free unallocated memory in case OpenCL initialization failed 2019-05-14 10:25:49 +02:00
Jens Steube
bca03bb7ed CUDA offers a nice way to query available device memory, no need to brute force 2019-05-14 10:09:46 +02:00
Jens Steube
a6bc1d3cc0 Experimental kernel-thread autotuner 2019-05-11 11:58:18 +02:00
Jens Steube
d59474fded Testwise unlock full thread count on NVidia 2019-05-10 17:27:15 +02:00
Jens Steube
d378aa7ab9 Show host memory requirement on startup 2019-05-10 16:37:49 +02:00
Jens Steube
46f737c5af Use real constant memory on CUDA 2019-05-10 13:22:26 +02:00
Jens Steube
5d14a59304 Need 3.x nvrtc minimum 2019-05-10 10:11:12 +02:00
Jens Steube
54feb62e94 brute-force nvrtc .dll name 2019-05-09 22:17:13 +02:00
Jens Steube
a2b5981303 Fix some library names 2019-05-09 21:20:50 +02:00
Jens Steube
be8f29ca39 Only warn about broken NVIDIA driver 2019-05-09 16:30:08 +02:00
Jens Steube
39e150fc1e Use xxx_v2 CUDA symbols 2019-05-09 14:37:14 +02:00
Jens Steube
33028314f0 Add hc_cuCtxSetCacheConfig() 2019-05-09 00:04:05 +02:00
Jens Steube
fb82bfc169 Improve thread handling based on FIXED_LOCAL_SIZE 2019-05-08 23:30:07 +02:00
Jens Steube
3a3df091c7 Fix CUDA num_elements 2019-05-08 22:42:52 +02:00
Jens Steube
363e789b89 Assume local nvrtc.dll and cuda.dll on windows 2019-05-07 16:52:08 +02:00
Jens Steube
a7d04adba3 Fix opencl_devices_active and backend_devices_active 2019-05-07 14:17:29 +02:00
Jens Steube
8ff8c5d536 Add LOCAL_VK to make use of __shared__ 2019-05-07 09:01:32 +02:00
Jens Steube
bbed0cd67a Fix test.sh and bitsliced algos 2019-05-06 15:06:02 +02:00
Jens Steube
d0bd33c9d1 Rename CONSTANT_AS to CONSTANT_VK 2019-05-06 14:34:16 +02:00
Jens Steube
64c495dfa5 Use CUDA stream for all cuLaunchKernel() invocations 2019-05-06 11:23:34 +02:00
Jens Steube
d94f582097 Replace CEILDIV() with round_up_multiple_64() 2019-05-06 09:36:07 +02:00
Jens Steube
e9c04c2446 More CUDA implementation 2019-05-05 21:15:46 +02:00
Jens Steube
08dc1acc02 More CUDA rewrites 2019-05-05 11:57:54 +02:00
Jens Steube
ec9925f3b1 Warnings self-check and autotune with CUDA 2019-05-04 21:52:00 +02:00
Jens Steube
4df00033d7 Prepare CUDA events 2019-05-04 10:44:03 +02:00
Jens Steube
f2948460c9 Some first kernel invocations 2019-05-04 10:13:43 +02:00
Jens Steube
5ee033673c Disable name mangling in NVRTC's PTX output and more 2019-05-03 15:50:07 +02:00
Jens Steube
503304f36a Add some first CUDA device memory allocations and host buffer copies 2019-05-03 12:07:06 +02:00
Jens Steube
50a6e720ca More OpenCL variables rename 2019-05-02 17:30:46 +02:00
Jens Steube
af8e317cf4 Begin renaming some OpenCL only variables 2019-05-02 17:12:59 +02:00
Jens Steube
a6fa7a2749 Add support for some first CUDA module loader 2019-05-02 14:58:52 +02:00
Jens Steube
456c57a6d0 Set vector width size for CUDA 2019-05-01 18:20:19 +02:00
Jens Steube
3c4f4df771 Rename some more variables 2019-05-01 15:52:56 +02:00
Jens Steube
495d89f831 Find alias devices across different backend APIs 2019-05-01 07:27:10 +02:00
Jens Steube
6fd936b43a Removed --opencl-platforms filter in order to force backend device numbers to stay constant 2019-04-30 16:24:13 +02:00
Jens Steube
e3500ff4aa Add CUDA device attributes to -I 2019-04-30 13:38:44 +02:00
Jens Steube
d862458ab5 Begin renaming API specific variables in backend section 2019-04-29 10:21:59 +02:00
Jens Steube
d73c0ac8a9 More CUDA attribute queries 2019-04-28 18:54:26 +02:00
Jens Steube
a415422123 Initialize CUDA devices and some first attribute queries 2019-04-28 14:45:50 +02:00
Jens Steube
58213c81d6 Add vector datatypes operators 2019-04-26 22:07:56 +02:00
Jens Steube
052e42ccef Fix CUDA_ARCH value 2019-04-26 15:14:48 +02:00
Jens Steube
06171958ee Add --gpu-architecture to NVRTC build options 2019-04-26 15:10:02 +02:00
Jens Steube
9faba41848 Use nvrtc to compile PTX (resulting PTX not yet used) 2019-04-26 13:28:44 +02:00
Jens Steube
4045e60021 Add nvrtc wrapper for later use 2019-04-26 10:03:16 +02:00
Jens Steube
4b986de5fb Prepare native CUDA hybrid integration 2019-04-25 14:45:17 +02:00