mirror of https://github.com/hashcat/hashcat.git synced 2025-01-04 04:40:58 +00:00
Commit Graph

21 Commits

Author SHA1 Message Date
jsteube
51dd982b12 Bring back some volatile for AMD 2017-09-08 14:08:21 +02:00
jsteube
9125062ffc Move volatiles for AMD closer to the problem 2017-09-08 13:32:19 +02:00
jsteube
40b57677cd OpenCL Kernels: Reactivate Dalibor's XOR optimization on MD5_H on all MD5 based algorithms 2017-08-30 15:32:09 +02:00
jsteube
938c281ee0 Resurrect some volatile variables in order to correctly compile pure kernels on AMD drivers 2017-08-25 17:06:07 +02:00
jsteube
967e96728d Make all the OpenCL kernel function includes static 2017-08-16 20:27:17 +02:00
jsteube
5e34ec348e Optimize kernels for ROCm 1.6
- Remove inline keywords
- Remove volatile keywords where it causes ROCm to slow down
- Replace DES functions (looks like bitselect somehow is no longer mapped to BFI_INT)
2017-07-22 18:05:18 +02:00
jsteube
eae9329761 Workaround some AMD JiT compiler segfault on complex kernels 2017-07-19 13:34:36 +02:00
jsteube
7205f450dd Backport more HMAC functions in inc_hash_xxx.cl from global to private 2017-07-14 16:58:30 +02:00
jsteube
c4098e2230 Fix invalid use of a non-vector function from within a vector function 2017-07-14 14:16:48 +02:00
jsteube
4e0972ce3a Add xxx_update_vector_swap(), xxx_update_vector_utf16le_swap() and xxx_update_vector_utf16beN() for later use 2017-07-14 13:24:40 +02:00
jsteube
9c6c21490f Add *_hmac_init_swap for later use 2017-07-13 19:22:31 +02:00
jsteube
9c12459852 Add HMAC vector functions to inc_hash_* 2017-07-13 12:18:17 +02:00
jsteube
c512e0c01a Add example -L kernel for algorithms with appended salt in utf16le 2017-07-13 00:16:29 +02:00
jsteube
9b6c6df53d Add xxx_init_vector_from_scalar() to all inc_hash_xxx.cl includes 2017-07-12 15:45:22 +02:00
jsteube
8a6e3a5275 Add support in HMAC for passwords larger than block size of the underlying hash 2017-07-10 11:15:15 +02:00
jsteube
f619811b70 Remove PBKDF2-HMAC-MD5 includes password length limit 2017-07-09 23:53:53 +02:00
philsmd
03f4e2b3dc minor typo fixed in comment for the new update() functions 2017-07-05 12:16:37 +02:00
Jens Steube
56dc8ae359 Add two functions md5_update_global_utf16le_swap() and md5_update_global_swap() for later use 2017-07-01 15:06:17 +02:00
jsteube
165380c454 Simplify WPA/WPA2 cracking kernel 2017-07-01 14:41:53 +02:00
jsteube
120cf1d1ba Removed some unused functions, added -m 500 kernel with length 256 support but not activated because too slow 2017-06-23 09:24:50 +02:00
jsteube
71d4926afa Converted -m 400 to password length 256 support
Something weird happened here, read on!

I expected some performance drop because this algorithm uses the password data itself inside the iteration loop.
That is different from PBKDF2, which I converted in mode 2100 before and which did not show any performance drop, as expected.

So after I've finished converting this kernel and testing everything works using the unit test, I did some benchmarks to see how much the
performance drop is.

On my 750ti, the speed dropped (minimally) from 981kH/s -> 948kH/s, mostly because of the SIMD support I had to drop.
If I turn off the SIMD support in the original, the drop is even smaller, that is 967kH/s -> 948kH/s, which is a more reasonable
comparison in case we just want to rate the drop that is actually caused by the code change itself.

The drop was acceptable for me, so I decided to check on my GTX1080. Now the weird thing: The performance increased from 6619kH/s to
7134kH/s!!

When I gave it a second thought, it turned out that:

1. The GTX1080 is a scalar GPU so it won't suffer from the drop of the SIMD code as the 750ti did
2. There's a change in how the global data (password) is read into the registers, it reads only that amount of data it actually needs by using
the pw_len information
3. I've added a barrier for CLK_GLOBAL_MEM_FENCE as it turned out to increase the performance in the 750ti

Note that this kernel is now branched into password length < 40 and larger.
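
A minimal host-side C sketch of points 2 and the length branch above, not the actual kernel code. The helper names (pw_word_count, load_pw_words, fits_single_block) and the 64-word buffer size are hypothetical and chosen only for illustration. The explanation of the 40-byte threshold in the comment is an assumption as well: a 16-byte MD5 digest plus up to 39 password bytes, the 0x80 pad byte and the 8-byte length field still fit into one 64-byte MD5 block, but the commit message itself does not say so.

```c
#include <stdint.h>

/* Assumption: 256-byte maximum password, stored as 64 32-bit words. */
#define PW_MAX_WORDS 64

/* Number of 32-bit words needed to hold pw_len bytes (rounded up). */
static uint32_t pw_word_count (uint32_t pw_len)
{
  uint32_t words = (pw_len + 3) / 4;

  return (words > PW_MAX_WORDS) ? PW_MAX_WORDS : words;
}

/*
 * Copy only the words that pw_len actually covers from the (slow) source
 * buffer into the local buffer, instead of always copying all 64 words.
 * Returns the number of words copied. Words beyond that are left untouched,
 * so the caller is expected to pass in a zeroed destination.
 */
static uint32_t load_pw_words (uint32_t *dst, const uint32_t *src, uint32_t pw_len)
{
  const uint32_t words = pw_word_count (pw_len);

  for (uint32_t i = 0; i < words; i++) dst[i] = src[i];

  return words;
}

/*
 * Branch selector. Assumed reasoning: 16 (digest) + pw_len + 1 (0x80 pad)
 * + 8 (length field) <= 64 holds exactly when pw_len < 40.
 */
static int fits_single_block (uint32_t pw_len)
{
  return pw_len < 40;
}
```

With a 5-byte password, load_pw_words() copies 2 words rather than 64, which is the kind of saving point 2 describes. fits_single_block() only marks where a short-password path and a long-password path would diverge.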

There's a large drop in performance where SIMD is really important, for example on CPU.

We could work around this issue by sticking to SIMD inside the length < 40 branch, but I don't know yet how this can be done efficiently.
2017-06-22 13:49:15 +02:00