jsteube
|
40b57677cd
|
OpenCL Kernels: Reactivate Dalibor's XOR optimization on MD5_H on all MD5 based algorithms
|
2017-08-30 15:32:09 +02:00 |
|
jsteube
|
6d112aeb39
|
OpenCL Kernels: Rewritten Keccak kernel to run fully on registers and partially reversed last round
|
2017-08-30 13:27:04 +02:00 |
|
jsteube
|
a378abee66
|
Add missing NEW_SIMD_CODE in -m 6600
|
2017-08-29 12:01:43 +02:00 |
|
jsteube
|
1c169af0ad
|
Make -m 14100 a pure kernel only
|
2017-08-28 22:26:30 +02:00 |
|
jsteube
|
2b9888486e
|
Make -m 14000 a pure kernel only and add volatile for asm statement
|
2017-08-28 22:20:40 +02:00 |
|
jsteube
|
99f416435e
|
Fix invalid use of __constant in LM kernel
|
2017-08-28 19:40:51 +02:00 |
|
jsteube
|
6db2f4cc18
|
Fix typo
|
2017-08-28 15:54:47 +02:00 |
|
jsteube
|
918578bee1
|
Improve some NVIDIA specific inline assembly
|
2017-08-28 14:15:47 +02:00 |
|
jsteube
|
9de1e557bb
|
More VEGA specific inline assembly to improve SHA1 based kernels
|
2017-08-28 09:24:06 +02:00 |
|
jsteube
|
a0be36d7b8
|
Fix compile error caused by __add3()
|
2017-08-27 19:46:17 +02:00 |
|
jsteube
|
00e38cc2c6
|
Add VEGA specific inline assembly to improve all MD4, MD5, SHA1 and SHA256 based kernels
|
2017-08-27 19:36:07 +02:00 |
|
jsteube
|
7bfd343ec9
|
Optimized rule_op_mangle_dupechar_last(), rule_op_mangle_rotate_right(), rule_op_mangle_rotate_left() and append_block1() in rule engine
|
2017-08-27 16:47:21 +02:00 |
|
jsteube
|
52a97fee75
|
Improve rule engine performance by speeding up append_0x80_xxx() with precomputed values from constant memory
|
2017-08-27 14:22:20 +02:00 |
|
jsteube
|
3260000357
|
Fix whirlpool pure kernel in -a 0 mode
|
2017-08-26 19:51:37 +02:00 |
|
jsteube
|
e3810d054b
|
Fix some use of pw_t tmp variable
|
2017-08-26 19:48:38 +02:00 |
|
jsteube
|
5e01ff4c53
|
Refactor some u32x to u32 where u32x is not needed
|
2017-08-26 18:31:50 +02:00 |
|
jsteube
|
1aa76eac15
|
Refactor use of __constant to match up with the user selected attack mode
|
2017-08-25 17:52:55 +02:00 |
|
jsteube
|
938c281ee0
|
Resurrect some volatile variables in order to correctly compile pure kernels on AMD drivers
|
2017-08-25 17:06:07 +02:00 |
|
jsteube
|
48fbe81a09
|
Add more inline assembly for AMD ROCm
|
2017-08-25 16:33:00 +02:00 |
|
jsteube
|
6c619155c3
|
Workaround ROCm compiler error in aes256_ExpandKey()
|
2017-08-25 12:10:36 +02:00 |
|
jsteube
|
8c9c36ee2a
|
Fix out-of-bounds access in aesXXX_InvertKey()
|
2017-08-25 11:52:07 +02:00 |
|
jsteube
|
bed7e8f466
|
Remove unused truncate_block_xxx_xx() functions and update kernels to use the _S function
|
2017-08-24 20:07:43 +02:00 |
|
jsteube
|
51dc1c7db3
|
Use truncate_block_4x4_le_S() instead of truncate_block_4x4_le() in -m 6800
|
2017-08-24 19:53:29 +02:00 |
|
jsteube
|
9b73c464d2
|
Fix typo in macro
|
2017-08-24 17:19:16 +02:00 |
|
jsteube
|
7b443ee7ff
|
Optimize performance of rule_op_mangle_title_sep(), rule_op_mangle_purgechar() and rule_op_mangle_replace()
|
2017-08-24 17:14:33 +02:00 |
|
jsteube
|
0de41c2716
|
Some more optimizations for rule engine
|
2017-08-24 15:09:55 +02:00 |
|
jsteube
|
9f8c5a253d
|
More rule engine performance optimizations
|
2017-08-24 00:49:46 +02:00 |
|
jsteube
|
0783289e2f
|
Optimized a0 pure kernel for AMD
|
2017-08-23 13:40:22 +02:00 |
|
jsteube
|
a5659d5619
|
Also switch optimized kernels rule engine to make use of kernel rules in constant memory
|
2017-08-23 12:46:14 +02:00 |
|
jsteube
|
1d04de3a8e
|
Limit kernel-loops in straight-mode to 256, thereby allowing rules to be stored in constant memory
|
2017-08-23 12:43:59 +02:00 |
|
jsteube
|
51372438fe
|
Allow OpenCL kernel inline assembly if the ROCm driver was detected
|
2017-08-22 18:47:53 +02:00 |
|
jsteube
|
8853884f2a
|
Fix append_four_byte() in case sm8 is 0
|
2017-08-21 16:04:43 +02:00 |
|
jsteube
|
f32e113942
|
Add missing case in append_block() in pure kernel rule engine
|
2017-08-20 15:08:51 +02:00 |
|
jsteube
|
6907981f08
|
Backport current state of optimized kernel rule engine to CPU
|
2017-08-20 12:50:24 +02:00 |
|
jsteube
|
508f1562f2
|
Fix --stdout kernels, gid_max was still set to u32
|
2017-08-20 12:13:34 +02:00 |
|
jsteube
|
319799bbbf
|
Switch the datatypes of the variables responsible for work-item count and work-item size from u32 to u64
|
2017-08-19 16:39:22 +02:00 |
|
jsteube
|
d9c906e134
|
Move 0x80 to hardcoded position for sha3-256 bit in order to allow ROCm compiler to use registers only
|
2017-08-18 16:22:25 +02:00 |
|
jsteube
|
694cc0b740
|
Remove all calls to overwrite_at_* functions
|
2017-08-17 16:20:01 +02:00 |
|
jsteube
|
e984a829ea
|
Remove no longer needed overwrite_at_* functions
|
2017-08-17 15:53:09 +02:00 |
|
jsteube
|
bf299fe043
|
Optimized 3DES for ROCm
|
2017-08-17 14:03:55 +02:00 |
|
jsteube
|
ad1ce462d1
|
Get rid of ceil() in OpenCL kernels
|
2017-08-17 13:43:35 +02:00 |
|
jsteube
|
53f53fe014
|
Reduced number of required registers in SIP based on maximum possible esalt length
|
2017-08-17 12:16:49 +02:00 |
|
jsteube
|
9ee5da40e0
|
Workaround ROCm compiler error for -m 15300
|
2017-08-17 11:25:34 +02:00 |
|
jsteube
|
88e995ddcf
|
Replace some SIMD related function calls
|
2017-08-17 11:18:39 +02:00 |
|
jsteube
|
5b5bdf3889
|
Replace some SIMD related function calls
|
2017-08-17 10:18:17 +02:00 |
|
jsteube
|
967e96728d
|
Make all the OpenCL kernel function includes static
|
2017-08-16 20:27:17 +02:00 |
|
jsteube
|
21e9c63d46
|
Fix rotl64() the same way as rotr64()
|
2017-08-16 17:58:33 +02:00 |
|
jsteube
|
58012ada0c
|
Fall back to old rotr64 optimization for AMD
|
2017-08-16 16:14:46 +02:00 |
|
philsmd
|
4a89172140
|
reformatting; replaced some tabs with spaces
|
2017-08-16 13:46:40 +02:00 |
|
jsteube
|
ec874c1d59
|
Optimized the following pure kernel rule engine functions:
- mangle_lrest()
- mangle_lrest_ufirst()
- mangle_urest()
- mangle_urest_lfirst()
- mangle_trest()
- mangle_toggle_at()
- mangle_reverse()
- mangle_dupeword()
- mangle_reflect()
- mangle_rotate_left()
- mangle_rotate_right()
- mangle_switch_first()
- mangle_switch_last()
- mangle_switch_at()
- mangle_title_sep()
Added some helper functions:
- generate_cmask()
- append_four_byte()
- append_three_byte()
- append_two_byte()
- append_one_byte()
- append_block()
- exchange_byte()
Removed some helper functions:
- upper_at()
- lower_at()
- toggle_at()
- mangle_switch()
NOTE: Changes need to be backported to CPU when finished
|
2017-08-13 16:43:46 +02:00 |
|