Describe normalized, partly reduced and reduced numbers.
Comment which function expects which kind of input.
Removed unused bn_bitlen.
Add bn_add that does not reduce.
Bug fix in ecdsa_validate_pubkey: bn_mod before bn_is_equal.
Bug fix in hdnode_private_ckd: bn_mod after bn_addmod.
New function `bn_mult_3_2` that multiplies by 3/2.
This function is used in point_double and point_jacobian_double.
Cleaned up point_add and point_double, more comments.
This version of scalar_mult should be faster and much better
against side-channel attacks. Except bn_inverse and bn_mod
all functions are constant time. bn_inverse is only used
in the last step and its input is randomized. The function
bn_mod is only taking extra time in 2^32/2^256 cases, so
in practise it should not occur at all. The input to bn_mod
is also depending on the random value.
There is secret dependent array access in scalar_multiply,
so cache may be an issue.
The new method needs about 30 % less time for prime256k1 and is about
twice as fast for other moduli. The base algorithm is the same.
The code is also a bit smaller and doesn't need the 8 kb precomputed
table.
Important canges:
1. even/odd distinction so that we need to test only one of the numbers
for being even. This also leads to less duplicated code.
2. Allow for shifting by 32 bits at a time in the even test.
3. Pack u,s and v,r into the same array, which saves a bit of stack memory.
4. Don't divide by two after subtraction; this simplifies code.
5. Abort as soon as u,v are equal, instead of subtracting them.
6. Use s instead of r after the loop; no negation needed.
7. New code that divides by 2^k fast without any precomputed values.
Added invariants for bn_multiply and bn_inverse.
Explain that bn_multiply and bn_fast_mod doesn't work for
an arbitrary modulus. The modulus must be close to 2^256.