This version of scalar_mult should be faster and much better
against side-channel attacks. Except bn_inverse and bn_mod
all functions are constant time. bn_inverse is only used
in the last step and its input is randomized. The function
bn_mod is only taking extra time in 2^32/2^256 cases, so
in practise it should not occur at all. The input to bn_mod
is also depending on the random value.
There is secret dependent array access in scalar_multiply,
so cache may be an issue.
This should make side-channel attacks much more difficult. However,
1. Timing of bn_inverse, which is used in point_add depends on input.
2. Timing of reading secp256k1_cp may depend on input due to cache.
3. The conditions in point_add are not timing attack safe.
However point_add is always a straight addition, never double or some
other special case.
In the long run, I would like to use a specialized point_add using Jacobian
representation plus a randomization when converting the first point to
Jacobian representation. The Jacobian representation would also make
the procedure a bit faster.
An invalid point may crash the implementation or, worse,
reveal information about the private key if used in a ECDH
context (e.g. cryptoMessageEn/Decrypt).
Therefore, check all user supplied points even if
USE_PUBKEY_VALIDATE is not set.
To improve speed, we don't check if the point lies in the
main group, since the secp256k1 curve does not have
any other subgroup.
The standard says:
step h:
Set T to the empty sequence.
while tlen < qlen
V = HMAC_K(V)
T = T || V
k = bits2int(T)
in this case (HMAC-SHA256, qlen=256bit) this simplifies to
V = HMAC_K(V)
T = V
k = bits2int(T)
and T can be omitted.
The old code (wrong) did:
T = HMAC_K(V)
k = bits2int(T)
Note that V will only be used again if the first k is out of range.
Thus, the old code produced the right result with a very high probability.