The old implementation needed 6 sha transformations per iterations:
- 2 for computing sha512 of seed,
- 2 for computing digests of ipads/opads,
- 2 for computing digests of intermediate hashes.
The first 4 transformations are the same in every iteration so we cache
them. A new function hmac_sha512_prepare computes these digests.
We made sha512_Transform visible in pbkdf2 and prevent unneccessary
big/little endian conversions back and forth.