mirror of
https://github.com/bitcoinbook/bitcoinbook
synced 2024-11-22 08:08:11 +00:00
CH04::bech32 and bech32m: add new sections
- Briefly mention segwit and the need for new addresses. Mention that
  getting wallets to a new base58check version would probably be only a
  little less work than upgrading to an entirely new address format.
  Describe the problems with base58check and the solutions provided by
  bech32. Illustrate some of the problems and solutions.
- Describe the bech32 length extension issue and provide an example.
- Introduce bech32m as the solution to the length extension issue.
- Provide examples using the bech32m reference library for Python for
  encoding and decoding a bech32m address (mentioning the backwards
  compatibility with bech32 addresses).
- Ask wallet authors to ensure they support forward compatibility with
  future segwit versions.
This commit is contained in:
parent
91eae20099
commit
74c144bbf4
389
ch04.asciidoc
@ -1029,6 +1029,395 @@ are only used in
|
||||
https://transactionfee.info/charts/payments-spending-segwit/[about 10% of transactions].
|
||||
Legacy addresses were supplanted by the bech32 family of addresses.
|
||||
|
||||
//FIXME: collision attacks
|
||||
|
||||
=== Bech32 addresses
|
||||
|
||||
In 2017, the Bitcoin protocol was upgraded to prevent transaction
|
||||
identifiers (txids) from being changed without the consent of a spending
|
||||
user (or a quorum of signers when multiple signatures are required).
|
||||
The upgrade, called _segregated witness_ (or _segwit_ for short), also
|
||||
provided additional capacity for transaction data in blocks and several
|
||||
other benefits. However, users wanting direct access to segwit's
|
||||
benefits had to accept payments to variations on the legacy P2PKH and
|
||||
P2SH scripts.
|
||||
|
||||
As mentioned in <<p2sh>>, one of the advantages of the P2SH output type
|
||||
was that a spender (such as Alice) didn't need to know the details of
|
||||
the script the receiver (such as Bob) used. The segwit upgrade was
|
||||
designed to be compatible with this mechanism, allowing users to
|
||||
immediately begin accessing many of the new benefits by using a P2SH
|
||||
address. But for Bob to gain access to all of the benefits, he would
|
||||
need Alice's wallet to pay him using a different type of script. That
|
||||
would require Alice's wallet to upgrade to supporting the new scripts.
|
||||
|
||||
At first, Bitcoin developers proposed BIP142, which would continue using
|
||||
Base58Check with a new version byte, similar to the P2SH upgrade. But
|
||||
getting all wallets to upgrade to new scripts with a new Base58Check
|
||||
version was expected to require almost as much work as getting them to
|
||||
upgrade to an entirely new address format, so several Bitcoin
|
||||
contributors set out to design the best possible address format. They
|
||||
identified several problems with Base58Check:
|
||||
|
||||
- Its mixed case presentation made it inconvenient to read aloud or
|
||||
transcribe. Try reading one of the legacy addresses in this chapter
|
||||
to a friend and have them transcribe it. Notice how you have to prefix
|
||||
every letter with the words "uppercase" and "lowercase". Also note
|
||||
when you review their writing that the uppercase and lowercase
|
||||
versions of some letters can look similar in many people's
|
||||
handwriting.
|
||||
|
||||
- It can detect errors, but it can't help users correct those errors.
|
||||
For example, if you accidentally transpose two characters when manually
|
||||
entering an address, your wallet will almost certainly warn that a
|
||||
mistake exists, but it won't help you figure out where the error is
|
||||
located. It might take you several frustrating minutes to eventually
|
||||
discover the mistake.
|
||||
|
||||
- A mixed case alphabet also requires extra space to encode in QR code
|
||||
images, which are commonly used to share addresses and invoices
|
||||
between wallets. That extra space means QR codes need to be larger at
|
||||
the same resolution or they become harder to scan quickly.
|
||||
|
||||
- It requires every spender wallet to upgrade in order to support new protocol
|
||||
features like P2SH and segwit. Although the upgrades themselves might
|
||||
not require much code, experience shows that many wallet authors are
|
||||
busy with other work and can sometimes delay upgrading for years.
|
||||
This adversely affects everyone who wants to use the new features.
|
||||
|
||||
The developers working on an address format for segwit found solutions
|
||||
for each of these problems in a new address format called
|
||||
bech32 (pronounced with a soft "ch", as in "besh thirty-two"). The
|
||||
"bech" stands for BCH, the initials of the three individuals who
|
||||
discovered the cyclic code in 1959 and 1960 upon which bech32 is based.
|
||||
The "32" stands for the number of characters in the bech32 alphabet
|
||||
(similar to the 58 in Base58Check).
|
||||
|
||||
- Bech32 uses only numbers and a single case of letters (preferably
|
||||
rendered in lowercase). Despite its alphabet being almost half the
|
||||
size of the Base58Check alphabet, bech32 addresses are only slightly
|
||||
longer than the longest equivalent P2PKH legacy addresses.
|
||||
|
||||
- Bech32 can both detect and help correct errors. In an address of an
|
||||
expected length, it is mathematically guaranteed to detect any error
|
||||
affecting four characters or fewer; that's more reliable than
|
||||
Base58Check. For longer errors, it will fail to detect them less than
|
||||
one time in a billion, which is roughly the same reliability as
|
||||
Base58Check. Even better, for an address typed with just a few
|
||||
errors, it can tell the user where those errors occurred, allowing them
|
||||
to quickly correct minor transcription mistakes. See <<bech32_typo_detection>>
|
||||
for an example of an address entered with errors.
|
||||
|
||||
[[bech32_typo_detection]]
|
||||
.Bech32 typo detection
|
||||
====
|
||||
Address:
|
||||
bc1p9nh05ha8wrljf7ru236aw**n**4t2x0d5ctkkywm**v**9sclnm4t0av2vgs4k3au7
|
||||
|
||||
Detected errors shown in bold. Generated using the
|
||||
https://bitcoin.sipa.be/bech32/demo/demo.html[bech32 address decoder demo].
|
||||
====
|
||||
|
||||
- Bech32 is preferably written with only lowercase characters, but those
|
||||
lowercase characters can be replaced with uppercase characters before
|
||||
encoding an address in a QR code. This allows the use of a special QR
|
||||
encoding mode that uses less space. Notice the difference in size and
|
||||
complexity of the two QR codes for the same address in
|
||||
<<bech32_qrcode_uc_lc>>.
|
||||
|
||||
[[bech32_qrcode_uc_lc]]
|
||||
.The same bech32 address QR encoded in uppercase and lowercase
|
||||
image::images/bech32-qrcode-uc-lc.png["The same bech32 address QR encoded in uppercase and lowercase"]
|
||||
|
||||
- Bech32 takes advantage of an upgrade mechanism designed as part of
|
||||
segwit to make it possible for spender wallets to pay
|
||||
output types that aren't in use yet. The goal was to allow developers
|
||||
to build a wallet today that allows spending to a bech32 address which
|
||||
will work without changes even years from now when a later protocol
|
||||
upgrade adds a new feature for users who receive bitcoins. It was
|
||||
hoped that we might never again need to go through the system-wide
|
||||
upgrade cycles necessary to allow people to fully use P2SH and segwit.
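
The space saving from uppercase QR encoding mentioned in the list above can be estimated with a short sketch. It assumes the standard QR encoding modes (alphanumeric mode packs two characters into 11 bits, byte mode uses 8 bits per character) and ignores mode headers, length fields, and error correction, so the numbers are approximate payload sizes only. The function name `qr_payload_bits` is made up for this illustration.

```python
# Characters representable in QR alphanumeric mode (45 symbols).
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def qr_payload_bits(text):
    """Approximate payload bits for text, ignoring headers and error correction."""
    if set(text) <= QR_ALPHANUMERIC:
        # Alphanumeric mode: 11 bits per character pair, 6 bits for a leftover one.
        return 11 * (len(text) // 2) + 6 * (len(text) % 2)
    # Byte mode: 8 bits per character (the address is plain ASCII).
    return 8 * len(text)


# The P2WSH address used later in this chapter (62 characters).
addr = "bc1qvj9r9egtd7mu2gemy28kpf4zefq4ssqzdzzycj7zjhk4arpavfhsct5a3p"
print("lowercase:", qr_payload_bits(addr), "bits")
print("uppercase:", qr_payload_bits(addr.upper()), "bits")
```

Lowercase letters are not part of the alphanumeric set, which is why the lowercase form falls back to the larger byte mode in this estimate.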
|
||||
|
||||
==== Problems with bech32 addresses
|
||||
|
||||
Bech32 addresses would have been a success in every area except for one
|
||||
problem. The mathematical guarantees about their ability to detect
|
||||
errors only apply if the length of the address you enter into a wallet
|
||||
is the same as the length of the original address. If you add or remove any
|
||||
characters during transcription, the guarantee doesn't apply and your
|
||||
wallet may spend funds to a wrong address. However, even without the
|
||||
guarantee, it was thought that it would be unlikely that a user adding
|
||||
or removing characters would produce a string with a valid checksum.
|
||||
|
||||
Unfortunately, the choice for one of the constants in the bech32
|
||||
algorithm just happened to make it very easy to add or remove the letter
|
||||
"q" in the penultimate position of an address that ends with the letter
|
||||
"p". In those cases, you can also add or remove the letter "q" multiple
|
||||
times. This will be caught by the checksum some of the time, but it
|
||||
will be missed far more often than the one-in-a-billion expectations for
|
||||
bech32's substitution errors.
|
||||
|
||||
.Extending the length of a bech32 address without invalidating its checksum
|
||||
====
|
||||
----
|
||||
Intended bech32 address:
|
||||
bc1pqqqsq9txsqp
|
||||
|
||||
Incorrect addresses with a valid checksum:
|
||||
bc1pqqqsq9txsqqqqp
|
||||
bc1pqqqsq9txsqqqqqqp
|
||||
bc1pqqqsq9txsqqqqqqqqp
|
||||
bc1pqqqsq9txsqqqqqqqqqp
|
||||
bc1pqqqsq9txsqqqqqqqqqqqp
|
||||
----
|
||||
====
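
The behavior in the example above can be explored with a minimal sketch of the original bech32 checksum check. The polymod routine and character set follow the BIP173 reference implementation as we understand it; the helper names `bech32_checksum_ok` and `insert_q` are made up for this illustration. The sketch tests only the checksum, not the other segwit address rules (such as padding and length), so it may accept strings that a full address decoder would still reject.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bech32_polymod(values):
    """Compute the bech32 checksum state over a list of 5-bit values."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk


def bech32_hrp_expand(hrp):
    """Expand the human readable part into values for the checksum."""
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]


def bech32_checksum_ok(addr):
    """True if the string passes the original bech32 checksum (constant 1)."""
    hrp, _, data = addr.rpartition("1")
    values = [CHARSET.index(c) for c in data]
    return bech32_polymod(bech32_hrp_expand(hrp) + values) == 1


def insert_q(addr, count):
    """Insert `count` copies of "q" just before the final character."""
    return addr[:-1] + "q" * count + addr[-1]


intended = "bc1pqqqsq9txsqp"
for count in range(6):
    candidate = insert_q(intended, count)
    print(candidate, bech32_checksum_ok(candidate))
```

The reason this works is visible in `bech32_polymod`: a valid string ending in "p" (value 1) must have an all-zero checksum state just before that final character, and feeding additional "q" characters (value 0) into an all-zero state leaves it at zero.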
|
||||
//from segwit_addr import *
|
||||
//
|
||||
//for foo in range(0,1000):
|
||||
// addr = encode('bc', 1, foo.to_bytes(3,'big'))
|
||||
// print(foo, addr)
|
||||
|
||||
|
||||
|
||||
For the initial version of segwit (version 0), this wasn't a practical
|
||||
concern. Only two valid lengths were defined for v0 segwit outputs: 22
|
||||
bytes and 34 bytes. Those correspond to bech32 addresses 42 characters
|
||||
or 62 characters long, so someone would need to add or remove the letter "q"
|
||||
from the penultimate position of a bech32 address 20 times in order to
|
||||
send money to an invalid address without a wallet being able to detect
|
||||
it. However, it would become a problem for users in the future if
|
||||
a segwit-based upgrade were ever to be implemented.
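
The address lengths quoted above follow from simple arithmetic, sketched below. The function name `segwit_address_length` is made up for this illustration; it assumes a scriptPubKey made of one version opcode, one push-length byte, and the witness program, with the program re-encoded at 5 bits per character.

```python
def segwit_address_length(hrp, script_bytes):
    """Length in characters of the address for a segwit scriptPubKey."""
    program_bytes = script_bytes - 2           # drop version opcode and push byte
    program_chars = (program_bytes * 8 + 4) // 5  # 5 bits per character, rounded up
    # HRP + separator "1" + witness version character + program + 6-character checksum
    return len(hrp) + 1 + 1 + program_chars + 6


print(segwit_address_length("bc", 22))  # P2WPKH: 20-byte witness program
print(segwit_address_length("bc", 34))  # P2WSH: 32-byte witness program
```

The two results differ by 20 characters, which is the number of "q" insertions or deletions mentioned above.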
|
||||
|
||||
==== Bech32m
|
||||
|
||||
Although bech32 worked well for segwit v0, developers didn't want to
|
||||
unnecessarily constrain output sizes in later versions of segwit.
|
||||
Without constraints, adding or removing a single "q" in a bech32 address
|
||||
could result in a user accidentally sending their money to an
|
||||
output that was either unspendable or spendable by anyone (allowing
|
||||
those bitcoins to be taken by anyone). Developers exhaustively analyzed the bech32
|
||||
problem and found that changing a single constant in their algorithm
|
||||
would eliminate the problem, ensuring that any insertion or deletion of
|
||||
up to five characters will go undetected less often than one
|
||||
time in a billion.
|
||||
|
||||
//https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb
|
||||
|
||||
The version of bech32 with a single different constant is known as
|
||||
Bech32 Modified (bech32m). All of the characters in bech32 and bech32m
|
||||
addresses for the same underlying data will be identical except for the
|
||||
last six (the checksum). That means a wallet will need to know which
|
||||
version is in use in order to validate the checksum, but both address
|
||||
types contain an internal version byte that makes determining that easy.
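
As a rough sketch of how a wallet might pick the checksum variant from the witness version, the following code validates only the checksum of an address. The polymod routine and constants follow the BIP173 and BIP350 reference code as we understand it; the function name `checksum_ok` is made up for this illustration, and the sketch skips the other validity rules a real decoder must enforce.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1
BECH32M_CONST = 0x2bc830a3


def bech32_polymod(values):
    """Compute the checksum state over a list of 5-bit values."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk


def bech32_hrp_expand(hrp):
    """Expand the human readable part into values for the checksum."""
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]


def checksum_ok(addr):
    """Check the checksum, choosing bech32 or bech32m by witness version."""
    hrp, _, data = addr.rpartition("1")
    values = [CHARSET.index(c) for c in data]
    witness_version = values[0]  # first character of the data part
    const = BECH32_CONST if witness_version == 0 else BECH32M_CONST
    return bech32_polymod(bech32_hrp_expand(hrp) + values) == const


# Addresses taken from the examples in this chapter.
print(checksum_ok("bc1q9d3xa5gg45q2j39m9y32xzvygcgay4rgc6aaee"))
print(checksum_ok("bc1p9nh05ha8wrljf7ru236awm4t2x0d5ctkkywmu9sclnm4t0av2vgs4k3au7"))
```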
|
||||
|
||||
===== Encoding and Decoding bech32m addresses
|
||||
|
||||
In this section, we'll look at the encoding and parsing rules for
|
||||
bech32m Bitcoin addresses since they encompass the ability to parse
|
||||
bech32 addresses and are the current recommended address format for
|
||||
Bitcoin wallets.
|
||||
|
||||
Bech32m addresses start with a Human Readable Part (HRP). There are
|
||||
rules in BIP173 for creating your own HRPs, but for Bitcoin you only
|
||||
need to know about the HRPs already chosen:
|
||||
|
||||
.Bech32 HRPs for Bitcoin
|
||||
[cols="1,1"]
|
||||
|===
|
||||
| bc
|
||||
| Bitcoin mainnet
|
||||
|
||||
| tb
|
||||
| Bitcoin testnet
|
||||
|===
|
||||
|
||||
The HRP is followed by a separator, the number "1". Earlier proposals
|
||||
for a protocol separator used a colon but some operating systems and
|
||||
applications which allow a user to double click on a word to highlight
|
||||
it for copy and pasting won't extend the highlighting to and past a
|
||||
colon. A number ensured double-click highlighting would work with any
|
||||
program that supports bech32m strings in general (which include other
|
||||
numbers). The number "1" was chosen because bech32 strings don't
|
||||
otherwise use it in order to prevent accidental transliteration between
|
||||
the number "1" and the lowercase letter "l".
|
||||
|
||||
The other part of a bech32m address is called the "data part". There
|
||||
are three elements to this part:
|
||||
|
||||
Witness version::
|
||||
A single byte which encodes as a single character
|
||||
in a bech32m Bitcoin address immediately following the separator.
|
||||
This letter represents the segwit version. The letter "q" is the
|
||||
encoding of "0" for segwit v0, the initial version of segwit where
|
||||
bech32 addresses were introduced. The letter "p" is the encoding of
|
||||
"1" for segwit v1 (also called taproot) where bech32m began to be
|
||||
used. There are seventeen possible versions of segwit and it's
|
||||
required for Bitcoin that the first byte of a bech32m data part decode
|
||||
to the number 0 through 16 (inclusive).
|
||||
|
||||
Witness program::
|
||||
From 2 to 40 bytes. For segwit v0, this witness program
|
||||
must be either 20 or 32 bytes; no other length is valid. For segwit
|
||||
v1, the only defined length as of this writing is 32 bytes but other
|
||||
lengths may be defined later.
|
||||
|
||||
Checksum::
|
||||
Exactly 6 characters. This is created using a BCH code, a type of
|
||||
error correction code (although for Bitcoin addresses, we'll see later
|
||||
that it's essential to use the checksum only for error detection--not
|
||||
correction).
|
||||
//TODO
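
The layout described above can be illustrated by splitting an address string into its parts. This is a minimal sketch that performs no checksum or length validation; the function name `split_address` and the dictionary keys are made up for this illustration.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def split_address(addr):
    """Split a bech32 or bech32m address into its parts without validating it."""
    # The data part never contains "1", so the last "1" is the separator.
    hrp, separator, data_part = addr.rpartition("1")
    return {
        "hrp": hrp,
        "separator": separator,
        "witness_version": CHARSET.index(data_part[0]),
        "program_chars": data_part[1:-6],
        "checksum_chars": data_part[-6:],
    }


for addr in ("bc1q9d3xa5gg45q2j39m9y32xzvygcgay4rgc6aaee", "bc1sqqqqkfw08p"):
    print(split_address(addr))
```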
|
||||
|
||||
Let's illustrate these rules by walking through an example of creating
|
||||
bech32 and bech32m addresses.
|
||||
For all of the following examples, we'll use the
|
||||
https://github.com/sipa/bech32/tree/master/ref[bech32m reference code
|
||||
for Python].
|
||||
|
||||
Let's start by generating four output scripts, one for each of the
|
||||
different segwit outputs in use at the time of publication, plus one for
|
||||
a future segwit version that doesn't yet have a defined meaning.
|
||||
|
||||
// bc1q9d3xa5gg45q2j39m9y32xzvygcgay4rgc6aaee
|
||||
// 2b626ed108ad00a944bb2922a309844611d25468
|
||||
//
|
||||
// bc1qvj9r9egtd7mu2gemy28kpf4zefq4ssqzdzzycj7zjhk4arpavfhsct5a3p
|
||||
// 648a32e50b6fb7c5233b228f60a6a2ca4158400268844c4bc295ed5e8c3d626f
|
||||
//
|
||||
// bc1p9nh05ha8wrljf7ru236awm4t2x0d5ctkkywmu9sclnm4t0av2vgs4k3au7
|
||||
// 2ceefa5fa770ff24f87c5475d76eab519eda6176b11dbe1618fcf755bfac5311
|
||||
//
|
||||
// bc1sqqqqkfw08p
|
||||
// OP_16 OP_PUSH2 0000
|
||||
|
||||
.Scripts for different types of segwit outputs
|
||||
[cols="1,1"]
|
||||
|===
|
||||
| P2WPKH
|
||||
| OP_0 2b626ed108ad00a944bb2922a309844611d25468
|
||||
|
||||
| P2WSH
|
||||
| OP_0 648a32e50b6fb7c5233b228f60a6a2ca4158400268844c4bc295ed5e8c3d626f
|
||||
|
||||
| P2TR
|
||||
| OP_1 2ceefa5fa770ff24f87c5475d76eab519eda6176b11dbe1618fcf755bfac5311
|
||||
|
||||
| Future Example
|
||||
| OP_16 0000
|
||||
|===
|
||||
|
||||
For the P2WPKH output, the witness program contains a commitment constructed in exactly the same
|
||||
way as the commitment for a P2PKH output seen in <<p2pkh>>. A public key is passed into a SHA256 hash
|
||||
function. The resultant 32-byte digest is then passed into a RIPEMD-160
|
||||
hash function. The digest of that function (the commitment) is placed
|
||||
in the witness program.
|
||||
|
||||
For the P2WSH output, we don't use the P2SH algorithm. Instead we take
|
||||
the script, pass it into a SHA256 hash function, and use the 32-byte
|
||||
digest of that function in the witness program. For P2SH, the SHA256
|
||||
digest was hashed again with RIPEMD-160, but that may not be secure in
|
||||
some cases; for details, see <<p2sh_collision_attacks>>. A result of
|
||||
using SHA256 without RIPEMD-160 is that P2WSH commitments are 32 bytes
|
||||
(256 bits) instead of 20 bytes (160 bits).
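
The P2WSH commitment described above is a single SHA256 hash, which can be sketched with Python's standard library. The function name `p2wsh_witness_program` and the one-byte example script are placeholders made up for this illustration, not a script anyone should use.

```python
import hashlib


def p2wsh_witness_program(script):
    """Return the 32-byte P2WSH witness program for a serialized script."""
    return hashlib.sha256(script).digest()


# Placeholder script for illustration only: the single opcode 0x51.
example_script = bytes([0x51])
program = p2wsh_witness_program(example_script)
print(len(program), program.hex())
```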
|
||||
|
||||
For the Pay-to-Taproot (P2TR) output, the witness program is a point on
|
||||
the secp256k1 curve. It may be a simple public key, but in most cases
|
||||
it should be a public key that commits to some additional data. We'll
|
||||
learn more about that commitment in <<FIXME_later_chapter_about_taproot>>.
|
||||
|
||||
For the example of a future segwit version, we simply use the highest
|
||||
possible segwit version number (16) and the smallest allowed witness
|
||||
program (2 bytes) with a null value.
|
||||
|
||||
Now that we know the version number and the witness program, we can
|
||||
convert each of them into a bech32 or bech32m address. Let's use the bech32m reference
|
||||
library for Python to quickly generate those addresses, and then take a
|
||||
deeper look at what's happening:
|
||||
|
||||
----
|
||||
wget https://raw.githubusercontent.com/sipa/bech32/master/ref/python/segwit_addr.py
|
||||
2023-01-30 11:59:10 (46.3 MB/s) - ‘segwit_addr.py’ saved [5022/5022]
|
||||
|
||||
python
|
||||
>>> from segwit_addr import *
|
||||
>>> from binascii import unhexlify
|
||||
|
||||
>>> help(encode)
|
||||
encode(hrp, witver, witprog)
|
||||
Encode a segwit address.
|
||||
|
||||
>>> encode('bc', 0, unhexlify('2b626ed108ad00a944bb2922a309844611d25468'))
|
||||
'bc1q9d3xa5gg45q2j39m9y32xzvygcgay4rgc6aaee'
|
||||
>>> encode('bc', 0, unhexlify('648a32e50b6fb7c5233b228f60a6a2ca4158400268844c4bc295ed5e8c3d626f'))
|
||||
'bc1qvj9r9egtd7mu2gemy28kpf4zefq4ssqzdzzycj7zjhk4arpavfhsct5a3p'
|
||||
>>> encode('bc', 1, unhexlify('2ceefa5fa770ff24f87c5475d76eab519eda6176b11dbe1618fcf755bfac5311'))
|
||||
'bc1p9nh05ha8wrljf7ru236awm4t2x0d5ctkkywmu9sclnm4t0av2vgs4k3au7'
|
||||
>>> encode('bc', 16, unhexlify('0000'))
|
||||
'bc1sqqqqkfw08p'
|
||||
----
|
||||
|
||||
If we open the file +segwit_addr.py+ and look at what the code is doing,
|
||||
the first thing we will notice
|
||||
is the sole difference between bech32 (used for segwit v0) and bech32m
|
||||
(used for later segwit versions) is the constant.
|
||||
|
||||
----
|
||||
BECH32_CONSTANT = 1
|
||||
BECH32M_CONSTANT = 0x2bc830a3
|
||||
----
|
||||
|
||||
Next we notice the code that produces the checksum. In the final step of the
|
||||
checksum, the appropriate constant is merged into the value using an xor
|
||||
operation. That single value is the only difference between bech32 and
|
||||
bech32m.
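
The checksum step described above can be sketched as follows. The code is modeled on the reference implementation as we understand it, with the constant passed in explicitly; the helper name `checksum_chars` is made up for this illustration.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONSTANT = 1
BECH32M_CONSTANT = 0x2bc830a3


def bech32_polymod(values):
    """Compute the checksum state over a list of 5-bit values."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk


def bech32_hrp_expand(hrp):
    """Expand the human readable part into values for the checksum."""
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]


def create_checksum(hrp, data, const):
    """Compute the six 5-bit checksum values for an HRP and data values."""
    values = bech32_hrp_expand(hrp) + data
    # The constant is xored in as the final step; nothing else differs.
    polymod = bech32_polymod(values + [0] * 6) ^ const
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def checksum_chars(hrp, data, const):
    """Render the checksum as six characters of the bech32 alphabet."""
    return "".join(CHARSET[d] for d in create_checksum(hrp, data, const))


# Witness version 16 followed by the two-byte program 0000 as 5-bit values.
data = [16, 0, 0, 0, 0]
print("bech32: ", checksum_chars("bc", data, BECH32_CONSTANT))
print("bech32m:", checksum_chars("bc", data, BECH32M_CONSTANT))
```

If the sketch matches the reference library, the bech32m line should equal the last six characters of the `bc1sqqqqkfw08p` address generated earlier.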
|
||||
|
||||
With the checksum created, each 5-bit value in the data part
|
||||
(including the witness version, witness program, and checksum) is
|
||||
converted to alphanumeric characters.
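
The regrouping of bytes into 5-bit values and their mapping onto the bech32 alphabet can be sketched as below. The reference library uses a more general bit-conversion routine; the simplified helpers `bytes_to_5bit` and `data_chars` are made up for this illustration and only cover the encoding direction.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def bytes_to_5bit(data):
    """Regroup 8-bit bytes into 5-bit values, zero-padding the final group."""
    acc = 0
    bits = 0
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        # Pad the leftover bits on the right with zeros.
        out.append((acc << (5 - bits)) & 31)
    return out


def data_chars(witness_version, program):
    """Characters for the witness version and program (checksum not included)."""
    values = [witness_version] + bytes_to_5bit(program)
    return "".join(CHARSET[v] for v in values)


print(data_chars(16, bytes.fromhex("0000")))
```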
|
||||
|
||||
For decoding back into a scriptPubKey, we work in reverse. First let's
|
||||
use the reference library to decode two of our addresses:
|
||||
|
||||
----
|
||||
>>> help(decode)
|
||||
decode(hrp, addr)
|
||||
Decode a segwit address.
|
||||
|
||||
>>> _ = decode("bc", "bc1q9d3xa5gg45q2j39m9y32xzvygcgay4rgc6aaee"); _[0], bytes(_[1]).hex()
|
||||
(0, '2b626ed108ad00a944bb2922a309844611d25468')
|
||||
>>> _ = decode("bc", "bc1p9nh05ha8wrljf7ru236awm4t2x0d5ctkkywmu9sclnm4t0av2vgs4k3au7"); _[0], bytes(_[1]).hex()
|
||||
(1, '2ceefa5fa770ff24f87c5475d76eab519eda6176b11dbe1618fcf755bfac5311')
|
||||
----
|
||||
|
||||
We get back both the witness version and the witness program. Those can
|
||||
be inserted into the template for our scriptPubKey:
|
||||
|
||||
----
|
||||
<version> <program>
|
||||
----
|
||||
|
||||
For example:
|
||||
|
||||
----
|
||||
OP_0 2b626ed108ad00a944bb2922a309844611d25468
|
||||
OP_1 2ceefa5fa770ff24f87c5475d76eab519eda6176b11dbe1618fcf755bfac5311
|
||||
----
|
||||
|
||||
[WARNING]
|
||||
====
|
||||
One
|
||||
possible mistake here to be aware of is that a witness version of `0` is
|
||||
for `OP_0`, which uses the byte 0x00--but a witness version of `1` uses
|
||||
`OP_1`, which is byte 0x51. Witness versions `2` through `16` use 0x52
|
||||
through 0x60, respectively.
|
||||
====
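
The opcode mapping in the warning above can be sketched as a small helper that builds the scriptPubKey bytes from a decoded witness version and program. The function name `witness_script_pubkey` is made up for this illustration; it assumes the program is pushed with a single length byte, which holds for the 2 to 40 byte range described earlier.

```python
def witness_script_pubkey(version, program):
    """Build scriptPubKey bytes from a witness version and witness program."""
    if not 0 <= version <= 16:
        raise ValueError("witness version must be 0 through 16")
    if not 2 <= len(program) <= 40:
        raise ValueError("witness program must be 2 to 40 bytes")
    # OP_0 is byte 0x00; OP_1 through OP_16 are bytes 0x51 through 0x60.
    opcode = 0x00 if version == 0 else 0x50 + version
    return bytes([opcode, len(program)]) + program


print(witness_script_pubkey(0, bytes.fromhex("2b626ed108ad00a944bb2922a309844611d25468")).hex())
print(witness_script_pubkey(16, bytes.fromhex("0000")).hex())
```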
|
||||
|
||||
When implementing bech32m encoding or decoding, we very strongly
|
||||
recommend that you use the test vectors provided in BIP350. We also ask
|
||||
that you ensure your code passes the test vectors related to paying future segwit
|
||||
versions that haven't been defined yet. This will help make your
|
||||
software usable for many years to come even if you aren't able to add
|
||||
support for new Bitcoin features as soon as they become available.
|
||||
|
||||
==== Key Formats
|
||||
|
||||
|
BIN
images/bech32-qrcode-uc-lc.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 40 KiB |
BIN
images/bech32m-typo-detection.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 23 KiB |