mirror of https://github.com/bitcoinbook/bitcoinbook synced 2025-01-09 07:10:56 +00:00

CH11: edits for Murchandamus feedback

- Remove redundancy in description of the prevBlockHash field in
  creating a chain of blocks.

- Upsize numbers to segwit block limits

- Use "target" instead of "difficulty" when appropriate.

- Drop paragraph that repeats following table about block header fields

- Correct multiple parts of table about block header fields

- The genesis block is not the first block ever created: there were
  almost certainly test blocks created before it

- Use "||" for concatenation.  Left FIXMEs to update images later.

- Add short note about duplicating internal merkle tree nodes before
  hashing when an odd number are present.  Add long quote from Bitcoin
  Core source about why Bitcoin's merkle tree design should not be
  replicated by other projects.

- Drop table for illustrating what log2(N) looks like.  Add FIXME to add
  a plot.

- Drop details about previous testnet resets.

- Mention that testnets use different address prefixes.
David A. Harding 2023-08-02 09:55:17 -10:00
parent e0932e89f7
commit 9bfea66f45


@ -2,14 +2,14 @@
== The Blockchain
The blockchain is the history of every confirmed Bitcoin transaction.
It's what allows every full node to indepdently determine what keys and
It's what allows every full node to independently determine what keys and
scripts control which bitcoins. In this chapter, we'll look at the
structure of the blockchain and see how it uses cryptographic
commitments and other clever tricks to make every part of it easy for
full nodes (and sometimes light clients) to validate.
((("blockchain (the)", "overview of")))The blockchain data structure is
an ordered, back-linked list of blocks of transactions.The blockchain
an ordered, back-linked list of blocks of transactions. The blockchain
can be stored as a flat file, or in a simple database.
Blocks are linked "back," each referring to the previous block in the
chain. ((("blocks", "block height")))The blockchain is often visualized
@ -24,8 +24,7 @@ blocks stacked on top of each other results in the use of terms such as
within the blockchain is identified by a hash, generated using the
SHA256 cryptographic hash algorithm on the header of the block. Each
block also commits to the previous block, known as the _parent_ block,
through the "previous block hash" field in the block header. In other
words, each block contains the hash of its parent inside its own header.
through the "previous block hash" field in the block header.
The sequence of hashes linking each block to its parent creates a chain
going back all the way to the first block ever created, known as the
_genesis block_.
@ -36,18 +35,13 @@ Multiple children arise during a blockchain "fork," a temporary
situation that can occur when different blocks are discovered almost
simultaneously by different miners (see <<forks>>). Eventually, only one
child block becomes part of the blockchain accepted by all full nodes and the "fork" is resolved.
Even though a block may have more than one child, each block can have
only one parent. This is because a block has one single "previous block
hash" field referencing its single parent.
The "previous block hash" field is inside the block header and thereby
affects the _current_ block's hash. The child's own identity changes if
the parent's identity changes. When the parent is modified in any way,
the parent's hash changes. The parent's changed hash necessitates a
change in the "previous block hash" pointer of the child. This in turn
causes the child's hash to change, which requires a change in the
affects the _current_ block's hash.
Any change to a parent block
requires a child block's hash to change, which requires a change in the
pointer of the grandchild, which in turn changes the grandchild, and so
on. This cascade effect ensures that once a block has many generations
on. This sequence ensures that, once a block has many generations
following it, it cannot be changed without forcing a recalculation of
all subsequent blocks. Because such a recalculation would require
enormous computation (and therefore energy consumption), the existence
@ -67,8 +61,8 @@ blockchain, beyond six blocks, blocks are less and less likely to
change. ((("transactions", "coinbase transactions")))((("coinbase
transactions")))After 100 blocks back there is so much stability that
the coinbase transaction--the transaction containing the reward in
bitcoin for creating a new block--can be spent. A few thousand blocks back (a month) and the
blockchain is settled history for all practical purposes. While the
bitcoin for creating a new block--can be spent.
While the
protocol always allows a chain to be undone by a longer chain and while
the possibility of any block being reversed always exists, the
probability of such an event decreases as time passes until it becomes
@ -83,7 +77,7 @@ block is made of a header, containing metadata, followed by a long list
of transactions that make up the bulk of its size. The block header is
80 bytes, whereas the total size of all transactions in a block can be
up to about 4,000,000 bytes. A complete block,
with all transactions, can therefore be over 10,000 times larger than the block
with all transactions, can therefore be over 50,000 times larger than the block
header. <<block_structure1>> describes how Bitcoin Core stores the structure of a block.
[[block_structure1]]
@ -94,7 +88,7 @@ header. <<block_structure1>> describes how Bitcoin Core stores the structure of
|Size| Field | Description
| 4 bytes | Block Size | The size of the block, in bytes, following this field
| 80 bytes | Block Header | Several fields form the block header
| 1-9 bytes (VarInt) | Transaction Counter | How many transactions follow
| 1-9 bytes (compactSize) | Transaction Counter | How many transactions follow
| Variable | Transactions | The transactions recorded in this block
|=======
@ -102,31 +96,23 @@ header. <<block_structure1>> describes how Bitcoin Core stores the structure of
=== Block Header
((("blocks", "headers")))((("blockchain (the)", "block
headers")))((("headers")))The block header consists of three sets of
block metadata. First, there is a reference to a previous block hash,
which connects this block to the previous block in the blockchain. The
second set of metadata, namely the _difficulty_, _timestamp_, and
_nonce_, relate to the mining competition, as detailed in <<mining>>.
The third piece of metadata is the merkle tree root, a data structure
used to efficiently summarize all the transactions in the block.
<<block_header_structure_ch09>> describes the structure of a block
header.
headers")))((("headers")))The block header consists of
block metadata as shown in <<block_header_structure_ch09>>.
[[block_header_structure_ch09]]
.The structure of the block header
[options="header"]
|=======
|Size| Field | Description
| 4 bytes | Version | A version number to track protocol upgrades
| 32 bytes | Previous Block Hash | A reference to the hash of the previous (parent) block in the chain
| 32 bytes | Merkle Root | A hash of the root of the merkle tree of this block's transactions
| 4 bytes | Timestamp | The approximate creation time of this block (seconds from Unix Epoch)
| 4 bytes | Difficulty Target | The Proof-of-Work algorithm difficulty target for this block
| 4 bytes | Nonce | A counter used for the Proof-of-Work algorithm
| 4 bytes | Version | Originally a version field; its use has evolved over time
| 32 bytes | Previous Block Hash | A hash of the previous (parent) block in the chain
| 32 bytes | Merkle Root | The root hash of the merkle tree of this block's transactions
| 4 bytes | Timestamp | The approximate creation time of this block (Unix epoch time)
| 4 bytes | Target | A compact encoding of the Proof-of-Work target for this block
| 4 bytes | Nonce | Arbitrary data used for the Proof-of-Work algorithm
|=======
The nonce, difficulty target, and timestamp are used in the mining
The nonce, target, and timestamp are used in the mining
process and will be discussed in more detail in <<mining>>.
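
The field layout in <<block_header_structure_ch09>> can be sketched in a few lines of Python. This is an illustrative sketch rather than Bitcoin Core's code: the names +parse_header+ and +decode_compact_target+ are hypothetical, the integer fields are assumed to be serialized little-endian, and the compact-target decoding ignores the sign bit and other edge cases. The sample bytes are the widely published genesis block header values, which do not appear in this chapter.

```python
import struct

def parse_header(raw: bytes) -> dict:
    """Split an 80-byte serialized block header into its six fields."""
    if len(raw) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    # "<" means little-endian with no padding: 4 + 32 + 32 + 4 + 4 + 4 bytes.
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        "<I32s32sIII", raw
    )
    return {
        "version": version,
        # Hashes are conventionally displayed byte-reversed.
        "previous_block_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
    }

def decode_compact_target(bits: int) -> int:
    """Expand the 4-byte compact target (sign bit ignored in this sketch)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

# Assumption: the published genesis header, field by field.
GENESIS_HEADER = bytes.fromhex(
    "01000000"                      # version
    + "00" * 32                     # previous block hash (none)
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"                    # timestamp
    + "ffff001d"                    # compact target
    + "1dac2b7c"                    # nonce
)

fields = parse_header(GENESIS_HEADER)
print(fields["timestamp"], hex(decode_compact_target(fields["bits"])))
```

The two 32-byte fields are kept as raw bytes by +struct+ and reversed only for display, which is why a block explorer shows them in the opposite byte order from the serialized header.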
[[block_hash]]
@ -140,13 +126,13 @@ hash is called the _block hash_ but is more accurately the _block header
hash_, pass:[<span role="keep-together">because only the block header is
used to compute it. For example,</span>]
+000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f+ is
the block hash of the first Bitcoin block ever created. The block hash
the block hash of the first block on Bitcoin's blockchain. The block hash
identifies a block uniquely and unambiguously and can be independently
derived by any node by simply hashing the block header.
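
As a minimal sketch of "simply hashing the block header," the following Python rebuilds the genesis header and applies SHA256 twice. The function name +block_hash+ is hypothetical, and the header field values are the widely published genesis values rather than anything stated in this section. Only the standard library is used.

```python
import hashlib
import struct

def block_hash(header: bytes) -> str:
    """Double-SHA256 the 80-byte header and return it in display order."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # The digest is shown byte-reversed by convention.
    return digest[::-1].hex()

# Assumption: published genesis block header fields.
genesis_header = struct.pack(
    "<I32s32sIII",
    1,                                   # version
    bytes(32),                           # previous block hash (all zeros)
    bytes.fromhex(
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
    )[::-1],                             # merkle root, stored byte-reversed
    1231006505,                          # timestamp
    0x1D00FFFF,                          # compact target
    2083236893,                          # nonce
)

print(block_hash(genesis_header))
```

If the field values are right, the printed value should match the genesis block hash quoted above. Nothing outside the 80 header bytes enters the calculation.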
Note that the block hash is not actually included inside the block's
data structure, neither when the block is transmitted on the network,
nor when it is stored on a node's persistant storage as part of the
nor when it is stored on a node's persistent storage as part of the
blockchain. Instead, the block's hash is computed by each node as the
block is received from the network. The block hash might be stored in a
separate database table as part of the block's metadata, to facilitate
@ -154,7 +140,7 @@ indexing and faster retrieval of blocks from disk.
A second way to identify a block is by its position in the blockchain,
called the pass:[<span role="keep-together"><em>block height</em>. The
first block ever created is at block height 0 (zero) and is the</span>]
genesis block is at block height 0 (zero) and is the</span>]
pass:[<span role="keep-together">same block that was previously
referenced by the following block hash</span>]
+000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f+. A
@ -176,7 +162,7 @@ also not a part of the block's data structure; it was not stored within
the block. Each node dynamically identified a block's position (height)
in the blockchain when it was received from the Bitcoin network. A
later protocol change (BIP34) began including the block height in the
coinbase transaction, although it's purpose was to ensure each block had
coinbase transaction, although its purpose was to ensure each block had
a different coinbase transaction. Nodes still need to dynamically
identify a block's height in order to validate the coinbase field. The
block height might also be stored as metadata in an indexed database
@ -186,7 +172,7 @@ table for faster retrieval.
====
A block's _block hash_ always identifies a single block uniquely. A
block also always has a specific _block height_. However, it is not
always the case that a specific block height can identify a single
always the case that a specific block height identifies a single
block. Rather, two or more blocks might compete for a single position in
the blockchain.
====
@ -256,7 +242,7 @@ $ bitcoin-cli getblock 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60
The genesis block contains a message within it. The coinbase
transaction input contains the text "The Times 03/Jan/2009 Chancellor on
brink of second bailout for banks." This message was intended to offer
proof of the earliest date this block was created, by referencing the
proof of the earliest date this block could have been created, by referencing the
headline of the British newspaper _The Times_. It also serves as a
tongue-in-cheek reminder of the importance of an independent monetary
system, with Bitcoin's launch occurring at the same time as an
@ -267,10 +253,10 @@ first block by Satoshi Nakamoto, Bitcoin's creator.
((("blocks", "linking blocks in the blockchain")))((("blockchain (the)",
"linking blocks in the blockchain")))Bitcoin full nodes validate every
block in the blockchain, starting at the genesis block. Their local view of
block in the blockchain after the genesis block. Their local view of
the blockchain is constantly updated as new blocks are found and used to
extend the chain. As a node receives incoming blocks from the network,
it will validate these blocks and then link them to the existing
it will validate these blocks and then link them to its view of the existing
blockchain. To establish a link, a node will examine the incoming block
header and look for the "previous block hash."
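
The linking step can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how any node is implemented: +ToyHeader+ and +ToyChainView+ are hypothetical names, the +payload+ field stands in for every header field other than the previous block hash, and no validation is performed.

```python
import hashlib
from dataclasses import dataclass

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

@dataclass(frozen=True)
class ToyHeader:
    prev_hash: bytes  # the "previous block hash" field
    payload: bytes    # stand-in for all other header fields

    def hash(self) -> bytes:
        return dsha256(self.prev_hash + self.payload)

class ToyChainView:
    """A node's local view: headers indexed by hash, heights derived."""

    def __init__(self, genesis: ToyHeader):
        self.by_hash = {genesis.hash(): genesis}
        self.height = {genesis.hash(): 0}

    def connect(self, header: ToyHeader) -> int:
        parent = header.prev_hash
        if parent not in self.by_hash:
            raise KeyError("parent block is not in this node's view")
        block_id = header.hash()
        self.by_hash[block_id] = header
        # Height is not stored in the header; it is derived from the parent.
        self.height[block_id] = self.height[parent] + 1
        return self.height[block_id]

genesis = ToyHeader(bytes(32), b"genesis")
block1 = ToyHeader(genesis.hash(), b"block 1")
block2 = ToyHeader(block1.hash(), b"block 2")

view = ToyChainView(genesis)
print(view.connect(block1), view.connect(block2))

# Changing the parent changes its hash, so block1 no longer points at it.
tampered = ToyHeader(bytes(32), b"genesis (modified)")
print(tampered.hash() == block1.prev_hash)
```

Because the parent's hash is part of the child's hashed data, two different headers can name the same parent and therefore share a height, which is the situation that arises during a fork.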
@ -299,9 +285,7 @@ parses as follows:
"nonce" : 4215469401,
"tx" : [
"257e7497fb8bc68421eb2c7b699dbab234831600e7352f0d9e6522c7cf3f6c77",
#[... many more transactions omitted ...]
"[... many more transactions omitted ...]",
"05cfd38f6ae6aa83674cc99e4d75a1458c165b7ab84725eda41d018a09176634"
]
}
@ -318,7 +302,7 @@ references in the +previousblockhash+ field.
[[chain_of_blocks]]
[role="smallerfourtyfive"]
.Blocks linked in a chain by reference to the previous block header hash
.Blocks linked in a chain by each referencing the previous block header hash
image::images/mbc2_0901.png[]
[[merkle_trees]]
@ -348,7 +332,7 @@ double-SHA256.
When N data elements are hashed and summarized in a merkle tree, you can
check to see if any one data element is included in the tree with at
most +2*log~2~(N)+ calculations, making this a very efficient data
about +log~2~(N)+ calculations, making this a very efficient data
structure.
The merkle tree is constructed bottom-up. In the following example, we
@ -372,7 +356,7 @@ double-hashed to produce the parent node's hash:
++++
<pre data-type="codelisting">
H<sub>AB</sub> = SHA256(SHA256(H<sub>A</sub> + H<sub>B</sub>))
H<sub>AB</sub> = SHA256(SHA256(H<sub>A</sub> || H<sub>B</sub>))
</pre>
++++
@ -382,6 +366,7 @@ header and summarizes all the data in all four transactions.
<<simple_merkle>> shows how the root is calculated by pair-wise hashes
of the nodes.
//FIXME: s/+/||/
[[simple_merkle]]
.Calculating the nodes in a merkle tree
image::images/mbc2_0902.png["merkle_tree"]
@ -391,11 +376,63 @@ an even number of leaf nodes. If there is an odd number of transactions
to summarize, the last transaction hash will be duplicated to create an
even number of leaf nodes, also known as a _balanced tree_. This is
shown in <<merkle_tree_odd>>, where transaction C is duplicated.
Similarly, if there is an odd number of hashes to process at any level,
the last hash is duplicated.
//FIXME: s/+/||/
[[merkle_tree_odd]]
.Duplicating one data element achieves an even number of data elements
image::images/mbc2_0903.png["merkle_tree_odd"]
.A design flaw in Bitcoin's merkle tree
****
An extended comment in Bitcoin Core's source code describes a
significant problem in the design of Bitcoin's duplication of odd
elements in its merkle tree:
[quote,Bitcoin Core src/consensus/merkle.cpp]
____
WARNING! If you're reading this because you're learning about crypto
and/or designing a new system that will use merkle trees, keep in mind
that the following merkle tree algorithm has a serious flaw related to
duplicate txids, resulting in a vulnerability (CVE-2012-2459).
The reason is that if the number of hashes in the list at a given level
is odd, the last one is duplicated before computing the next level (which
is unusual in Merkle trees). This results in certain sequences of
transactions leading to the same merkle root. For example, these two
trees:
//FIXME:replace with image to fix italics in code text
----
A A
/ \ / \
B C B C
/ \ | / \ / \
D E F D E F F
/ \ / \ / \ / \ / \ / \ / \
1 2 3 4 5 6 1 2 3 4 5 6 5 6
----
for transaction lists [1,2,3,4,5,6] and [1,2,3,4,5,6,5,6] (where 5 and
6 are repeated) result in the same root hash A (because the hash of both
of (F) and (F,F) is C).
The vulnerability results from being able to send a block with such a
transaction list, with the same merkle root, and the same block hash as
the original without duplication, resulting in failed validation. If the
receiving node proceeds to mark that block as permanently invalid
however, it will fail to accept further unmodified (and thus potentially
valid) versions of the same block. We defend against this by detecting
the case where we would hash two identical hashes at the end of the list
together, and treating that identically to the block having an invalid
merkle root. Assuming no double-SHA256 collisions, this will detect all
known ways of changing the transactions without affecting the merkle
root.
____
****
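
The duplication rule and the resulting ambiguity can be reproduced with a short Python sketch. It assumes the leaves are already 32-byte hashes and uses made-up stand-ins for txids; +merkle_root+ is a hypothetical helper, not Bitcoin Core's function, and it omits the duplicate detection that the quoted comment describes.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Pair-wise double-SHA256, duplicating the last hash of odd levels."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash
        level = [
            dsha256(level[i] + level[i + 1])  # H(left || right)
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Six made-up "txids" (assumption: any distinct 32-byte values will do).
txids = [dsha256(bytes([n])) for n in range(1, 7)]

original = merkle_root(txids)
mutated = merkle_root(txids + txids[4:6])  # [1,2,3,4,5,6,5,6]
print(original == mutated)
```

The two lists produce the same root because both reach a level containing the hashes D, E, F, F, either by duplication or because 5 and 6 really were repeated.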
The same method for constructing a tree from four transactions can be
generalized to construct trees of any size. In Bitcoin it is common to
have several thousand transactions in a single
@ -408,7 +445,7 @@ thousand transactions in the block, the merkle root always summarizes
them into 32 bytes.
((("authentication paths")))To prove that a specific transaction is
included in a block, a node only needs to produce +log~2~(N)+ 32-byte
included in a block, a node only needs to produce approximately +log~2~(N)+ 32-byte
hashes, constituting an _authentication path_ or _merkle path_
connecting the specific transaction to the root of the tree. This is
especially important as the number of transactions increases, because
@ -425,7 +462,7 @@ image::images/mbc2_0904.png["merkle_tree_large"]
In <<merkle_tree_path>>, a node can prove that a transaction K is
included in the block by producing a merkle path that is only four
32-byte hashes long (128 bytes total). The path consists of the four
hashes (shown with a shaded background H~L~,
hashes (shown with a shaded background) H~L~,
H~IJ~, H~MNOP~, and H~ABCDEFGH~. With those four hashes provided as an
authentication path, any node can prove that H~K~ (with a black
background at the bottom of the diagram) is included in the merkle root
@ -441,6 +478,7 @@ The efficiency of merkle trees becomes obvious as the scale increases.
<<block_structure2>> shows the amount of data that needs to be exchanged
as a merkle path to prove that a transaction is part of a block.
//FIXME: replace with a plot of size per txes: plot [0:16000] ceil(log2(x))
[[block_structure2]]
.Merkle tree efficiency
[options="header"]
@ -452,18 +490,15 @@ as a merkle path to prove that a transaction is part of a block.
| 65,535 transactions | 16 megabytes | 16 hashes | 512 bytes
|=======
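
The path sizes in <<block_structure2>> follow from the tree depth, +ceil(log~2~(N))+, multiplied by 32 bytes per hash. A minimal sketch, with the hypothetical helper name +merkle_path_size+ and integer arithmetic in place of a floating-point logarithm:

```python
def merkle_path_size(n_tx: int) -> tuple[int, int]:
    """Return (number of hashes, bytes) in a merkle path for n_tx leaves."""
    if n_tx < 1:
        raise ValueError("a block has at least one transaction")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    depth = (n_tx - 1).bit_length()
    return depth, depth * 32

for n in (16, 512, 2048, 65535):
    hashes, size = merkle_path_size(n)
    print(f"{n:>6} transactions: {hashes:>2} hashes, {size} bytes")
```

Doubling the number of transactions adds only one hash (32 bytes) to the path.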
As you can see from the table, while the block size increases rapidly,
from 4 KB with 16 transactions to a block size of 16 MB to fit 65,535
transactions, the merkle path required to prove the inclusion of a
transaction increases much more slowly, from 128 bytes to only 512
bytes. With merkle trees, a node can download just the block headers (80
With merkle trees, a node can download just the block headers (80
bytes per block) and still be able to identify a transaction's inclusion
in a block by retrieving a small merkle path from a full node, without
storing or transmitting the vast majority of the blockchain, which might
be several gigabytes in size. Clients that do not maintain a full
storing or transmitting the vast majority of the blockchain.
Clients that do not maintain a full
blockchain, called simplified payment verification (SPV) clients, use
merkle paths to verify transactions without downloading full blocks.
//FIXME: stretch goal to minimize use of "SPV" in the book
=== Merkle Trees and Simplified Payment Verification (SPV)
((("simple-payment-verification (SPV)")))((("bitcoin nodes", "SPV
@ -471,7 +506,7 @@ clients")))Merkle trees are used extensively by SPV clients. SPV clients don't
have all transactions and do not download full blocks, just block
headers. In order to verify that a transaction is included in a block,
without having to download all the transactions in the block, they use
an authentication path, or merkle path.
a merkle path.
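
To make the idea concrete, here is a hedged sketch of how a full node could build a merkle path and how a client could check it against the merkle root from a block header. The helper names +merkle_path+ and +verify_path+ are hypothetical, the leaves are made-up 32-byte values, and the path is represented as simple (hash, side) pairs, which is not the encoding real clients exchange on the network.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_path(leaves: list[bytes], index: int):
    """Return ([(sibling_hash, sibling_is_right), ...], root) for one leaf."""
    path = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash of odd levels
        sibling = index ^ 1          # the other member of this node's pair
        path.append((level[sibling], sibling > index))
        level = [
            dsha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        index //= 2
    return path, level[0]

def verify_path(leaf: bytes, path, root: bytes) -> bool:
    """Recompute the root from a leaf and its path; compare to the header."""
    node = leaf
    for sibling, sibling_is_right in path:
        if sibling_is_right:
            node = dsha256(node + sibling)
        else:
            node = dsha256(sibling + node)
    return node == root

# Sixteen made-up leaves standing in for transactions A through P.
leaves = [dsha256(letter.encode()) for letter in "ABCDEFGHIJKLMNOP"]
k_index = "ABCDEFGHIJKLMNOP".index("K")

path, root = merkle_path(leaves, k_index)
print(len(path), verify_path(leaves[k_index], path, root))
```

For transaction K among 16 transactions, the path holds four hashes: the siblings corresponding to L, IJ, MNOP, and ABCDEFGH, in that order from the leaf upward. The client needs only those 128 bytes plus the block header.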
Consider, for example, an SPV client that is interested in incoming
payments to an address contained in its wallet. The SPV client will
@ -503,40 +538,37 @@ chapter, is called _mainnet_. There are other Bitcoin blockchains that
are used for testing purposes: at this time _testnet_, _signet_, and
_regtest_. Let's look at each in turn.((("testnet", id="testnet09")))
==== Testnet: Bitcoin's Testing Playground
Testnet is the name of the test blockchain, network, and currency that
is used for testing purposes. The testnet is a fully featured live P2P
network, with wallets, test bitcoins (testnet coins), mining, and all
the other features of mainnet. There are really only two differences:
testnet coins are meant to be worthless and mining difficulty should be
low enough that anyone can mine testnet coins relatively easily (keeping
them worthless).
the other features of mainnet. The most important difference is that
testnet coins are meant to be worthless.
Any software development that is intended for production use on
Bitcoin's mainnet can first be tested on testnet with test coins.
This protects both the developers from monetary losses due to bugs and
the network from unintended behavior due to bugs.
Keeping the coins worthless and the mining easy, however, is not easy.
Despite pleas from developers, some people use advanced mining equipment
(GPUs and ASICs) to mine on testnet. This increases the difficulty,
makes it impossible to mine with a CPU, and eventually makes it
difficult enough to get test coins that people start valuing them, so
they're not worthless. As a result, every now and then, the testnet has
to be scrapped and restarted from a new genesis block, resetting the
difficulty.
The current testnet is called _testnet3_, the third iteration of
testnet, restarted in February 2011 to reset the difficulty from the
previous testnet.
Keep in mind that testnet3 is a large blockchain, in excess of 30 GB in
previous testnet. Testnet3 is a large blockchain, in excess of 30 GB in
2023. It will take a while to sync fully and use up resources
on your computer. Not as much as mainnet, but not exactly "lightweight"
either.
[TIP]
====
Testnet and the other test blockchains described in this book don't use
the same address prefixes as mainnet addresses to prevent someone from
accidentally sending real bitcoins to a test address. Mainnet addresses
begin with +1+, +3+, or +bc1+. Addresses for the test networks
mentioned in this book begin with +m+, +n+, or +tb1+. Other test
networks, or new protocols being developed on test networks, may use
other address prefixes or alterations.
====
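
The prefixes in the tip can be turned into a quick sanity check. This sketch only inspects the leading characters listed above and does not validate checksums or encodings; +network_hint+ is a hypothetical helper, and the sample strings are placeholders rather than valid addresses.

```python
def network_hint(address: str) -> str:
    """Guess the network from an address prefix (no checksum validation)."""
    # Prefixes like bc1 and tb1 may appear in upper or lower case.
    lowered = address.lower()
    if lowered.startswith("bc1"):
        return "mainnet"
    if lowered.startswith("tb1"):
        return "test network"
    # The single-character prefixes are case-sensitive.
    if address.startswith(("1", "3")):
        return "mainnet"
    if address.startswith(("m", "n")):
        return "test network"
    return "unknown"

# Placeholder strings for illustration only; these are not real addresses.
for sample in ("1ExampleOnly", "bc1qexampleonly", "tb1qexampleonly", "mExampleOnly"):
    print(sample, "->", network_hint(sample))
```

A wallet would perform full address decoding instead; the point here is only that the prefix alone is enough to tell a mainnet address from a test address at a glance.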
===== Using testnet
Bitcoin Core, like many other Bitcoin programs, has full support