merkle path and SPV, few changes to intro

pull/66/head
Andreas M. Antonopoulos 10 years ago
parent 95cef90b8c
commit 7a35a6ea04

@ -6,11 +6,15 @@
[[blockchain]]
=== The Blockchain
The blockchain data structure is an ordered linked list of blocks of transactions. The blockchain can be stored as a flat file, or in a simple database. The bitcoin core client stores the blockchain metadata using Google's LevelDB database.
The blockchain data structure is an ordered back-linked list of blocks of transactions. The blockchain can be stored as a flat file, or in a simple database. The bitcoin core client stores the blockchain metadata using Google's LevelDB database. Blocks are linked "back", each referring to the previous block in the chain. The blockchain is often visualized as a vertical stack, with blocks layered on top of each other and the first block ever serving as the foundation of the stack. The visualization of blocks stacked on top of each other results in the use of terms like "height", to refer to the distance from the first block, and "top" or "tip" to refer to the most recently added block.
Each block within the blockchain is identified by a hash, generated using the SHA256 cryptographic hash algorithm on the header of the block. Each block also references a previous block, known as the _parent_ block, through the "previous block hash" field in the block header. In other words, each block contains the hash of its parent inside its own header. The sequence of hashes linking each block to its parent, creates a chain going back all the way to the first block ever created, known as the _genesis block_. A block can have multiple children, each of which refers to it as its parent and contains the same hash in the "previous block hash" field. However, since a block only hash one single "previous block hash", that means each block can only have one parent. {one of which will win out while the others die...}
Each block within the blockchain is identified by a hash, generated using the SHA256 cryptographic hash algorithm on the header of the block. Each block also references a previous block, known as the _parent_ block, through the "previous block hash" field in the block header. In other words, each block contains the hash of its parent inside its own header. The sequence of hashes linking each block to its parent, creates a chain going back all the way to the first block ever created, known as the _genesis block_.
The hash of the parent is also part of the data (the block header) that creates the hash of the child, making the ancestry of each block an immutable part of its identity. The chain of hashes guarantees that a block cannot be modified retrospectively without forcing the re-computation of all following blocks (children), because a retrospective change in any block would change its hash, thereby changing the reference in any children whose hash also changes, and so on. As new blocks are added to the chain, they strengthen the immutability of the ledger by effectively incorporating all previous blocks by reference in their cryptographic hash.
While a block has just one parent, it can temporarily have multiple children. Each of the children refers to the same block as its parent and contains the same (parent) hash in the "previous block hash" field. Multiple children arise during a blockchain "fork", a temporary situation that occurs when different blocks are discovered almost simultaneously by different miners (see <<forks>>). Eventually, only one child block becomes part of the blockchain and the "fork" is resolved. Even though a block may have more than one child, each block can have only one parent. This is because a block has one single "previous block hash" field referencing its single parent.
The "previous block hash" field is inside the block header and thereby affects the _current_ block's hash. The child's own identity changes if the parent's identity changes. When the parent is modified in any way, the parent's hash changes. The parent's changed hash necessitates a change in the "previous block hash" pointer of the child. This in turn causes the child's hash to change, which requires a change in the pointer of the grandchild, which in turn changes the grandchild and so on. This cascade effect ensures that once a block has many generations following it, it cannot be changed without forcing a recalculation of all subsequent blocks. Because such a recalculation would require enormous computation, the existence of a long chain of blocks makes the blockchain's deep history immutable, a key feature of bitcoin's security.
One way to think about the blockchain is like layers in a geological formation, or glacier core sample. The surface layers may change with the seasons, or even be blown away before they have time to settle. But once you go a few inches deep, geological layers become more and more stable. By the time you look a few hundred feet down, you are looking at a snapshot of the past that has remained undisturbed for millennia or millions of years. In the blockchain, the most recent few blocks may be revised if there is a chain recalculation due to a fork. The top six blocks are like a few inches of topsoil. But once you go deeper into the blockchain, beyond 6 blocks, blocks are less and less likely to change. After 100 blocks back there is so much stability that the "coinbase" transaction, containing new bitcoin, can be spent. A few thousand blocks back (a month) and the blockchain is settled history. It will never change.
=== Structure of a Block
@ -52,9 +56,9 @@ The Nonce, Difficulty Target and Timestamp are used in the mining process and wi
The primary identifier of a block is its cryptographic hash, a digital fingerprint, made by hashing the block header twice through the SHA256 algorithm. The resulting 32-byte hash, is called the _block hash_, but is more accurately the _block *header* hash_, as only the block header is used to compute it. For example, the block hash of the first bitcoin block ever created is +000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f+. The block hash identifies a block uniquely and unambiguously and can be independently derived by any node by simply hashing the block header.
Note that the block hash is not actually included inside the block's data structure, neither when the block is transmitted on the network, nor when it is stored on a node's persistence storage as part of the blockchain. Instead, the block's hash is computed by each node as the block is received from the network. On full nodes, the block hash may be stored in a separate database table as part of the block's metadata, to facilitate indexing and faster retrieval of block from disk.
Note that the block hash is not actually included inside the block's data structure, neither when the block is transmitted on the network, nor when it is stored on a node's persistence storage as part of the blockchain. Instead, the block's hash is computed by each node as the block is received from the network. The block hash may be stored in a separate database table as part of the block's metadata, to facilitate indexing and faster retrieval of blocks from disk.
A second way to identify a block is by its position in the blockchain, called the _block height_. The first block ever created is at block height 0 (zero), and is the same block that was referenced by the block hash +000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f+ {above - CHECK TO SEE IF ABOVE OR BELOW}. A block can thus be identified two ways, either by referencing the block hash, or by referencing the block height. Each subsequent block added "on top" of that first block is one position "higher" in the blockchain, like boxes stacked one on top of the other. The block height on January 1st 2014 was approximately 278,000, meaning there were 278,000 blocks stacked on top of the first block created in January 2009.
A second way to identify a block is by its position in the blockchain, called the _block height_. The first block ever created is at block height 0 (zero), and is the same block that was referenced by the block hash +000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f+ above. A block can thus be identified two ways, either by referencing the block hash, or by referencing the block height. Each subsequent block added "on top" of that first block is one position "higher" in the blockchain, like boxes stacked one on top of the other. The block height on January 1st 2014 was approximately 278,000, meaning there were 278,000 blocks stacked on top of the first block created in January 2009.
Unlike the block hash, the block height is not a unique identifier. While a single block will always have a specific and invariant block height, the reverse is not true - the block height does not always identify a single block. Two or more blocks may have the same block height, competing for the same position in the blockchain. This scenario is discussed in detail in the section on <<forks>>. The block height is also not a part of the block's data structure, it is not stored within the block. Each node dynamically identifies a block's position (height) in the blockchain when it is received from the bitcoin network. The block height may also be stored as metadata in an indexed database table for faster retrieval.
@ -106,7 +110,7 @@ The genesis block contains a hidden message within it. The coinbase transaction
Bitcoin nodes maintain a local copy of the blockchain, starting at the genesis block. The local copy of the blockchain is constantly updated as new blocks are found and used to extend the chain. As a node receives incoming blocks from the network, it will validate these blocks and then link them to the existing blockchain. To establish a link, a node will examine the incoming block header and look for the "previous block hash".
Let's assume for example that a node has 277,314 blocks in the local copy of the blockchain. The last block the node knows about is block 277314, with a block header hash of +00000000000000027e7ba6fe7bad39faf3b5a83daed765f05f7d1b71a1632249+.
Let's assume for example that a node has 277,314 blocks in the local copy of the blockchain. The last block the node knows about is block 277,314, with a block header hash of +00000000000000027e7ba6fe7bad39faf3b5a83daed765f05f7d1b71a1632249+.
The bitcoin node then receives a new block from the network, which it parses as follows:
----
@ -130,7 +134,7 @@ The bitcoin node then receives a new block from the network, which it parses as
}
----
Looking at this new block, the node finds the "previousblockhash" field, which contains the hash of its parent block. It is a hash known to the node, that of the last block on the chain at height 277314. Therefore, this new block is a child of the last block on the chain and extends the existing blockchain. The node adds this new block to the end of the chain, making the blockchain longer with a new height of 277,315.
Looking at this new block, the node finds the "previousblockhash" field, which contains the hash of its parent block. It is a hash known to the node, that of the last block on the chain at height 277,314. Therefore, this new block is a child of the last block on the chain and extends the existing blockchain. The node adds this new block to the end of the chain, making the blockchain longer with a new height of 277,315.
[[chain_of_blocks]]
.Blocks linked in a chain, by reference to the previous block header hash
@ -192,5 +196,13 @@ The efficiency of merkle trees becomes obvious as the scale increases. For examp
| 65,535 transactions | 16 megabytes | 16 hashes | 512 bytes
|=======
As you can see from the table above, while the block size increases rapidly, from 4KB with 16 transactions to a block size of 16 MB to fit 65,535 transactions, the merkle path required to prove the inclusion of a transaction increases much more slowly, from 128 bytes to only 512 bytes. With merkle trees, a node can download just the block headers (80 bytes per block) and still be able to identify a transaction's inclusion in a block by retrieving a small merkle path from a full node, without storing or transmitting the vast majority of the blockchain which may be several gigabytes in size. Nodes which do not maintain a full blockchain, called Simple Payment Verification or SPV nodes use merkle paths to verify transactions without downloading full blocks. s
As you can see from the table above, while the block size increases rapidly, from 4KB with 16 transactions to a block size of 16 MB to fit 65,535 transactions, the merkle path required to prove the inclusion of a transaction increases much more slowly, from 128 bytes to only 512 bytes. With merkle trees, a node can download just the block headers (80 bytes per block) and still be able to identify a transaction's inclusion in a block by retrieving a small merkle path from a full node, without storing or transmitting the vast majority of the blockchain which may be several gigabytes in size. Nodes which do not maintain a full blockchain, called Simple Payment Verification or SPV nodes use merkle paths to verify transactions without downloading full blocks.
=== Merkle Trees and Simple Payment Verification (SPV)
Merkle trees are used extensively by Simple Payment Verification nodes. SPV nodes don't have all transactions and do not download full blocks, just block headers. In order to verify that a transaction is included in a block, without having to download all the transactions in the block, they use an _authentication path_, or merkle path.
Consider for example an SPV node that is interested in incoming payments to an address contained in its wallet. The SPV node will establish a bloom filter on its connections to peers to limit the transactions received to only those containing addresses of interest. When a peer sees a transaction that matches the bloom filter, it will send that block using a +merkleblock+ message. The +merkleblock+ message contains the block header as well as a merkle path that links the transaction of interest to the merkle root in the block. The SPV node can use this merkle path to connect the transaction to the block and verify that the transaction is included in the block. The SPV node also uses the block header to link the block to the rest of the blockchain. The combination of these two links, between the transaction and block, and between the block and blockchain, proves that the transaction is recorded in the blockchain. All in all, the SPV node will have received less than a kilobyte of data for the block header and merkle path, an amount of data that is a thousand times less than a full block (about 1 megabyte currently)

Loading…
Cancel
Save