diff --git a/ch06.asciidoc b/ch06.asciidoc index b96b5125..a9270690 100644 --- a/ch06.asciidoc +++ b/ch06.asciidoc @@ -67,16 +67,16 @@ The peer node responds with +verack+ to acknowledge and establish a connection, .The initial handshake between peers image::images/msbt_0604.png["NetworkHandshake"] -How does a new node find peers? Although there are no special nodes in bitcoin, there are some long running stable nodes that are listed in the client as((("nodes","seed")))((("seed nodes"))) _seed nodes_. Although a new node does not have to connect with the seed nodes, it can use them to quickly discover other nodes in the network. In the Bitcoin Core client, the option to use the seed nodes is controlled by the option switch +-dnsseed+, which is set to 1, to use the seed nodes, by default. Alternatively, a bootstrapping node that knows nothing of the network must be given the IP address of at least one bitcoin node, after which it can establish connections through further introductions. The command-line argument +-seednode+ can be used to connect to one node just for introductions, using it as a DNS seed. After the initial seed node is used to form introductions, the client will disconnect from it and use the newly discovered peers. +How does a new node find peers? Although there are no special nodes in bitcoin, there are some long-running stable nodes that are listed in the client as((("nodes","seed")))((("seed nodes"))) _seed nodes_. Although a new node does not have to connect with the seed nodes, it can use them to quickly discover other nodes in the network. In the Bitcoin Core client, the option to use the seed nodes is controlled by the option switch +-dnsseed+, which is set to 1, to use the seed nodes, by default. Alternatively, a bootstrapping node that knows nothing of the network must be given the IP address of at least one bitcoin node, after which it can establish connections through further introductions. The command-line argument +-seednode+ can be used to connect to one node just for introductions, using it as a DNS seed. After the initial seed node is used to form introductions, the client will disconnect from it and use the newly discovered peers. -Once one or more connections are established, the new node will send an((("addr message"))) +addr+ message containing its own IP address, to its neighbors. The neighbors will, in turn, forward the +addr+ message to their neighbors, ensuring that the newly connected node becomes well known and better connected. Additionally, the newly connected node can send +getaddr+ to the neighbors, asking them to return a list of IP addresses of other peers. That way, a node can find peers to connect to and advertise its existence on the network for other nodes to find it. <> shows the address discovery protocol. +Once one or more connections are established, the new node will send an((("addr message"))) +addr+ message containing its own IP address to its neighbors. The neighbors will, in turn, forward the +addr+ message to their neighbors, ensuring that the newly connected node becomes well known and better connected. Additionally, the newly connected node can send +getaddr+ to the neighbors, asking them to return a list of IP addresses of other peers. That way, a node can find peers to connect to and advertise its existence on the network for other nodes to find it. <> shows the address discovery protocol. [[address_propagation]] .Address propagation and discovery image::images/msbt_0605.png["AddressPropagation"] -A node must connect to a few different peers in order to establish diverse paths into the bitcoin network. Paths are not reliable, nodes come and go, and so the node must continue to discover new nodes as it loses old connections as well as assist other nodes when they bootstrap. Only one connection is needed to bootstrap, because the first node can offer introductions to its peer nodes and those peers can offer further introductions. It's also unnecessary and wasteful of network resources to connect to more than a handful of nodes. After bootstrapping, a node will remember its most recent successful peer connections, so that if it is rebooted it can quickly reestablish connections with its former peer network. If none of the former peers respond to its connection request, the node can use the seed nodes to bootstrap again. +A node must connect to a few different peers in order to establish diverse paths into the bitcoin network. Paths are not reliable—nodes come and go—and so the node must continue to discover new nodes as it loses old connections as well as assist other nodes when they bootstrap. Only one connection is needed to bootstrap, because the first node can offer introductions to its peer nodes and those peers can offer further introductions. It's also unnecessary and wasteful of network resources to connect to more than a handful of nodes. After bootstrapping, a node will remember its most recent successful peer connections, so that if it is rebooted it can quickly reestablish connections with its former peer network. If none of the former peers respond to its connection request, the node can use the seed nodes to bootstrap again. On a node running the Bitcoin Core client, you can list the peer connections with the command((("getpeerinfo command"))) +getpeerinfo+: @@ -128,44 +128,44 @@ If there is no traffic on a connection, nodes will periodically send a message t === Full Nodes -((("blockchains","full nodes and")))((("full nodes")))((("nodes","full")))Full nodes are nodes that maintain a full blockchain with all transactions. More accurately, they probably should be called "full blockchain nodes." In the early years of bitcoin, all nodes were full nodes and currently the Bitcoin Core client is a full blockchain node. In the past two years, however, new forms of bitcoin clients have been introduced that do not maintain a full blockchain but run as lightweight clients. These are examined in more detail in the next section. +((("block chains","full nodes and")))((("full nodes")))((("nodes","full")))Full nodes are nodes that maintain a full block chain with all transactions. More accurately, they probably should be called "full block chain nodes." In the early years of bitcoin, all nodes were full nodes and currently the Bitcoin Core client is a full block chain node. In the past two years, however, new forms of bitcoin clients have been introduced that do not maintain a full block chain but run as lightweight clients. We'll examine these in more detail in the next section. -((("blockchains","on full nodes")))Full blockchain nodes maintain a complete and up-to-date copy of the bitcoin blockchain with all the transactions, which they independently build and verify, starting with the very first block (genesis block) and building up to the latest known block in the network. A full blockchain node can independently and authoritatively verify any transaction without recourse or reliance on any other node or source of information. The full blockchain node relies on the network to receive updates about new blocks of transactions, which it then verifies and incorporates into its local copy of the blockchain. +((("block chains","on full nodes")))Full block chain nodes maintain a complete and up-to-date copy of the bitcoin block chain with all the transactions, which they independently build and verify, starting with the very first block (genesis block) and building up to the latest known block in the network. A full block chain node can independently and authoritatively verify any transaction without recourse or reliance on any other node or source of information. The full block chain node relies on the network to receive updates about new blocks of transactions, which it then verifies and incorporates into its local copy of the block chain. -Running a full blockchain node gives you the pure bitcoin experience: independent verification of all transactions without the need to rely on, or trust, any other systems. It's easy to tell if you're running a full node because it requires 20+ gigabytes of persistent storage (disk space) to store the full blockchain. If you need a lot of disk and it takes two to three days to "sync" to the network, you are running a full node. That is the price of complete independence and freedom from central authority. +Running a full block chain node gives you the pure bitcoin experience: independent verification of all transactions without the need to rely on, or trust, any other systems. It's easy to tell if you're running a full node because it requires 20+ gigabytes of persistent storage (disk space) to store the full block chain. If you need a lot of disk and it takes two to three days to sync to the network, you are running a full node. That is the price of complete independence and freedom from central authority. -There are a few alternative implementations of full-blockchain bitcoin clients, built using different programming languages and software architectures. However, the most common implementation is the reference client((("Bitcoin Core client","and full nodes"))) Bitcoin Core, also known as the Satoshi client. More than 90% of the nodes on the bitcoin network run various versions of Bitcoin Core. It is identified as "Satoshi" in the sub-version string sent in the +version+ message and shown by the command +getpeerinfo+ as we saw earlier; for example, +/Satoshi:0.8.6/+. +There are a few alternative implementations of full block chain bitcoin clients, built using different programming languages and software architectures. However, the most common implementation is the reference client((("Bitcoin Core client","and full nodes"))) Bitcoin Core, also known as the Satoshi client. More than 90% of the nodes on the bitcoin network run various versions of Bitcoin Core. It is identified as "Satoshi" in the sub-version string sent in the +version+ message and shown by the command +getpeerinfo+ as we saw earlier; for example, +/Satoshi:0.8.6/+. === Exchanging "Inventory" -((("blockchains","creating on nodes")))((("blockchains","on new nodes")))((("blocks","on new nodes")))((("full nodes","creating full blockchains on")))The first thing a full node will do once it connects to peers is try to construct a complete blockchain. If it is a brand-new node and has no blockchain at all, it only knows one block (the genesis block), which is statically embedded in the client software. Starting with block #0 (the genesis block), the new node will have to download hundreds of thousands of blocks to synchronize with the network and re-establish the full blockchain. +((("block chains","creating on nodes")))((("block chains","on new nodes")))((("blocks","on new nodes")))((("full nodes","creating full block chains on")))The first thing a full node will do once it connects to peers is try to construct a complete block chain. If it is a brand-new node and has no block chain at all, it only knows one block, the genesis block, which is statically embedded in the client software. Starting with block #0 (the genesis block), the new node will have to download hundreds of thousands of blocks to synchronize with the network and re-establish the full block chain. -((("syncing the blockchain")))The process of "syncing" the blockchain starts with the +version+ message, because that contains +BestHeight+, a node's current blockchain height (number of blocks). A node will see the +version+ messages from its peers, know how many blocks they each have, and be able to compare to how many blocks it has in its own blockchain. Peered nodes will exchange a%605.420%%% +getblocks+ message that contains the hash (fingerprint) of the top block on their local blockchain. One of the peers will be able to identify the received hash as belonging to a block that is not at the top, but rather belongs to an older block, thus deducing that its own local blockchain is longer than its peer's. +((("syncing the block chain")))The process of syncing the block chain starts with the +version+ message, because that contains +BestHeight+, a node's current block chain height (number of blocks). A node will see the +version+ messages from its peers, know how many blocks they each have, and be able to compare to how many blocks it has in its own block chain. Peered nodes will exchange a%605.420%%% +getblocks+ message that contains the hash (fingerprint) of the top block on their local block chain. One of the peers will be able to identify the received hash as belonging to a block that is not at the top, but rather belongs to an older block, thus deducing that its own local block chain is longer than its peer's. -The peer that has the longer blockchain has more blocks than the other node and can identify which blocks the other node needs in order to "catch up." It will identify the first 500 blocks to share and transmit their hashes using an((("inv messages"))) +inv+ (inventory) message. The node missing these blocks will then retrieve them, by issuing a series of +getdata+ messages requesting the full block data and identifying the requested blocks using the hashes from the +inv+ message. +The peer that has the longer block chain has more blocks than the other node and can identify which blocks the other node needs in order to "catch up." It will identify the first 500 blocks to share and transmit their hashes using an((("inv messages"))) +inv+ (inventory) message. The node missing these blocks will then retrieve them, by issuing a series of +getdata+ messages requesting the full block data and identifying the requested blocks using the hashes from the +inv+ message. -Let's assume, for example, that a node only has the genesis block. It will then receive an +inv+ message from its peers containing the hashes of the next 500 blocks in the chain. It will start requesting blocks from all of its connected peers, spreading the load and ensuring that it doesn't overwhelm any peer with requests. The node keeps track of how many blocks are "in transit" per peer connection, meaning blocks that it has requested but not received, checking that it does not exceed a limit((("MAX_BLOCKS_IN_TRANSIT_PER_PEER constant"))) (+MAX_BLOCKS_IN_TRANSIT_PER_PEER+). This way, if it needs a lot of blocks, it will only request new ones as previous requests are fulfilled, allowing the peers to control the pace of updates and not overwhelming the network. As each block is received, it is added to the blockchain, as we will see in <>. As the local blockchain is gradually built up, more blocks are requested and received, and the process continues until the node catches up to the rest of the network. +Let's assume, for example, that a node only has the genesis block. It will then receive an +inv+ message from its peers containing the hashes of the next 500 blocks in the chain. It will start requesting blocks from all of its connected peers, spreading the load and ensuring that it doesn't overwhelm any peer with requests. The node keeps track of how many blocks are "in transit" per peer connection, meaning blocks that it has requested but not received, checking that it does not exceed a limit((("MAX_BLOCKS_IN_TRANSIT_PER_PEER constant"))) (+MAX_BLOCKS_IN_TRANSIT_PER_PEER+). This way, if it needs a lot of blocks, it will only request new ones as previous requests are fulfilled, allowing the peers to control the pace of updates and not overwhelming the network. As each block is received, it is added to the block chain, as we will see in <>. As the local block chain is gradually built up, more blocks are requested and received, and the process continues until the node catches up to the rest of the network. -This process of comparing the local blockchain with the peers and retrieving any missing blocks happens any time a node goes offline for any period of time. Whether a node has been offline for a few minutes and is missing a few blocks, or a month and is missing a few thousand blocks, it starts by sending +getblocks+, gets an +inv+ response, and starts downloading the missing blocks. <> shows the inventory and block propagation protocol. +This process of comparing the local block chain with the peers and retrieving any missing blocks happens any time a node goes offline for any period of time. Whether a node has been offline for a few minutes and is missing a few blocks, or a month and is missing a few thousand blocks, it starts by sending +getblocks+, gets an +inv+ response, and starts downloading the missing blocks. <> shows the inventory and block propagation protocol. [[inventory_synchronization]] -.Node synchronizing the blockchain by retrieving blocks from a peer +.Node synchronizing the block chain by retrieving blocks from a peer image::images/msbt_0606.png["InventorySynchronization"] [[spv_nodes]] === Simplified Payment Verification (SPV) Nodes -((("nodes","SPV nodes", id="ix_ch06-asciidoc5", range="startofrange")))((("Simplified Payment Verification (SPV) nodes", id="ix_ch06-asciidoc6", range="startofrange")))Not all nodes have the ability to store the full blockchain. Many bitcoin clients are designed to run on space- and power-constrained devices, such as smartphones, tablets, or embedded systems. For such devices, a _simplified payment verification_ (SPV) method is used to allow them to operate without storing the full blockchain. These types of clients are called SPV clients or lightweight clients. As bitcoin adoption surges, the SPV node is becoming the most common form of bitcoin node, especially for bitcoin wallets. +((("nodes","SPV nodes", id="ix_ch06-asciidoc5", range="startofrange")))((("simplified payment verification (SPV) nodes", id="ix_ch06-asciidoc6", range="startofrange")))Not all nodes have the ability to store the full block chain. Many bitcoin clients are designed to run on space- and power-constrained devices, such as smartphones, tablets, or embedded systems. For such devices, a _simplified payment verification_ (SPV) method is used to allow them to operate without storing the full block chain. These types of clients are called SPV clients or lightweight clients. As bitcoin adoption surges, the SPV node is becoming the most common form of bitcoin node, especially for bitcoin wallets. -((("blockchains","on SPV nodes")))SPV nodes download only the block headers and do not download the transactions included in each block. The resulting chain of blocks, without transactions, is 1,000 times smaller than the full blockchain. SPV nodes cannot construct a full picture of all the UTXOs that are available for spending because they do not know about all the transactions on the network. SPV nodes verify transactions using a slightly different methodology that relies on peers to provide partial views of relevant parts of the blockchain on-demand. +((("block chains","on SPV nodes")))SPV nodes download only the block headers and do not download the transactions included in each block. The resulting chain of blocks, without transactions, is 1,000 times smaller than the full block chain. SPV nodes cannot construct a full picture of all the UTXOs that are available for spending because they do not know about all the transactions on the network. SPV nodes verify transactions using a slightly different methodology that relies on peers to provide partial views of relevant parts of the block chain on demand. As an analogy, a full node is like a tourist in a strange city, equipped with a detailed map of every street and every address. By comparison, an SPV node is like a tourist in a strange city asking random strangers for turn-by-turn directions while knowing only one main avenue. Although both tourists can verify the existence of a street by visiting it, the tourist without a map doesn't know what lies down any of the side streets and doesn't know what other streets exist. Positioned in front of 23 Church Street, the tourist without a map cannot know if there are a dozen other "23 Church Street" addresses in the city and whether this is the right one. The mapless tourist's best chance is to ask enough people and hope some of them are not trying to mug him. -Simplified payment verification verifies transactions by reference to their _depth_ in the blockchain instead of their _height_. Whereas a full-blockchain node will construct a fully verified chain of thousands of blocks and transactions reaching down the blockchain (back in time) all the way to the genesis block, an SPV node will verify the chain of all blocks and link that chain to the transaction of interest. +Simplified payment verification verifies transactions by reference to their _depth_ in the block chain instead of their _height_. Whereas a full block chain node will construct a fully verified chain of thousands of blocks and transactions reaching down the block chain (back in time) all the way to the genesis block, an SPV node will verify the chain of all blocks and link that chain to the transaction of interest. For example, when examining a transaction in block 300,000, a full node links all 300,000 blocks down to the genesis block and builds a full database of UTXO, establishing the validity of the transaction by confirming that the UTXO remains unspent. An SPV node cannot validate whether the UTXO is unspent. Instead, the SPV node will establish a link between the transaction and the block that contains it, using a((("merkle trees","SPV and"))) _merkle path_ (see <>). Then, the SPV node waits until it sees the six blocks 300,001 through 300,006 piled on top of the block containing the transaction and verifies it by establishing its depth under blocks 300,006 to 300,001. The fact that other nodes on the network accepted block 300,000 and then did the necessary work to produce six more blocks on top of it is proof, by proxy, that the transaction was not a double-spend. -An SPV node cannot be persuaded that a transaction exists in a block, when it does not in fact exist. The SPV node establishes the existence of a transaction in a block by requesting a merkle path proof and by validating the Proof-Of-Work in the chain of blocks. However, a transaction's existence can be "hidden" from an SPV node. An SPV node can definitely prove that a transaction exists but cannot verify that a transaction, such as a double-spend of the same UTXO, doesn't exist because it doesn't have a record of all transactions. This type of attack can be used as a Denial-of-Service attack or as a double-spending attack against SPV nodes. To defend against this, an SPV node needs to connect randomly to several nodes, to increase the probability that it is in contact with at least one honest node. SPV nodes are therefore vulnerable to network partitioning attacks or Sybil attacks, where they are connected to fake nodes or fake networks and do not have access to honest nodes or the real bitcoin network. +An SPV node cannot be persuaded that a transaction exists in a block when the transaction does not in fact exist. The SPV node establishes the existence of a transaction in a block by requesting a merkle path proof and by validating the Proof-Of-Work in the chain of blocks. However, a transaction's existence can be "hidden" from an SPV node. An SPV node can definitely prove that a transaction exists but cannot verify that a transaction, such as a double-spend of the same UTXO, doesn't exist because it doesn't have a record of all transactions. This type of attack can be used as a Denial-of-Service attack or as a double-spending attack against SPV nodes. To defend against this, an SPV node needs to connect randomly to several nodes, to increase the probability that it is in contact with at least one honest node. SPV nodes are therefore vulnerable to network partitioning attacks or Sybil attacks, where they are connected to fake nodes or fake networks and do not have access to honest nodes or the real bitcoin network. For most practical purposes, well-connected SPV nodes are secure enough, striking the right balance between resource needs, practicality, and security. For the truly security conscious, however, nothing beats running a full-blockchain node.