diff --git a/docs/common/communication/bitcoin-signing.md b/docs/common/communication/bitcoin-signing.md new file mode 100644 index 000000000..7386e5516 --- /dev/null +++ b/docs/common/communication/bitcoin-signing.md @@ -0,0 +1,401 @@ +# Bitcoin signing flow + +The Bitcoin signing process is one of the most complicated workflows in the Trezor +codebase. This is because Trezor cannot store arbitrarily large transactions in memory, +so both the input data and the results must be sent in chunks. The Protobuf messages +cannot fully encode the data pertaining to a single transaction; instead, the data +is spread out over multiple messages. + +## Overview + +The signing flow is initiated by a `SignTx` command. The message contains the name of +the coin, number of inputs and outputs, and transaction metadata: version, lock time, +and others that are required for some coins. + +In response, Trezor will send a number of `TxRequest` messages, asking for additional +data from the host. The host is supposed to respond with a `TxAck` providing the +requested data. + +Trezor can request the following kinds of items: + +* input of the transaction being signed +* output of the transaction being signed +* metadata of a _previous transaction_, i.e., the transaction whose UTXO is being spent +* input of a previous transaction +* output of a previous transaction +* additional trailing data of a previous transaction + +As part of each `TxRequest` message, Trezor can also send a chunk of the resulting +serialized transaction, and/or a signature of one of the inputs. + +The flow ends when Trezor sends a `TxRequest` with `request_type` of `TXFINISHED`. + +## Signing phases + +The following content is for reference only, and details might change in the future. +Host code should make no assumptions about the order of the phases. + +The list of phases here also does not necessarily correspond to internal phase +numbering. + +### Gathering info about current transaction + +Trezor will request all inputs and outputs of the transaction to be signed, and set up +data structures that allow it to verify that the same data is sent in the following +phases. + +In this phase, Trezor will also ask the user to confirm destination addresses, +transaction fee, metadata, and the total being sent. If the user confirms, we continue +to the next phase. + +### Validation of input data + +Trezor must verify that the host does not lie about input amounts, i.e., that the +transaction total in the first phase was calculated correctly. + +For this reason, Trezor will ask the host to send each input again. It will then request +data about the referenced previous transaction: metadata, all inputs, all outputs, and +possible trailing data. This allows Trezor to reconstruct the previous transaction and +calculate its hash. If this hash matches the provided one, and the amount on selected +UTXO matches the input amount, the given input is considered valid. + +Trezor also supports pre-signed inputs for multi-party signing. If an input has an +`EXTERNAL` type, and provides a signature, Trezor will validate the signature against +the previous transaction in this step. + +### Signing + +Satisfied that all provided data is valid, Trezor will ask once again about every input +and generate a signature for it. + +For every legacy (non-segwit) inputs, it is necessary to stream the full set of inputs +and outputs again, so that Trezor can build the serialization which is being signed. +For segwit inputs, this is not necessary + +Finally, when all inputs are streamed, Trezor will ask for every output, so that it can +serialize it, fill out change addresses, and return the full transaction. + +## Old versus new data structures + +Originally, the `TxAck` message contained one field of type `TransactionType`. This, in +turn, contained all the possible fields that Trezor could request: + +* `TransactionType` itself contains fields for all necessary metadata, plus a field + `extra_data` for trailing data. +* `TransactionType.inputs` is an array of `TxInputType` objects, each of which can + describe either the current input, or an input of a previous transaction. +* `TransactionType.outputs` is an array of `TxOutputType` objects, each of which can + describe an output of the _current_ transaction. +* `TransactionType.bin_outputs` is an array of `TxOutputBinType` objects, each of which + can describe an output of a _previous_ transaction. + +This organization makes it practical to use the `TransactionType` for host-side storage: +a transaction can be fully stored in one object, and in order to send a `TxAck` +response, you only need to extract the appropriate data. + +The cost of this is that this organization makes it extremely unclear _which data_ +should be extracted at which points. + +To make the constraints more visible, a new set of data types was designed. There is a +`TxAck` message for every `Kind` of data. These only define the fields that are +appropriate for that kind of request. + +It is possible to use both representations, as they are wire-compatible. However, we +recommend using the new definitions for new applications. + +## Request types + +The `TxRequest` message always contains a `request_type` field, indicating which kind of +data it wants. In addition, `request_details` specify the particular piece of data +requested. + +If `request_details.tx_hash` is set, Trezor is requesting data about a specified +previous transaction. If it is unset, Trezor wants data about current transaction. + +### Transaction input + +Trezor sets `request_type` to `TXINPUT`, and `request_details.tx_hash` is unset. + +`request_details.request_index` is the index of the input in the transaction: 0 is the +first input, 1 is second, etc. + +**Old style:** Host must respond with a `TxAck` message. The field `tx.inputs` must be +set to an array of one element, which describes the requested input. All other fields +should be left unset. + +**New style:** Host must respond with a `TxAckInput` message. All relevant data must be +set on `tx.input`. + +#### Normal (internal) inputs + +Usually, the user owns, and wants to sign, all inputs of the transaction. For that, the +host must specify a derivation path for the key, and signature type (legacy, p2sh +segwit, native segwit). + +#### Multisig inputs + +For multisig inputs, the XPUBs of all signers (including the current user) must be +provided in the `multisig` structure. Legacy multisig uses type `SPENDMULTISIG`, P2SH +segwit and native segwit multisig use the same type as non-multisig inputs, i.e. +`SPENDP2SHWITNESS` or `SPENDWITNESS`. + +Full documentation for multisig is TBD. + +### Pre-signed inputs + +Trezor can include inputs that already have a valid signature from some other party. +Such inputs are of type `EXTERNAL`. These must provide `script_sig` and/or `witness` +with the signature. This data is then validated against the previous transaction. + +### External inputs with ownership proofs + +If the other signing party hasn't signed their input yet (i.e., with two Trezors, one +must sign first so that the other can include a pre-signed input), they can instead +provide a SLIP-19 ownership proof in the field `ownership_proof`, with optional +commitment data in `commitment_data`. The input type is also `EXTERNAL` in this case. + +### Transaction output + +Trezor sets `request_type` to `OUTPUT`, and `request_details.tx_hash` is unset. + +`request_details.request_index` is the index of the output in the transaction: 0 is the +first input, 1 is second, etc. + +**Old style:** Host must respond with a `TxAck` message. The field `tx.outputs` must be +set to an array of one element, which describes the requested output. All other fields +should be left unset. + +**New style:** Host must respond with a `TxAckOutput` message. All relevant data must be +set on `tx.output`. + +#### External outputs + +Outputs that send coins to a particular address are always of type `PAYTOADDRESS`. The +address is sent as a string in the field `address`. + +#### Change outputs + +Outputs that send coins back to the same owner must specify a derivation path and the +appropriate script type. If the derivation path has the same prefix as _all_ inputs, and +a matching script type (legacy, p2sh segwit, native segwit), it is considered to be a +change output, and its amount is subtracted from the total. + +`address` must not be specified in this case. It is instead derived internally from the +provided derivation path. + +#### OP_RETURN outputs + +Outputs of type `PAYTOOPRETURN` must not specify `address` nor `address_n`, and the +`amount` must be zero. The OP_RETURN data is sent as `op_return_data` field. + +### Previous transaction metadata + +Trezor sets `request_type` to `TXMETA`. `request_details.tx_hash` is a transaction hash, +matching one of the current transaction inputs. + +**Old style:** Host must respond with a `TxAck` message. The structure `tx` must be +filled out with relevant data, in particular, `inputs_cnt` and `outputs_cnt` must be set +to the number of transaction inputs and outputs. Arrays `inputs`, `outputs`, +`bin_outputs` and `extra_data` should be empty. + +**New style:** Host must respond with a `TxAckPrevMeta` message. All relevant data must +be set on `tx`. + +#### Extra data + +Some coins (e.g., Zcash) contain data at the end of transaction serialization that +Trezor does not understand. The host must indicate the length of this extra data in the +field `extra_data_len`. + +To figure out which is the extra data, the host must parse the serialized previous +transaction up to the last field understood by Trezor. In case of Zcash, that is: + +* version + version group ID +* number of inputs, and every input +* number of outputs, and every output +* lock time +* expiry + +All data after the `expiry` field is considered "extra data". + +### Previous transaction input + +Trezor sets `request_type` to `TXINPUT`. `request_details.tx_hash` is a transaction +hash, matching one of the current transaction inputs. + +**Old style:** Host must respond with a `TxAck` message. The field `tx.inputs` must be +set to an array of one element, which describes the requested input of the specified +previous transaction. All other fields should be left unset. + +**New style:** Host must respond with a `TxAckPrevInput` message. All relevant data must +be set on `tx.input`. + +### Previous transaction output + +Trezor sets `request_type` to `TXOUTPUT`. `request_details.tx_hash` is a transaction +hash, matching one of the current transaction inputs. + +**Old style:** Host must respond with a `TxAck` message. The field `tx.bin_outputs` must +be set to an array of one element, which describes the requested output of the specified +previous transaction. All other fields should be left unset. + +**New style:** Host must respond with a `TxAckPrevOutput` message. All relevant data +must be set on `tx.output`. + +### Previous transaction trailing data + +On some coins, such as Zcash, the transaction serialization can contain data not +understood by Trezor. This data is not relevant for validation, but it must be included +so that Trezor can correctly compute the previous transaction hash. + +Trezor sets `request_type` to `TXEXTRADATA`. `request_details.tx_hash` is a transaction +hash, matching one of the current transaction inputs. + +`request_details.extra_data_offset` specifies the offset of the requested data from the +_start_ of the extra data. `request_details.extra_data_length` specifies the length of +the requested chunk. + +**Old style:** Host must respond with a `TxAck` message. The field `tx.extra_data` must +contain the specified chunk, starting at the given offset and of exactly the given +length. All other fields should be unset. + +**New style:** Host must respond with a `TxAckPrevExtraData` message. The chunk must be +set to `tx.extra_data_chunk`. + +## Implementation notes + +### Pseudo-code + +The following is a rough outline of host-side implementation. See above for detailed +info. + +```python +transaction_bytes = "" +signatures = [""] * len(INPUTS) + +def sign_tx(): + # send initial message + send_message( + SignTx( + coin_name, + inputs_count=len(INPUTS), + outputs_count=len(OUTPUTS), + # ...fill individual metadata fields + ) + ) + + # wait for TxAck forever, until Trezor indicates we are finished + while True: + msg = receive_message() + + # extract data first + extract_streamed_data(msg.serialized) + + if msg.request_type == TXFINISHED: + # we are done + break + + if msg.details.tx_hash is not None: + # Trezor requires data about some previous transaction + send_response_prev(msg.request_type, msg.details) + else: + # Trezor requires data about this transaction + send_response_current(msg.request_type, msg.details) + +def extract_streamed_data(ser: TxRequestSerializedType): + global transaction_bytes, signatures + # append serialized data to what we got so far + transaction_bytes += ser.serialized_tx + if ser.signature_index is not None: + # read the signature + signatures[ser.signature_index] = ser.signature + +def send_response_prev(request_type: RequestType, details: TxRequestDetailsType): + prev_tx = get_prev_tx(details.tx_hash) + if request_type == TXINPUT: + send_prev_input(prev_tx.inputs[details.request_index]) + elif request_type == TXOUTPUT: + send_prev_output(prev_tx.outputs[details.request_index]) + elif request_type == TXMETA: + send_prev_metadata(prev_tx) + elif request_type == TXEXTRADATA: + offset = details.extra_data_offset + length = details.extra_data_length + extra_data_chunk = prev_tx.extra_data[offset : offset + length] + send_prev_extra_data(extra_data_chunk) + +def send_response_current(request_type: RequestType, details: TxRequestDetailsType): + if request_type == TXINPUT: + send_input(INPUTS[details.request_index]) + elif request_type == TXOUTPUT: + send_output(OUTPUTS[details.request_index]) +``` + +### Wire compatibility + +The new definitions are structured so that the Protobuf binary encoded form can be +decoded into both representations. This means that the host can encode data in the old +representation, and Trezor will successfully and correctly decode it into the new one. + +This is done by reusing field IDs as appropriate, and taking advantage of the fact that +Protobuf encodes arrays as a sequence of the same field repeated a number of times. + +For example, here is a part of the `TxAck` definition: + +```protobuf +message TxAck { + optional TransactionType tx = 1; + + message TransactionType { + // ... some fields omitted ... + repeated TxInputType inputs = 2; + // ... some fields omitted ... + + message TxInputType { + repeated uint32 address_n = 1; + // ... some fields omitted ... + optional uint64 amount = 8; + // ... some fields omitted ... + } + } +} +``` + +A message carrying these fields would look like this: + +``` +FIELD 1 (type NESTED): + FIELD 2 (type NESTED): + FIELD 1 (type int): 0x8000002c + FIELD 1 (type int): 0x80000000 + FIELD 1 (type int): 0x80000000 + FIELD 1 (type int): 0 + FIELD 1 (type int): 0 + FIELD 8 (type int): 1234567 +``` + +We can see that this is identical as if the type definition looked as follows; indeed, +we only renamed the types, removed some fields, and set some `optional` or `repeated` +to `required` instead. + +```protobuf +message TxAckInput { + required TxAckInputWrapper tx = 1; + + message TxAckInputWrapper { + required TxInput input = 2; // the field is now required instead of repeated + + message TxInput { + repeated uint32 address_n = 1; + required uint64 amount = 8; + } + } +} +``` + +A caveat of this approach is that this introduces invisible dependencies: `TxInput` and +`PrevInput` fold into the same old-style `TxInputType`, so adding new fields must be +done carefully. + +We expect to gradually deprecate the `TransactionType`. At that point, the new-style +types will be fully independent.