
# Ethereum definitions

Ethereum definitions for networks (chains) and tokens are dynamically generated and
encoded into binary blobs. These blobs can be sent to the device as part of some
messages.

## Built-in definitions

In addition to the generated binary blobs, a small subset of the definitions is also hardcoded
in the firmware in decoded form (not as a binary blob).
The sources of these definitions are:
* networks - [`networks.json`](https://github.com/trezor/trezor-firmware/blob/master/common/defs/ethereum/networks.json)
* tokens - [`tokens.json`](https://github.com/trezor/trezor-firmware/blob/master/common/defs/ethereum/tokens.json)

The sources for built-in definitions (namely the files `../../common/defs/ethereum/networks.json` and
`../../common/defs/ethereum/tokens.json`) are written by hand and are not produced
by the generation process described below.

## External definitions

Generated binary blobs are first saved locally (by the script that generates them)
and then published online. "External" refers to all definitions for which binary
blobs are generated. That is effectively all definitions, because blobs are also
generated for built-in definitions.

Generated blobs are saved as a zip file. To save space, the zipped definitions
do not contain the Merkle tree "proof part" (proof length and the proof itself -
see [definition binary format](communication/ethereum-definitions-binary-format.md)
for more details). The Merkle tree proof is the most repetitive part of the definitions
and can be easily computed when needed. The script
[`python/tools/eth_defs_unpack.py`](https://github.com/trezor/trezor-firmware/blob/master/python/tools/eth_defs_unpack.py)
unpacks the zip with definitions and completes every definition with its proof part.

#### Zip/directory file structure
Saved definitions (binary blobs) are stored in a zip file with a specific file
structure (the same structure as the completed, unpacked definitions):
````
definitions-latest/
├── by_chain_id/
│   ├── "CHAIN_ID"/
│   │   ├── network.dat
│   │   ├── token_"TOKEN_ADDRESS".dat
│   │   ├── token_"TOKEN_ADDRESS".dat
│   │   ...
│   ├── "CHAIN_ID"/
│   │   ├── network.dat
│   │   ├── token_"TOKEN_ADDRESS".dat
│   │   ├── token_"TOKEN_ADDRESS".dat
│   │   ...
│   ...
└── by_slip44/
    ├── "SLIP44_ID"/
    │   └── network.dat
    ├── "SLIP44_ID"/
    │   └── network.dat
    ...
````
where:
* `CHAIN_ID` is the chain ID of the included network/tokens
* `SLIP44_ID` is the SLIP44 ID of the included network
* `TOKEN_ADDRESS` is the lowercase token address (stripped of the `0x` prefix)

Notice that token definitions are only accessible by `CHAIN_ID` and `TOKEN_ADDRESS`
(directory `by_chain_id`), not by `SLIP44_ID` (directory `by_slip44`).
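
The layout above can be read back mechanically. The following is a minimal sketch, assuming that the quoted names in the tree are placeholders and that paths are taken relative to the directory containing `by_chain_id` and `by_slip44`. The names `DefinitionLocation` and `parse_definition_path` are hypothetical and are not part of any Trezor tool.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class DefinitionLocation(NamedTuple):
    """Hypothetical container describing where a definition file lives."""
    lookup_type: str               # "by_chain_id" or "by_slip44"
    lookup_id: int                 # chain ID or SLIP44 ID
    token_address: Optional[str]   # None for network definitions


def parse_definition_path(relative_path: str) -> DefinitionLocation:
    """Split a path such as 'by_chain_id/1/network.dat' into its components."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 3:
        raise ValueError(f"unexpected path depth: {relative_path}")
    lookup_type, raw_id, name = parts
    if lookup_type not in ("by_chain_id", "by_slip44"):
        raise ValueError(f"unknown lookup type: {lookup_type}")
    lookup_id = int(raw_id)
    if name == "network.dat":
        return DefinitionLocation(lookup_type, lookup_id, None)
    # Token definitions exist only under by_chain_id.
    if (
        lookup_type == "by_chain_id"
        and name.startswith("token_")
        and name.endswith(".dat")
    ):
        address = name[len("token_"):-len(".dat")]
        return DefinitionLocation(lookup_type, lookup_id, address)
    raise ValueError(f"unexpected file name: {name}")
```

Rejecting token files under `by_slip44` mirrors the note above that tokens are only reachable through `by_chain_id`.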

#### Definitions online

Generated binary definitions are available online on a [publicly accessible website](https://data.trezor.io/eth_definitions) # TODO: update url.

To get the desired definition (one at a time), the URL can be composed in multiple ways.
Its structure is the same as described in the [Zip/directory file structure](#zipdirectory-file-structure)
section. The base URL format is `https://data.trezor.io/eth_definitions/LOOKUP_TYPE/ID/NAME`
where:
* `LOOKUP_TYPE` is one of `by_chain_id` or `by_slip44`
* `ID` is either chain ID or SLIP44 ID (depends on the chosen lookup type)
* `NAME` is either:
  * `network.dat` for the network definition at the given chain ID or SLIP44 ID, or
  * `token_"TOKEN_ADDRESS".dat` for the token definition at the given chain ID and token address
    (see the [Zip/directory file structure](#zipdirectory-file-structure) section for how to format the token address)

All definitions can also be downloaded in a single request as a ZIP file from https://data.trezor.io/eth_definitions/definitions.zip # TODO: update url.
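
As an illustration of the URL format, here is a minimal sketch of a URL builder. The function name `definition_url` is hypothetical, the base URL is the one quoted above (which is marked as subject to change), and the quotes around `TOKEN_ADDRESS` are assumed to be placeholder markers rather than part of the file name.

```python
from typing import Optional

# Base URL as quoted in this document; it may change.
BASE_URL = "https://data.trezor.io/eth_definitions"


def definition_url(
    lookup_type: str, lookup_id: int, token_address: Optional[str] = None
) -> str:
    """Compose the URL of a single network or token definition."""
    if lookup_type not in ("by_chain_id", "by_slip44"):
        raise ValueError(f"unknown lookup type: {lookup_type}")
    if token_address is None:
        name = "network.dat"
    else:
        if lookup_type != "by_chain_id":
            raise ValueError("token definitions are only available by chain ID")
        # Lowercase the address and strip the 0x prefix.
        address = token_address.lower()
        if address.startswith("0x"):
            address = address[2:]
        name = f"token_{address}.dat"
    return f"{BASE_URL}/{lookup_type}/{lookup_id}/{name}"
```

For example, `definition_url("by_slip44", 60)` yields the URL of the network definition for SLIP44 ID 60.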

#### `Trezorctl`

Automatic handling of the definitions is implemented in the `trezorctl` tool.
All Ethereum commands that work with definitions contain options
to automatically download online definition(s) or use locally stored definition(s).
For more information, see the `trezorctl`
[documentation](https://github.com/trezor/trezor-firmware/blob/master/python/docs/README.rst).

## Process of generating the definitions

Binary definitions are generated by a single script,
[`ethereum_definitions.py`](https://github.com/trezor/trezor-firmware/blob/master/common/tools/ethereum_definitions.py).

The process is composed of multiple stages described in the following sections.

### 1. Prepare the definitions to JSON file

The preparation stage starts by collecting all data from all [data sources](#data-sources)
(or from a locally stored cache) and creating connections between the data (finding
matching IDs, etc.). If no cache was found at the beginning, a cache file
is saved (by default `definitions-cache.json`).

Every definition is checked against the size limitations. In interactive mode, the user
can decide what should happen if the size of some field is bigger than
the allowed maximum. This is needed due to size restrictions on Model 1 (buffers with fixed
size).

Subsequently, all the definitions are checked for duplicates and the user has to decide
which ones will be removed. No duplicates are allowed in further processing.

If definitions are already stored locally in a JSON file, they are compared with the new ones. A simple
comparison algorithm looks for these types of changes:
* moved definition - definition was moved to another chain ID or address (in case of tokens)
* modified definition - definition has the same chain ID or address but other fields changed
* deleted definition - definition was deleted
* resurrected definition - previously deleted definition is available again
* changes in symbol - symbol has changed in the definition

Results of the comparison are printed out.
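
To make the change types more concrete, the following is a simplified sketch of such a comparison for tokens. It is not the algorithm used by `ethereum_definitions.py`. It assumes that each definition is a dictionary keyed by a hypothetical `(chain_id, address)` tuple and that the symbol is stored under a `"symbol"` field. Detecting moved and resurrected definitions is left out, because it needs matching rules and deletion markers that are not described here.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]            # hypothetical key: (chain_id, token address)
Definition = Dict[str, object]   # hypothetical decoded definition


def classify_changes(
    old: Dict[Key, Definition], new: Dict[Key, Definition]
) -> Dict[str, List[Key]]:
    """Report deleted, modified and symbol-changed definitions."""
    changes: Dict[str, List[Key]] = {
        "deleted": [],
        "modified": [],
        "symbol_changed": [],
    }
    for key, old_def in old.items():
        new_def = new.get(key)
        if new_def is None:
            changes["deleted"].append(key)
            continue
        if old_def.get("symbol") != new_def.get("symbol"):
            changes["symbol_changed"].append(key)
        # Compare every field except the symbol, which is reported separately.
        old_rest = {k: v for k, v in old_def.items() if k != "symbol"}
        new_rest = {k: v for k, v in new_def.items() if k != "symbol"}
        if old_rest != new_rest:
            changes["modified"].append(key)
    return changes
```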

After resolving all the collisions, size limitations and changes, we have "clean" data
saved in a local file (by default `definitions-latest.json`).

### 2. Generate and sign the definitions

The input for this stage is the JSON file with definitions from the previous stage.
For the purpose of [verifying the definitions on FW side](#validation-and-verification-on-fw-side)
we use a Merkle tree data structure, which gives us the option to effectively split
this stage into three sub-stages:
1. Getting the Merkle tree root hash - a Merkle tree is built from all the definitions
and the root hash is computed.
2. Signing the hash - the computed Merkle tree root hash is signed using a private key.
3. Generating the binary definitions - the Merkle tree is built from all the definitions
and the root hash is computed again. This hash is then verified against the signed hash
from the previous step using the public key to ensure that nothing has changed. If everything
is OK, the binary-encoded (see
[definition binary format](communication/ethereum-definitions-binary-format.md) document)
definitions are generated into a zip file with a specific file structure
(see [Zip/directory file structure](#zipdirectory-file-structure) section).
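
The first sub-stage can be sketched as follows. This is an illustration under stated assumptions and not the code used by the generator: SHA-256, the `0x00`/`0x01` domain-separation prefixes, sorting the two child hashes before hashing, and promoting an unpaired node unchanged are all assumptions made for this example. The function names are hypothetical.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    # Assumed leaf rule: prefix 0x00 separates leaves from internal nodes.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Assumed internal rule: prefix 0x01, children hashed in sorted order.
    lo, hi = sorted((left, right))
    return hashlib.sha256(b"\x01" + lo + hi).digest()


def merkle_root(encoded_definitions: List[bytes]) -> bytes:
    """Compute the root hash over all encoded definitions."""
    if not encoded_definitions:
        raise ValueError("at least one definition is required")
    level = [leaf_hash(d) for d in encoded_definitions]
    while len(level) > 1:
        next_level = []
        # Pair up neighbouring nodes.
        for i in range(0, len(level) - 1, 2):
            next_level.append(node_hash(level[i], level[i + 1]))
        # An unpaired last node is promoted to the next level unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

Only the resulting root hash needs to be signed, so the private key is used once no matter how many definitions are generated.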

### 3. Publish the definitions

Binary definitions are published to our website. See section [definitions online](#definitions-online)
for more information.

## Data sources

External Ethereum definitions are generated based on data from external APIs and repositories:
* [`coingecko.com`](https://www.coingecko.com/) for most of the info about networks and tokens
* [defillama](https://defillama.com/) to pair as many networks as possible with a CoinGecko ID
* [Ethereum Lists - chains](https://github.com/ethereum-lists/chains) as the only source of EVM-based networks
* [Ethereum Lists - tokens](https://github.com/ethereum-lists/tokens) as another source of tokens

## Validation and verification on FW side

To ensure that the definitions sent to the device are genuine, we have to check them
when they are received.

#### Validation

The first thing that happens when a binary definition is received is validation. Validation
is a single step in which FW compares hardcoded values against the values found in the received
definition.

Based on the [definition binary format](communication/ethereum-definitions-binary-format.md)
FW checks the following values:
* format version - format version found in the definition has to be equal to or higher than
the one specified in FW
* type of data - type of data found in the definition has to be the same as FW expects
* data version - data version found in the definition has to be equal to or higher than
the one specified in FW

If any of these checks fail the definition is rejected.
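
The three checks can be expressed compactly. The sketch below assumes the header fields have already been decoded into integers. The class `DefinitionHeader`, the function `is_valid` and its parameter names are hypothetical. The actual field encoding is given in the binary format document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionHeader:
    """Hypothetical decoded header of a received definition."""
    format_version: int
    data_type: int
    data_version: int


def is_valid(
    header: DefinitionHeader,
    *,
    min_format_version: int,
    expected_type: int,
    min_data_version: int,
) -> bool:
    """Return True only if all three checks described above pass."""
    return (
        header.format_version >= min_format_version
        and header.data_type == expected_type
        and header.data_version >= min_data_version
    )
```

A definition for which `is_valid` returns `False` would be rejected before any signature verification takes place.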

#### Verification

Verification of received definitions uses a Merkle tree data structure
in combination with a signed Merkle tree root hash. Every encoded definition
is packed with a list of Merkle tree proofs and the signed Merkle tree root hash
(see the [definition binary format](communication/ethereum-definitions-binary-format.md)
document).

When the definition is received, a hash is computed from the combination of the encoded
definition itself and the list of proofs. This hash is then verified against the signed
hash included in the received definition using the public key hardcoded in FW. This last
step ensures that nothing has changed in the definition.
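
A minimal sketch of this check is shown below. It is not the firmware implementation. The hashing rules (SHA-256, `0x00` prefix for the leaf, `0x01` prefix for internal nodes, sibling hashes sorted before hashing) are assumptions made for illustration, and the signature scheme is abstracted behind a caller-supplied `verify_signature` callback, because it is not specified in this document. All names are hypothetical.

```python
import hashlib
from typing import Callable, Iterable


def root_from_proof(encoded_definition: bytes, proof: Iterable[bytes]) -> bytes:
    """Recompute the Merkle root from one definition and its proof hashes."""
    # Assumed leaf rule: SHA-256 over 0x00 followed by the encoded definition.
    current = hashlib.sha256(b"\x00" + encoded_definition).digest()
    for sibling in proof:
        # Assumed internal rule: 0x01 prefix, the two hashes in sorted order.
        lo, hi = sorted((current, sibling))
        current = hashlib.sha256(b"\x01" + lo + hi).digest()
    return current


def verify_definition(
    encoded_definition: bytes,
    proof: Iterable[bytes],
    signature: bytes,
    verify_signature: Callable[[bytes, bytes], bool],
) -> bool:
    """Check the recomputed root against the signature.

    verify_signature(root_hash, signature) stands in for the public-key
    check performed with the key hardcoded in FW.
    """
    root = root_from_proof(encoded_definition, proof)
    return verify_signature(root, signature)
```

Because the root is recomputed from the received data, changing either the definition or any proof entry produces a different root, and the signature check fails.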