ETL

Ethereum dataset - https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset

The dataset covers the primary data structures (blocks and transactions) as well as high-value data derivatives (token transfers and smart contract method descriptions).

Architecture

The ingestion pipeline has two phases: export and load.

Stateless workflow

The workflow starts with the export blocks and transactions task, which invokes an Ethereum ETL command named get_block_range_for_date.py.

python get_block_range_for_date.py --date 2018-01-01
# gives block range for given date
> 4832686, 4838611

This command uses the JSON-RPC API to probe the first and last blocks, then narrows the bounds recursively using linear interpolation until the blocks that bound the requested date are found.
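The interpolation idea can be illustrated with a minimal sketch. This is not the Ethereum ETL implementation: the function names (`first_block_at_or_after`, `block_range_for_date`, `fake_timestamp`) are hypothetical, the loop is iterative rather than recursive, and the chain is a synthetic in-memory one so the example runs without a node. With a real node, the timestamp lookup would presumably wrap a JSON-RPC call such as `eth_getBlockByNumber`. The sketch assumes block timestamps never decrease and that the chain already extends past the end of the requested day.

```python
from datetime import datetime, timezone
from typing import Callable, Tuple


def first_block_at_or_after(target_ts: int, get_ts: Callable[[int], int],
                            lo: int, hi: int) -> int:
    """Smallest block in [lo, hi] whose timestamp is >= target_ts.

    Assumes timestamps are non-decreasing and get_ts(hi) >= target_ts.
    """
    ts_lo, ts_hi = get_ts(lo), get_ts(hi)
    if ts_lo >= target_ts:
        return lo
    # Invariant: ts(lo) < target_ts <= ts(hi)
    while hi - lo > 1:
        # Linear interpolation: guess where the target falls between the bounds.
        frac = (target_ts - ts_lo) / (ts_hi - ts_lo)
        guess = lo + int(frac * (hi - lo))
        # Keep the guess strictly inside the bounds so every step makes progress.
        guess = min(max(guess, lo + 1), hi - 1)
        ts_guess = get_ts(guess)
        if ts_guess >= target_ts:
            hi, ts_hi = guess, ts_guess
        else:
            lo, ts_lo = guess, ts_guess
    return hi


def block_range_for_date(date_str: str, get_ts: Callable[[int], int],
                         latest_block: int) -> Tuple[int, int]:
    """First and last block whose timestamps fall on the given UTC date."""
    day_start = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start_ts = int(day_start.timestamp())
    end_ts = start_ts + 86400  # start of the following day
    first = first_block_at_or_after(start_ts, get_ts, 0, latest_block)
    next_day_first = first_block_at_or_after(end_ts, get_ts, 0, latest_block)
    return first, next_day_first - 1


def fake_timestamp(block_number: int) -> int:
    """Synthetic chain (assumption): roughly one block every 15 seconds."""
    return 1_514_700_000 + 15 * block_number + (block_number % 7)


if __name__ == "__main__":
    print(block_range_for_date("2018-01-01", fake_timestamp, latest_block=20_000))
```

Because block times are close to uniform, each interpolated guess tends to land near the target, so far fewer timestamp lookups are needed than with a linear scan. On very uneven timestamps the clamped guess can degrade toward a step-by-step walk, which a production version might guard against by falling back to bisection.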
