ETL

Ethereum dataset - https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset

The dataset covers the primary data structures - blocks and transactions - as well as high-value data derivatives - token transfers and smart contract method descriptions.

Architecture

The ingestion pipeline consists of two phases: export and load.
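The two phases can be sketched with the Ethereum ETL command-line tool and BigQuery's bq loader. This is an illustrative sketch, not the exact production pipeline; the provider URI, dataset, and table names below are placeholders.

```shell
# Export phase: pull blocks and transactions for a block range
# over the JSON RPC API and write them to CSV files.
ethereumetl export_blocks_and_transactions \
  --start-block 4832686 --end-block 4838611 \
  --provider-uri https://mainnet.infura.io \
  --blocks-output blocks.csv \
  --transactions-output transactions.csv

# Load phase: append the exported CSVs to BigQuery tables
# (my_dataset.blocks is a placeholder table name).
bq load --source_format=CSV --noreplace \
  my_dataset.blocks blocks.csv
```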

Stateless workflow

The workflow starts with the export blocks and transactions task, which invokes the get_block_range_for_date.py command from Ethereum ETL.

python get_block_range_for_date.py --date 2018-01-01
# gives block range for given date
> 4832686, 4838611

This command uses the JSON RPC API to probe the first and last blocks, then recursively narrows the bounds using linear interpolation on block timestamps until the required blocks are found.
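The narrowing step can be sketched as an interpolation search over block timestamps. This is a simplified illustration, not the actual Ethereum ETL implementation; get_ts stands in for a JSON RPC call (eth_getBlockByNumber) that returns a block's timestamp.

```python
def first_block_at_or_after(get_ts, target_ts, lo, hi):
    """Return the number of the first block whose timestamp is >= target_ts,
    assuming block timestamps are non-decreasing in [lo, hi]."""
    lo_ts, hi_ts = get_ts(lo), get_ts(hi)
    if lo_ts >= target_ts:
        return lo
    if hi_ts < target_ts:
        return None  # target lies after the last known block
    while hi - lo > 1:
        # Interpolate a probe position from the timestamp spread,
        # clamped so it always falls strictly between lo and hi.
        frac = (target_ts - lo_ts) / (hi_ts - lo_ts)
        probe = lo + max(1, min(hi - lo - 1, int(frac * (hi - lo))))
        probe_ts = get_ts(probe)
        if probe_ts >= target_ts:
            hi, hi_ts = probe, probe_ts
        else:
            lo, lo_ts = probe, probe_ts
    return hi


def block_range_for_day(get_ts, day_start_ts, latest_block):
    """Return (first, last) block numbers whose timestamps fall within
    the UTC day starting at day_start_ts (86400 seconds long)."""
    first = first_block_at_or_after(get_ts, day_start_ts, 0, latest_block)
    next_day = first_block_at_or_after(get_ts, day_start_ts + 86400, 0, latest_block)
    return first, next_day - 1
```

On a chain with roughly uniform block times, interpolation converges in far fewer probes than plain binary search, which matters because every probe is a network round trip.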
