ETL
Ethereum dataset - https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset
The dataset covers the primary data structures (blocks and transactions) as well as high-value derived data: token transfers and smart contract method descriptions.
Architecture
The ingestion pipeline has two phases: export and load.
Stateless workflow
The workflow starts with the export blocks and transactions task, which invokes the get_block_range_for_date.py command from Ethereum ETL.
python get_block_range_for_date.py --date 2018-01-01
# prints the first and last block of the given date
> 4832686, 4838611
This command uses the JSON-RPC API to fetch the timestamps of the first and last blocks, then recursively narrows the bounds using linear interpolation (an interpolation search over block timestamps) until the blocks bounding the given date are found.
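The interpolation step can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: `get_block_timestamp` stands in for an `eth_getBlockByNumber` JSON-RPC call and is simulated here with a fixed 15-second block time; the real command performs this search twice to find both the first and last block of the date.

```python
# Hypothetical sketch of interpolation search for the first block whose
# timestamp is at or after a target time.

GENESIS_TS = 1438269973  # mainnet genesis block timestamp (Unix seconds)
BLOCK_TIME = 15          # simulated average block spacing, seconds

def get_block_timestamp(number: int) -> int:
    # Stand-in for a JSON-RPC eth_getBlockByNumber call: simulate a chain
    # whose blocks arrive exactly BLOCK_TIME seconds apart.
    return GENESIS_TS + number * BLOCK_TIME

def find_first_block_at_or_after(target_ts: int, latest: int) -> int:
    """Return the lowest block number whose timestamp >= target_ts."""
    lo, hi = 0, latest
    if get_block_timestamp(lo) >= target_ts:
        return lo
    if get_block_timestamp(hi) < target_ts:
        raise ValueError("target is after the latest block")
    # Invariant: ts(lo) < target_ts <= ts(hi)
    while hi - lo > 1:
        lo_ts, hi_ts = get_block_timestamp(lo), get_block_timestamp(hi)
        # Linear interpolation between the current bounds.
        guess = lo + (target_ts - lo_ts) * (hi - lo) // (hi_ts - lo_ts)
        guess = min(max(guess, lo + 1), hi - 1)  # stay strictly inside bounds
        if get_block_timestamp(guess) < target_ts:
            lo = guess
        else:
            hi = guess
    return hi
```

Because block timestamps grow roughly linearly, each guess lands close to the answer, so the search converges in far fewer probes than a plain binary search would need.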