Cloud Data Fusion
Working with Cloud Data Fusion
Roles required
Service Account | Role | Description
service-370266331882@gcp-sa-datafusion.iam.gserviceaccount.com | Cloud Data Fusion API Service Agent | Gives Cloud Data Fusion service account access to Service Networking, Cloud Dataproc, Cloud Storage, BigQuery, Cloud Spanner, and Cloud Bigtable resources.
43018728793-compute@developer.gserviceaccount.com (Compute Engine default service account) | Cloud Data Fusion Runner | Access to data fusion runtime resources.
43018728793-compute@developer.gserviceaccount.com (Compute Engine default service account) | Cloud Data Fusion API Service Agent Editor | Gives Cloud Data Fusion service account access to Service Networking, Cloud Dataproc, Cloud Storage, BigQuery, Cloud Spanner, and Cloud Bigtable resources.
Components:
Source
Transform
Analytics
Sink
Conditions and Actions
Pipeline Import
Runtime arguments
Key | Value
system.profile.name | SYSTEM:dataproc
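As a rough illustration, runtime arguments can be treated as a flat map of string keys to string values. The sketch below builds that map in Python and prints it as JSON. The helper name `build_runtime_args` and its scope check are assumptions made for this example and are not part of Cloud Data Fusion.

```python
import json
from typing import Dict, Optional


def build_runtime_args(profile: str = "SYSTEM:dataproc",
                       extra: Optional[Dict[str, object]] = None) -> Dict[str, str]:
    """Return runtime arguments as a flat key/value map (hypothetical helper)."""
    # Assumption: profile names are written "<SCOPE>:<name>", as in the table above.
    scope, sep, name = profile.partition(":")
    if not sep or not scope or not name:
        raise ValueError(f"expected '<SCOPE>:<name>', got {profile!r}")

    args = {"system.profile.name": profile}
    # Any additional arguments are passed through as strings.
    for key, value in (extra or {}).items():
        args[key] = str(value)
    return args


runtime_args = build_runtime_args()
print(json.dumps(runtime_args, indent=2))
```

Keeping the arguments in one dictionary makes it easy to reuse the same values when a pipeline is started from a script or a scheduler.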
By populating the Data Fusion operator with a few parameters, we can now deploy, start and stop pipelines.
Order of task flow using bit shift operator
Abstract
Example
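The `>>` symbol is Python's right bit shift operator, which workflow libraries such as Apache Airflow overload to mean "runs before". The source does not name the scheduler, so the sketch below uses a toy `Task` class (a hypothetical stand-in, not a real operator class) purely to show how a chain like deploy, start, stop reads left to right.

```python
class Task:
    """Toy stand-in for a workflow task; not a real operator class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` records b as running after a, then returns b,
        # so that `a >> b >> c` chains from left to right.
        self.downstream.append(other)
        return other


def run_order(first):
    """Walk a linear chain from its first task and list the task ids."""
    order = []
    current = first
    while current is not None:
        order.append(current.task_id)
        current = current.downstream[0] if current.downstream else None
    return order


# Task ids are illustrative names for the deploy, start, and stop steps.
deploy = Task("deploy_pipeline")
start = Task("start_pipeline")
stop = Task("stop_pipeline")

deploy >> start >> stop
print(run_order(deploy))  # ['deploy_pipeline', 'start_pipeline', 'stop_pipeline']
```

Returning the right-hand operand from `__rshift__` is what allows several tasks to be chained in a single expression.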
Shipment data
Dataset level lineage
Field level lineage
This shows the history of changes a particular field has gone through.
If a pipeline run does not have enough memory, increase the executor memory in the pipeline config.
PROVISION task failed in REQUESTING_CREATE state for program run program_run:default.Reusable-Pipeline.-SNAPSHOT.workflow.DataPipelineWorkflow.a48de625-b92a-11eb-b814-66625c4294b9 due to Dataproc operation failure: INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. Requested 3000.0, available 1096.0.
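The message above reports that provisioning asked for 3000 GB of total disk while only 1096 GB of quota was available. A small sketch, assuming the error text keeps this exact wording, extracts the quota metric and the shortfall so you know how much quota to request or how much disk to trim from the compute profile. The function name `quota_shortfall` is hypothetical.

```python
import re

# Tail of the error message shown above.
ERROR = ("INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. "
         "Requested 3000.0, available 1096.0.")

# Assumption: the wording matches the log line above exactly.
PATTERN = re.compile(
    r"Insufficient '(?P<metric>[A-Z_]+)' quota\. "
    r"Requested (?P<requested>\d+(?:\.\d+)?), "
    r"available (?P<available>\d+(?:\.\d+)?)"
)


def quota_shortfall(message):
    """Return (metric, requested - available), or None if nothing matches."""
    match = PATTERN.search(message)
    if match is None:
        return None
    requested = float(match.group("requested"))
    available = float(match.group("available"))
    return match.group("metric"), requested - available


print(quota_shortfall(ERROR))  # ('DISKS_TOTAL_GB', 1904.0)
```

Here the run needs 1904 GB more disk quota than the project has free, so either the quota must be raised or the cluster's total disk size reduced by at least that amount.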
Start a data pipeline
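A batch pipeline can also be started over the instance's CDAP REST API by sending a POST to the `DataPipelineWorkflow` start path, with runtime arguments as a JSON body. The sketch below only builds the request and does not send it. The endpoint URL and token are placeholders, the function name `build_start_request` is hypothetical, and the pipeline name and namespace are taken from the error log earlier in this page.

```python
import json
import urllib.request


def build_start_request(api_endpoint, pipeline_name, runtime_args, token,
                        namespace="default"):
    """Build (but do not send) a request that starts a batch pipeline."""
    url = (f"{api_endpoint.rstrip('/')}/v3/namespaces/{namespace}"
           f"/apps/{pipeline_name}/workflows/DataPipelineWorkflow/start")
    return urllib.request.Request(
        url,
        data=json.dumps(runtime_args).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_start_request(
    api_endpoint="https://example-instance.invalid/api",  # placeholder endpoint
    pipeline_name="Reusable-Pipeline",
    runtime_args={"system.profile.name": "SYSTEM:dataproc"},
    token="ACCESS_TOKEN",  # placeholder; use a real OAuth access token
)
print(request.get_method(), request.full_url)
# To actually start the run: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the example safe to run and makes the URL and payload easy to inspect before any pipeline is started.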