DEV Community

Discussion on: Cloud Data Fusion, a game-changer for GCP

Giuliano Ribeiro

Hi there!
Thanks for reading my post :)

To answer your question: yes, it is expensive. This product is aimed at big/giant companies.

But here's a tip: go to the GCP Marketplace and install CDAP using the "Click to Deploy" option.

The open-source, packaged version available there can do almost everything that Data Fusion offers. The best part: you only pay for the server running CDAP and for your Dataproc cluster.
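
If you prefer to script the Dataproc side instead of clicking around, here is a rough Python sketch using the google-cloud-dataproc client. The project, region, cluster name, and machine types are just placeholders, so treat it as a starting point rather than a recipe:

```python
from google.cloud import dataproc_v1

# Placeholders -- use your own project, region, and sizing.
PROJECT = "my-project"
REGION = "us-central1"
CLUSTER = "cdap-exec"

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT,
    "cluster_name": CLUSTER,
    "config": {
        # Keep it small; size the workers to your pipelines.
        "master_config": {"num_instances": 1,
                          "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2,
                          "machine_type_uri": "n1-standard-2"},
    },
}

operation = client.create_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
)
print(operation.result().cluster_name, "is ready")
```

Delete the cluster when your pipelines finish, so you only pay while it is actually running.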

Thank you!

Nick Guebhard

Hi Giuliano,

Thanks for the informative blog post and the tip about the GCP Marketplace. I've managed to create the CDAP server, but do you have any info on how to provision the Dataproc cluster to go with the server running CDAP? It seems that without running the plugin on a Dataproc cluster, authenticating access to BigQuery and other Google Cloud sources is more complicated.

Thanks!

JS Gourdet

Hey Giuliano,
Thanks for this insightful article.
As I was saying, the price unfortunately prohibits small and medium companies from using it just for daily workloads; they would rather use a third-party solution like Segment or similar (which, of course, has fewer features). It's a pity that GCP doesn't offer a special package for that audience and use case.
Using CDAP from the Marketplace is indeed a possibility, but it isn't serverless.
I was wondering whether a trick could work: export and save the pipeline, switch off the instance, and then each day create a new instance, import the saved pipeline, execute it, and shut the instance down afterwards.
So far, unfortunately, I haven't found a way to do it.
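
To make it concrete, this is the kind of daily cycle I have in mind. It's an untested Python sketch that assumes the gcloud data-fusion commands and the CDAP REST API exposed at the instance's endpoint; every name in it (project, instance, pipeline, export file) is a placeholder:

```python
import json
import subprocess
import time

import requests

# Everything below is a placeholder -- adjust to your project.
PROJECT = "my-project"
LOCATION = "europe-west1"
INSTANCE = "daily-etl"
PIPELINE = "my_pipeline"
PIPELINE_SPEC = "pipeline-export.json"   # pipeline exported earlier from Studio


def gcloud(*args):
    """Run a gcloud command and return its stdout."""
    out = subprocess.run(["gcloud", *args], check=True,
                         capture_output=True, text=True)
    return out.stdout.strip()


# 1. Create a fresh Data Fusion instance (this step alone takes a while,
#    so it adds a fixed overhead to every daily run).
gcloud("beta", "data-fusion", "instances", "create", INSTANCE,
       "--project", PROJECT, "--location", LOCATION)

# 2. Get the instance's CDAP endpoint and an access token.
endpoint = gcloud("beta", "data-fusion", "instances", "describe", INSTANCE,
                  "--project", PROJECT, "--location", LOCATION,
                  "--format", "value(apiEndpoint)")
headers = {"Authorization": "Bearer " + gcloud("auth", "print-access-token")}

# 3. Import the saved pipeline, then start it.
with open(PIPELINE_SPEC) as f:
    spec = json.load(f)
base = f"{endpoint}/v3/namespaces/default/apps/{PIPELINE}"
requests.put(base, headers=headers, json=spec).raise_for_status()
requests.post(f"{base}/workflows/DataPipelineWorkflow/start",
              headers=headers).raise_for_status()

# 4. Wait for the run to finish, then delete the instance again.
while True:
    runs = requests.get(f"{base}/workflows/DataPipelineWorkflow/runs",
                        headers=headers).json()
    if runs and runs[0]["status"] not in ("STARTING", "RUNNING"):
        break
    time.sleep(60)

gcloud("beta", "data-fusion", "instances", "delete", INSTANCE,
       "--project", PROJECT, "--location", LOCATION, "--quiet")
```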

Keep me informed if by any chance you do.

Beliche

Hi!
I'm currently searching for a serverless solution for ETL transformation and was considering GCP Dataflow, but the pricing is restrictive for us.
Our basic requirement is to read a JSON file from an API that returns 4,000 objects, transform the objects, and call an API on the destination side to import the data.

It's not possible to switch off a Dataflow instance as you asked, right?

Regards

JS Gourdet

Hi,
Dataflow is really not the tool for such a load; it's meant for much higher volumes.
Google Cloud Functions could probably be a cheap option, depending on your data transformation.
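
For example, a single HTTP-triggered Cloud Function along these lines might be enough. This is only a rough Python sketch; the source/destination URLs and the transform are placeholders you would replace:

```python
import requests

# Placeholder endpoints -- replace with your real source and destination APIs.
SOURCE_URL = "https://source.example.com/objects"
DEST_URL = "https://destination.example.com/import"


def transform(obj):
    """Per-object mapping; identity here, put your real transformation in."""
    return obj


def etl(request):
    """HTTP-triggered Cloud Function: fetch the ~4000 objects, transform, push."""
    objects = requests.get(SOURCE_URL, timeout=60).json()
    payload = [transform(o) for o in objects]
    resp = requests.post(DEST_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return f"imported {len(payload)} objects", 200
```

You could then trigger it once a day with Cloud Scheduler.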

PS: My question was about Google Cloud Data Fusion, which in any case isn't appropriate for your use case.

Beliche

Hi!
Thanks for the reply. Definitely, GCP Data Fusion is not the right fit for my data integration requirements.
I meant to say Data Fusion instead of Dataflow, sorry for that; I'm reviewing so many tools that I mixed up the names.

Regards