It's a great tool, but very expensive if we want to create a few pipelines once and let them run daily.
Is there any way to reduce the cost? As I understand it, the Data Fusion instance must run 24/7 to be able to execute the scheduled pipelines on a daily basis.
Hi there!
Thanks for reading my post :)
To answer your question: yes, it is expensive. This product is aimed at big/giant companies.
But here is a tip: go to the GCP Marketplace and install CDAP with the "Click to Deploy" option.
The open-source packaged version available there can do almost everything you have in Data Fusion. The best part: you only pay for the server running CDAP and for your Dataproc cluster.
Thank you!
Hi Giuliano,
Thanks for the informative blog post and the tip about the GCP Marketplace. I've managed to create the server for CDAP, but do you have any info about how to provision the Dataproc cluster to include the server running CDAP? It seems that without running the plugin on a Dataproc cluster, the process of authenticating access to BigQuery and other Google Cloud sources is more complicated.
Thanks!
Hey Giuliano,
Thanks for this insightful article.
As I was saying, the price unfortunately prevents small and medium companies from using it just for daily runs; they would rather use a third-party solution like Segment or others (which of course have fewer features). So it's a pity that GCP doesn't offer a special package for this audience and use case.
Using CDAP from the Marketplace is indeed a possibility, but it's not serverless.
I was wondering whether a trick like this could work: save and export the pipeline, switch off the instance, and then each day create an instance, import the saved pipeline, execute it, and delete the instance afterwards?
So far, I unfortunately haven't found a way to do it.
Keep me informed if by any chance you do.
Hi!
I'm currently searching for a serverless solution for ETL transformation and I was thinking of GCP Data Flow, but the pricing is restrictive for us.
Our basic requirement is to read a JSON file from an API that returns 4000 objects, transform the objects, and call an API at the destination to import the data.
It's not possible to switch off the Data Flow instance as you asked, right?
Regards
Hi,
Dataflow is really not the tool for such a load; it is meant for much higher volumes.
A Google Cloud Function could probably be a cheap option, depending on your data transformation.
PS: My question was about Google Cloud Data Fusion, which is not appropriate for your use case anyway.
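To make the Cloud Function suggestion concrete, here is a minimal Python sketch of the read, transform, and import flow described above. Everything in it is an assumption for illustration: the `transform` logic, the field names `id` and `name`, the batch size, and the helper names are hypothetical and do not come from any real API in this thread. It uses only the standard library.

```python
import json
import urllib.request
from typing import Callable, Iterator


def fetch_objects(url: str) -> list[dict]:
    """GET a JSON array from the source API (the URL is hypothetical)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def post_batch(url: str, batch: list[dict]) -> None:
    """POST one batch of records as JSON to the destination API (hypothetical URL)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()


def transform(obj: dict) -> dict:
    """Placeholder transformation: keep the id and normalise the name (assumed fields)."""
    return {"id": obj["id"], "name": str(obj.get("name", "")).strip().upper()}


def chunks(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_etl(
    fetch: Callable[[], list[dict]],
    send: Callable[[list[dict]], None],
    batch_size: int = 500,
) -> int:
    """Fetch all objects, transform each one, send them in batches; return the record count."""
    records = [transform(obj) for obj in fetch()]
    for batch in chunks(records, batch_size):
        send(batch)
    return len(records)
```

Inside a function's entry point, this could be wired up as `run_etl(lambda: fetch_objects(SOURCE_URL), lambda b: post_batch(DEST_URL, b))`, where `SOURCE_URL` and `DEST_URL` are placeholders for your own endpoints. Passing `fetch` and `send` as callables keeps the transformation testable without network access.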
Hi!
Thanks for the reply. GCP Data Fusion is definitely not the right fit for my data integration requirements.
I meant to say Data Fusion instead of Data Flow, sorry about that. I'm reviewing so many tools that I mixed up the names.
Regards