Originally written at: https://jayachandrika.com
Let's explore Hadoop (click to skip to that topic):
Hadoop: What's the buzz around it?
Hadoop EcoS...
The CAP theorem illustration in the article is incorrect and misleading. There is no strict "pick two" limitation; instead, each particular setup strikes a balance among all three properties. For example, Cassandra is shown on the "AP" side, yet it can be placed on any side by changing its replication factor and consistency level configuration parameters.
Although the CAP theorem limitation applies, databases can be configured to satisfy all three properties to a certain extent.
For example, Cassandra can also be configured for consistency, such as waiting until the same result comes back from 3 nodes.
To sum up, it supports all three but favours availability over consistency, which is what I tried to convey in the article. Thank you for asking and making it clearer for others ☺️
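As a rough illustration of the tuning discussed above, here is a minimal Python sketch of the usual quorum arithmetic: a read overlaps the latest acknowledged write when the replicas contacted for reads plus those contacted for writes exceed the replication factor. The function names are made up for this example and are not part of any Cassandra driver; only the consistency level names (ONE, TWO, THREE, QUORUM, ALL) come from Cassandra, and a single data center is assumed.

```python
# Sketch only: helper names are hypothetical, not a Cassandra API.
# Assumes a single data center with replication factor `rf`.

def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    name = level.upper()
    if name in fixed:
        needed = fixed[name]
    elif name == "QUORUM":
        needed = rf // 2 + 1  # strict majority of the replicas
    elif name == "ALL":
        needed = rf
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    if needed > rf:
        raise ValueError(f"{name} needs {needed} replicas but rf is {rf}")
    return needed


def reads_see_latest_write(rf: int, write_level: str, read_level: str) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        overlap = reads_see_latest_write(3, write_cl, read_cl)
        print(f"RF=3 write={write_cl} read={read_cl} overlap guaranteed: {overlap}")
```

With a replication factor of 3, ONE/ONE leans towards availability, while QUORUM/QUORUM trades some availability for reads that always overlap the latest write.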
Hey, is it necessary to run all the services, or can we install only the specific services we want?
No, you don't need to include all of them.
But there are dependencies between services. For example, if you include Pig, you have to include YARN and MapReduce.
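To make the dependency idea concrete, here is a small Python sketch that expands a list of wanted services into everything that has to be installed. The dependency map is a hypothetical illustration: only the Pig entry reflects the point above, and a real distribution will define its own, fuller set of dependencies.

```python
# Sketch only: this map is illustrative. Only the "pig" entry comes from
# the comment above; the rest are placeholders with no dependencies listed.
DEPENDS_ON = {
    "pig": {"yarn", "mapreduce"},
    "yarn": set(),
    "mapreduce": set(),
}


def services_to_install(wanted, depends_on=DEPENDS_ON):
    """Return the wanted services plus all of their transitive dependencies."""
    selected = set()
    pending = list(wanted)
    while pending:
        service = pending.pop()
        if service in selected:
            continue  # already handled, also guards against cycles
        selected.add(service)
        pending.extend(depends_on.get(service, ()))
    return selected


if __name__ == "__main__":
    print(sorted(services_to_install(["pig"])))
```

Asking for Pig alone pulls in YARN and MapReduce as well, while asking for YARN alone installs nothing extra under this illustrative map.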
Yeah, thought so, okay thank you.
To cut it short, when will I need to use Hadoop, or is it more of an enterprise thing?
Good question. It is used at startups too. Wherever there is a need to handle big data, Hadoop can be used.
So if you have an app or website that deals with a huge amount of data, you may need Hadoop. Do comment if you have more questions 😄
Big data includes both structured and unstructured data.
It is the volume, variety, velocity, value and veracity that decide whether it's big data or not.
I've seen people use HDP, the Hortonworks Data Platform, to host their Hadoop cluster.
It is a self-hosted cluster. You can set it up on any cloud service provider, like GCP, AWS or Azure.
I haven't come across shared hosting; let me know if you have used it.
Hope others also can weigh in and share their views and perspectives on this topic😄
Thanks for being very informative (about HDP).
I am starting to feel like it might be meant for BI / analytics / AI / ML. It might be good for getting employed, but a small business might be better off relying on outsourcing or off-the-shelf software.
Also, I came across Cassandra vs HBase vs MongoDB. It seems like the HBase / Hadoop ecosystem might be one of the best.
We need to weigh the cost of software against storage needs and decide whether Hadoop can be employed or not.
HBase is part of Hadoop, as shown in the Hadoop ecosystem image in the article, whereas Cassandra and MongoDB are external storage systems.
Based on the CAP theorem, we need to decide which storage to use. Thanks for sharing your interesting perspective 😄
Good one.. Learned a lot! Keep going!
Thanks for the support 😄
Well written
Thank you ☺️
Amazing Article! A huge topic put together in a single post. Well structured, well written. Kudos 🎉
Thank you 🙂
Nice Article 👍👍
Thank you 😄
Thank you for the wonderful comment and appreciation for the post, glad you enjoyed the article☺️ @saloni509
Which is the best database to use with Hadoop?