DEV Community

Hello Hadoop | Learn Hadoop in just a few minutes easily!

Jaya chandrika reddy on June 05, 2020

Originally written at: https://jayachandrika.com Let's explore Hadoop (click to skip to that topic): Hadoop: What's the buzz around it? Hadoop EcoS...
 
Sergiy Yevtushenko

The CAP theorem illustration provided in the article is incorrect and misleading. There is actually no strict "pick two" limitation. Instead, each particular setup strikes some balance between all three properties. For example, Cassandra is shown on the "AP" side, while it can be put on any side by changing the replication factor and consistency level configuration parameters.

Jaya chandrika reddy

Although the CAP theorem imposes a limitation, databases can be configured to satisfy all three properties to a certain extent.
For example, Cassandra can also be configured to favour consistency, such as by waiting until the same result comes back from 3 nodes.

To sum it up, it supports all 3 but favours availability over consistency, which is what I tried to convey through the article. Thank you for asking and making it clearer for others ☺️
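
The tuning both comments describe can be sketched with the usual overlap rule for replicated stores: a read is guaranteed to see the latest acknowledged write when the replicas consulted on read plus the replicas acknowledged on write exceed the replication factor. The snippet below is a minimal sketch, not a Cassandra client. The function names are made up for illustration, and it assumes a single data center and only the ONE, TWO, THREE, QUORUM, and ALL consistency levels.

```python
# Illustrative helpers only; these names are hypothetical, not a Cassandra API.

def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("THREE", "ONE")]:
        print(read, write, reads_see_latest_write(read, write, 3))
```

With a replication factor of 3, reading and writing at ONE leans towards availability, while QUORUM for both, or waiting for all 3 nodes on read, leans towards consistency at the cost of more failed requests when nodes are down.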

Tharun Shiv

Hey, is it necessary to run all the services, or can we install only the specific services that we want?

Jaya chandrika reddy • Edited

No, there is no need to include all of them.

But there are dependencies between services. For example, if you include Pig, then you also have to include YARN and MapReduce.
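
One way to picture this is as a small dependency graph that gets expanded before installation. The snippet below is only a sketch: the `DEPENDS_ON` table and the `required_services` helper are hypothetical, the Pig entry comes from the reply above, and the MapReduce-on-YARN entry is an added assumption for illustration. Check the documentation of your distribution for the real dependency list.

```python
# Hypothetical dependency table; only the "pig" entry is taken from the thread.
DEPENDS_ON = {
    "pig": {"yarn", "mapreduce"},
    "mapreduce": {"yarn"},  # assumption added for illustration
}


def required_services(selected, depends_on=DEPENDS_ON):
    """Return the selected services plus everything they depend on."""
    required = set()
    stack = list(selected)
    while stack:
        service = stack.pop()
        if service in required:
            continue  # already handled, also guards against cycles
        required.add(service)
        stack.extend(depends_on.get(service, ()))
    return required


if __name__ == "__main__":
    print(sorted(required_services(["pig"])))
```

Selecting only `pig` pulls in `yarn` and `mapreduce`, while a service with no entry in the table is installed on its own.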

Tharun Shiv

Yeah, thought so, okay thank you.

Pacharapol Withayasakpunt

In short, when would I need to use Hadoop, or is it more of an enterprise thing?

Jaya chandrika reddy

Good question. It is used at startups too. Wherever there is a need to handle big data, Hadoop can be used.
So if you have an app or website that deals with a huge amount of data, you may need Hadoop. Do comment if you have more questions 😄

Pacharapol Withayasakpunt
  • In which typical situations is collecting data as Big Data (with its many V's) better than using structured data?
  • Shared Hadoop service? Self-hosting? Alternatives?
 
Jaya chandrika reddy

Big data includes both structured and unstructured data.

It is the volume, variety, velocity, value, and veracity that decide whether it's big data or not.

I've seen people use HDP (Hortonworks Data Platform) to host their Hadoop cluster.

It is a self-hosted cluster. You can set it up on any cloud service provider, like GCP, AWS, or Azure.

I haven't come across shared hosting; let me know if you have used it.

Hope others can also weigh in and share their views and perspectives on this topic 😄

Pacharapol Withayasakpunt

Thanks for being very informative (about HDP).

I am starting to feel like it might be meant for BI / Analytics / AI / ML. It might be good for getting employed, but for a small business, it might be better to rely on outsourcing or ready-made software.

Also, I came across Cassandra vs HBase vs MongoDB. It seems like the HBase / Hadoop ecosystem might be one of the best.

Jaya chandrika reddy

We need to weigh the cost of software against storage needs and decide whether Hadoop can be employed or not.

HBase is part of Hadoop, as shown in the Hadoop ecosystem image in the article, whereas Cassandra and MongoDB are external storage systems.

Based on the CAP theorem, we need to decide which storage to use. Thanks for sharing your interesting perspective 😄

venkat anirudh

Good one.. Learned a lot! Keep going!

Jaya chandrika reddy

Thanks for the support 😄

praveenreddy1798

Well written

Jaya chandrika reddy

Thank you ☺️

Tharun Shiv

Amazing Article! A huge topic put together in a single post. Well structured, well written. Kudos 🎉

Jaya chandrika reddy

Thank you 🙂

Ankur V

Nice Article 👍👍

Jaya chandrika reddy

Thank you 😄

Jaya chandrika reddy

Thank you for the wonderful comment and appreciation for the post, glad you enjoyed the article☺️ @saloni509

venkat anirudh

Which is the best database to use with Hadoop?
