DEV Community

Discussion on: Explain MapReduce like I'm five

Collapse
 
ben profile image
Ben Halpern

Can you go more into the AT SCALE part?

Collapse
 
tbodt profile image
tbodt

Map at scale: you cut up the data into pieces, map each piece on a different computer, and then send it to the reducers.
Reduce at scale: you take the data from the map part, reduce each piece on a different computer, and then you're done.

Thread Thread
 
ben profile image
Ben Halpern

What was keeping MapReduce from being a thing before it became a thing, like is this something people were aware of as an approach but computing/networking wasn't there yet, or was this an approach people hadn't thought to take beforehand? Or are there underlying implementation details that allow the simple part you're describing to be that simple?

Thread Thread
 
rhymes profile image
rhymes • Edited

I think it became a thing when massive scalability was required.

The idea of dividing computation in smaller parts and then combining it to return a result is as old as modern programming I think. Sorting (quicksort for example) is an example of a divide and conquer algorithm. Generic tail recursion is another example.

I would say MapReduce is a specialization of a divide and conquer type of algorithm but I could be wrong about this.

The primitives "map" and "reduce" are part of functional computer languages since way before MapReduce was created so I think people were aware they could something do like that but could not do so at scale.

MapReduce IIRC was implemented at large by Google in their efforts to index the WWW. Here you can browse the slides describing the algorithm: research.google.com/archive/mapred...

Note that they used a distributed file system (GFS) to share data.

Thread Thread
 
rhymes profile image
rhymes

sorry I forgot the 5 year old part :-D

Thread Thread
 
ben profile image
Ben Halpern

Thread Thread
 
nickshanks profile image
Nicholas Shanks

Not so Genius.jpg : c ≠ a² + b² (right-angled triangle, left hand side of board)