If you ask me, this is the backbone of big data and the reason  that morons like me need Hadoop.

 

In an effort to explain the concept of Cluster Computing to my mum I found this great analogy:

You

Imagine that you’re the governor of a region in the Roman Empire.  The romans’ love of the census is legendary, so imagine that one day you’re tasked, by Caesar himself, to take a count of all the citizens in your region and report the numbers back to the top.

Your region covers 10 towns, so you command your head census taker to visit each of them, count the citizens and return with the total.  With the travel time, you estimate this will take him approximately 6 months, assuming he isn’t eaten by a lion or murdered along the way.

Caesar wants it in 1 month.

You need a new plan.

Google map of Roman route.

Sending out one census taker can’t work.  Apart from just the time pressure you also run the risk of losing him completely, tragically wiping out all the data that he’s meant to collect.

A nerdling named Antikythera comes up with a solution.

Instead of sending the head census taker to each of the towns, why not send ONE census taker to each of the towns in tandem then add up the numbers when they return.

Google map of ancient route.

Brilliant.  However there is still the risk of one or more  of the 10 new census takers never making it back with their share of the data.

Antikythera suggests sending out 3 census takers to each of the towns, independently, to ensure that at least one of them gets back safely.  If they all make it back you can just average out their results and report the number as the town’s accurate population. If just one or more of them is killed along the way you can still rely on 1 or 2 results to make your objective.

So now you’re sending 30 census takers.  This will ensure a result in 1/10th of the time and with a significant loss of risk.

This is cluster computing, splitting a large problem among several different computers and storing redundant copies far apart from each other.

This is what Hadoop facilitates, and this is what makes it so powerful.

Clooney. You're so good looking.

Wanna keep learning stuff?

Of course you do.  Check out my attempt at explaining MapReduce: