We all know the answer to life is 42.  But that's the rounded Integer.  What does the Float look like?


MapReduce.  The name almost makes sense, which is unusual in this field, and that immediately makes me suspicious.

Key-Value pairing is the bread and butter of MapReduce.

When I first learned about key-value pairs I thought “interesting” in that condescending ‘i dunno gettit’ sort of way; it seemed too arbitrary.  How could a simple list of two-stuffs be important?  (Not to get too technical, but the “two-stuffs” are often referred to as tuples.)

For some context, a set of key-value pairs looks like this:

('matt', 10)
('jane', 3)
('steve', 6)

... etc.

In the above example the keys are matt, jane and steve.

The values are 10, 3 and 6.  
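If you want to poke at this yourself, here's the same set of pairs as a Python list of tuples.  Python is just my pick for these sketches (nothing about MapReduce requires it), and the variable names are made up for illustration.

```python
# The example pairs from above, as (key, value) tuples.
pairs = [("matt", 10), ("jane", 3), ("steve", 6)]

# Pull the keys and the values apart.
keys = [key for key, _ in pairs]
values = [value for _, value in pairs]
print(keys)    # ['matt', 'jane', 'steve']
print(values)  # [10, 3, 6]

# A dict is the usual Python home for key-value pairs: look up a value by its key.
lookup = dict(pairs)
print(lookup["jane"])  # 3
```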

It seems a bit too rudimentary to be useful.  But if you have millions of these pairs you can figure out a lot of stuff.

A Bible

MapReduce is a method of turning any set of data into a set of these key-value pairs, and then combining the pairs that share a key.
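Here's a minimal, single-machine sketch of that idea.  The names `map_reduce`, `score_mapper` and `sum_reducer` are hypothetical, and the input lines are made-up data.  Real frameworks like Hadoop spread this over many machines; this only shows the shape: a mapper turns each record into key-value pairs, the pairs get grouped by key, and a reducer collapses each group into one value.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy MapReduce on one machine (hypothetical helper, not a real library API)."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record can emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)  # group values that share a key
    # Reduce: collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}

def score_mapper(line):
    # Made-up input format: "name score", e.g. "matt 4".
    name, score = line.split()
    yield name, int(score)

def sum_reducer(key, values):
    # The reducer also receives the key; summing doesn't need it.
    return sum(values)

totals = map_reduce(["matt 4", "jane 3", "steve 6", "matt 6"], score_mapper, sum_reducer)
print(totals)  # {'matt': 10, 'jane': 3, 'steve': 6}
```

Swap in a different mapper and reducer and the same driver answers a different question, which is the whole appeal.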

I will demonstrate with my King James Bible example.  (It’s my example because I both invented counting AND wrote the King James Bible.)  The purpose of this exercise is to count every word in the Bible and figure out how often each one appears.

I’ll go through the two steps:

Map

This step goes through the entire book, extracts every word and creates a key-value pair for it.


('the', 1)
('as', 1)
('who', 1)
('the', 1)

... etc.

Notice that the Map step doesn’t discriminate; it pulls every single word out and pairs it with the number 1.  This blows up the size of the data very quickly: it makes a new data set containing every word in the Bible, each paired with an integer value of 1.
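A rough sketch of the Map step, run on the first verse of Genesis.  The function name `map_words` is made up, and the tokenizing rule is an assumption: lowercase everything and treat any run of letters as a word, which drops punctuation and verse numbers but also splits "LORD's" into "lord" and "s".  A real word count has to decide how to handle apostrophes and hyphens.

```python
import re

def map_words(text):
    """Map step (hypothetical name): emit ('word', 1) for every word, duplicates and all."""
    # Assumed tokenizer: lowercase, then any run of a-z letters counts as a word.
    for word in re.findall(r"[a-z]+", text.lower()):
        yield (word, 1)

verse = "In the beginning God created the heaven and the earth."
pairs = list(map_words(verse))
print(pairs[:4])   # [('in', 1), ('the', 1), ('beginning', 1), ('god', 1)]
print(len(pairs))  # 10, one pair per word, with 'the' showing up three times
```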


Reduce.  Also Fun.

This is the part where all the fun stuff happens and where our new, giant data set gets boiled down into something useful.

Every pair whose KEY (for example 'the') matches another pair's KEY (also 'the') gets mushed together with it, and their values get added up.

So the above example would result in:

('the', 2)
('as', 1)
('who', 1)

... etc.
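Here's one way to sketch the Reduce step, using the four pairs from the Map example.  Real MapReduce frameworks have a step between Map and Reduce, usually called the shuffle, that brings every pair with the same key together; sorting by key stands in for it here.  `reduce_counts` is a made-up name.

```python
from itertools import groupby
from operator import itemgetter

# Output of the Map step from the example above.
mapped = [("the", 1), ("as", 1), ("who", 1), ("the", 1)]

def reduce_counts(pairs):
    """Reduce step (hypothetical name): add up the values for each key."""
    # Shuffle stand-in: sort so pairs with the same key sit next to each other.
    ordered = sorted(pairs, key=itemgetter(0))
    # For each run of identical keys, sum the values into one pair.
    return [(word, sum(count for _, count in group))
            for word, group in groupby(ordered, key=itemgetter(0))]

print(reduce_counts(mapped))  # [('as', 1), ('the', 2), ('who', 1)]
```

Because of the sort, the results come out alphabetically rather than in the order the words first appeared, but the counts match the list above.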

This is a tiny sample (for learning purposes); in reality you’d end up with something like this:

('the', 2314)
('as', 1265)
('who', 576)

... etc.

Where Americans Crash

At first this may seem like a bit of a roundabout way of doing a fairly simple thing.  After all, what can be so hard about counting up a bunch of words?  It's a computer.  But that's exactly why this works the way it does.

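It's easy to forget that computers are binary machines.  They can be very fast, but every calculation is still a physical event (electrical impulses flipping switches) that takes time, and a single processor can only do so many per second.  When you click 'Word Count' in MS Word, it just does it.  But behind the scenes it still has to step through your document word by word and tally as it goes, not so different from the example above.  The trick with MapReduce is that each Map step only needs its own chunk of text and each Reduce step only needs the pairs for one key, so the work splits up naturally.  When you spread it over many processors (see cluster computing) you can do insane things, like mapping human DNA or counting galaxies in the Universe.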
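A small sketch of splitting the work up, assuming the text has already been cut into chunks (two verses here).  Each chunk is mapped and reduced on its own, then the per-chunk tallies are merged in a final reduce.  `count_chunk` and `parallel_word_count` are made-up names, and the tokenizer is the same assumed letters-only rule as before.  I've used `ThreadPoolExecutor` from Python's standard library to keep it simple, but CPython threads don't run Python code truly in parallel, so for a real speed-up you'd swap in `ProcessPoolExecutor` (same `map` interface) or an actual cluster framework.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(chunk):
    """Map one chunk to (word, 1) pairs, then reduce them locally (hypothetical helper)."""
    pairs = [(word, 1) for word in re.findall(r"[a-z]+", chunk.lower())]
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return counts

def parallel_word_count(chunks, workers=4):
    """Hand chunks to separate workers, then merge their tallies by key."""
    # Chunks don't depend on each other, so they can be processed at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_chunk, chunks))
    # Final reduce: Counter.update adds counts for matching keys.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

chunks = [
    "In the beginning God created the heaven and the earth.",
    "And the earth was without form, and void",
]
totals = parallel_word_count(chunks)
print(totals["the"], totals["and"], totals["earth"])  # 4 3 2
```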

This is Big.

Wanna keep learning stuff?

Of course you do.  Check out my attempt at summarizing the Python data ecosystem: