... the answer is tricky

Data Science

Finding yourself in a boardroom full of executives pushing these buzzwords around the table like a pack of dung beetles can be confusing.

However, the terminology they recite is real.  And it's an explosive part of the evolution of technology.

Big data was not invented; it was an organic byproduct of business and technology.

Like most great human achievements, it appears to have been unforeseen.

Very smart people at some great companies (Google, Yahoo, Amazon, etc.) found a problem and began solving it.

Now we're stuck with miles-high amounts of data, scores of questions, problems that CAN be solved, and ... Data Science, which is trying to do it all.

I'm learning this field.  My goal is to document the process.  Mostly for me, but it would be nice to know that I helped someone else too...

Hadoop

Hadoop is not a tool, or a language, or a product… It’s sort of all of those things.  It’s an ecosystem made up of many other tools, languages and products. These bits can be strung together in any number of ways to come up with big data solutions.
Here are just a few examples:

Python... is one language (out of many) that can be used to build applications for big data.  At the moment it's one of the most popular, primarily because it's EASY.  It's also an interpreted language, meaning there's no separate compile step, so experimenting is quick and straightforward.

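Hive... is a query layer for data stored in Hadoop.  Its language, HiveQL, actually looks a lot like SQL, even though Hive usually gets lumped in with the NoSQL crowd.  NoSQL doesn’t mean anti-SQL or even opposite-of-SQL (it's usually read as "not only SQL"); it’s just a different way of storing and retrieving data, a method made necessary by humongous data collectors like Google and Yahoo.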

Spark... is a way to create applications and scripts that make use of big data.  It’s just one way, but for now it’s also seen as the best/fastest way. You can use many languages to run a Spark job, including Python, Java, and the slightly odd Scala.
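
To show what I mean by Python being quick to experiment with, here's a tiny sketch of a word count, the usual "hello world" of big data. This is plain Python running on one machine, not Spark or Hadoop; the function name count_words and the sample lines are made up by me for illustration. The point is only the shape of the job: split the text into words, then tally them up. Tools like Spark are about spreading that same kind of work across many machines.

```python
from collections import Counter


def count_words(lines):
    """Count how often each word appears across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Break the line into lowercase words (split on whitespace).
        words = line.lower().split()
        # Add this line's words to the running totals.
        counts.update(words)
    return counts


# Hypothetical sample data; a real big data job would read from
# files spread across a cluster instead of a two-item list.
sample_lines = [
    "big data is big",
    "data science is tricky",
]

word_counts = count_words(sample_lines)

# Show the three most frequent words and their counts.
print(word_counts.most_common(3))
```

No build step needed: save it, run it, change it, run it again. That loop is most of why Python is so popular for this kind of work.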

Wanna keep learning stuff?

Of course you do.  Why not check out my attempt at making sense of Cluster Computing: