
Data Science

with Matt Hughes

MattData Blog

MongoDB on Ambari

August 29, 2019 matthughes72

Realistic. Mongo does NOT come with Ambari. And yes, it is a pain in the ass to install. Just trust…

You Have to Get Ambari Installed Locally or Just Kill Yourself

August 22, 2019 matthughes72

When I started in Big Data I figured that using AWS was a great idea.  Hell, it was free. And…

Getting F*ing Drill on Ambari

August 9, 2019 matthughes72

Me. Drill is a fun query layer that is VERY easy to use but also NOT the easiest thing to…

HBase and Pig and Titanic

August 1, 2019 matthughes72

Since NoSQL is the future of humanity and will save the Universe, I’ve thrown together this quick tutorial on how…

Some Useful (and Simple) PySpark Functions

July 20, 2019 matthughes72 1 Comment

I’ve been to Spark and back.  But I did leave some of my soul. According to Apache, Spark was developed…

How to Start a New PySpark Job

July 9, 2019 matthughes72

I’ve been to Spark and back.  But I did leave some of my soul. According to Apache, Spark was developed…

Quick Correlation Plot with Seaborn

July 1, 2019 matthughes72

Correlation is the simplest way to start comparing features to see which data points may line up with other data…

Class Imbalance

June 17, 2019 matthughes72

This is an important concept when performing any kind of predictive analysis.  All it means is that it’s imperative that…

Avocados! … and Plotly and DASH

June 2, 2019 matthughes72

Hello my pretties… I discovered Plotly and DASH, so here is my first attempt. https://mattocado.herokuapp.com/ As a colleague pointed out,…

World War 2 Data Set

May 14, 2019 matthughes72

Since I’m a total WW2 nerd, and obviously a data one too, I found this great dataset that lists all…

Copyright © 2019 Data Science w Matt Hughes
A Very Sick Company