Confusion Matrix – Confused Yet?

Dumb Kid

Bahaha.  I love the name of this thing.  I’m sure the stats world is pulling a fast one.

This is actually not super complicated, but for some reason I can never remember which is a Type 1 Error and which is a Type 2 Error.  I suspect it's because of all the fentanyl my mother did while she was pregnant.

Just joking.  She's a lovely woman and never went any further than good old fashioned heroin.

If you remember anything from Uni-stats, it might be that there are four possible outcomes when you test a prediction against reality.  Two of these are correct outcomes, and two are errors.

True Positive: This is a correct outcome.  We predict that something is TRUE and it actually is TRUE.  e.g. “A cancer test comes back positive and the cancer is there”.

True Negative:  This is also a correct outcome.  We predict that something is FALSE and it actually is FALSE.  e.g. “A cancer test comes back negative, and there is no cancer there”.

False Positive:  This is an error.  We predict something is TRUE when in fact it is really FALSE.  e.g. “A cancer test comes back positive, when in fact the person doesn’t have cancer.”  Also called a Type 1 Error.

False Negative: This is also an error.  We predict something is FALSE when it actually is TRUE.  e.g. “A cancer test comes back negative, but the person actually does have cancer.”  Also called a Type 2 Error.
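
If it helps to see this in code, here’s a minimal sketch in plain Python (the labels are made up purely for illustration) that counts the four outcomes by comparing predictions against reality:

# Count the four outcomes by comparing each prediction to the actual answer.
# These labels are invented just to demonstrate the idea.
actual    = [1, 1, 0, 0, 1, 0, 1, 0]   # 1 = condition really present, 0 = really absent
predicted = [1, 0, 0, 1, 1, 0, 1, 0]   # what the test / model said

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives (Type 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives (Type 2)

print(tp, tn, fp, fn)  # 3 3 1 1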

A confusion matrix is simply a table that lays these four outcomes out so you can see the results of an experiment at a glance.  Example:

Out of 165 test results we’ve had 150 correct ones.  50 of them were predicted to be NO and were in fact NO (true negatives).  100 of them were predicted to be YES and were in fact YES (true positives).

We’ve also had 15 errors; 10 were Type 1 Errors (false positives) and 5 were Type 2 Errors (false negatives).
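
Laid out as a grid, those numbers are a lot easier to take in.  Here’s a quick sketch in plain Python (the counts are copied straight from the example above) that prints the matrix:

# Lay the example out as a 2x2 confusion matrix and print it.
tn, fp = 50, 10    # actual NO:  50 correctly predicted NO, 10 wrongly predicted YES (Type 1)
fn, tp = 5, 100    # actual YES: 5 wrongly predicted NO (Type 2), 100 correctly predicted YES

print(f"{'':12}{'Predicted NO':>14}{'Predicted YES':>15}")
print(f"{'Actual NO':12}{tn:>14}{fp:>15}")
print(f"{'Actual YES':12}{fn:>14}{tp:>15}")
print("Total results:", tn + fp + fn + tp)   # 165

(If you already use scikit-learn, sklearn.metrics.confusion_matrix will build the same grid for you straight from the raw labels.)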

From these numbers you can start to calculate a whole slew of statistics.  I won’t go over ALL of them (you can look those up), but here are a couple:

Accuracy: The number of correct results divided by the total number of results.  In this example 150/165 = 0.91 accuracy.

Misclassification Rate: Essentially this is just the opposite of accuracy.  15/165 = 0.09 misclassification.  (You could also just figure out that this is 1 minus the accuracy rate.)
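
To make the arithmetic concrete, here’s the same calculation in code, reusing the counts from the example above (nothing fancier than basic division):

# Accuracy and misclassification rate from the example counts.
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn                 # 165

accuracy = (tp + tn) / total              # 150 / 165 ≈ 0.91
misclassification = (fp + fn) / total     # 15 / 165 ≈ 0.09

print(round(accuracy, 2))                 # 0.91
print(round(misclassification, 2))        # 0.09
print(round(1 - accuracy, 2))             # 0.09 -- same thing, 1 minus the accuracy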

As I said, there are a slew of other stats you can come up with from a Confusion Matrix, and they are all as simple to calculate as our example.

 
