Coefficiently Confused

Statistics has an enduring ability to brand itself with interminable, confusing and utterly forgettable terminology.

Some Gluons or Something

I used to castigate physics for the same thing, but at least they have an excuse: they're attempting to label billions of indescribable stuffs (I lay off entomologists for the same reason).

Statisticians don’t have “billions of stuffs” to label; maybe they have… a hundred?

For example: linear regression.  There’s nothing regressive about it.  There’s a line, sure, so it’s sort of linear, but… ah, forget it, we’ll get to that some other time...

So today, my pretties, we'll discuss the coefficient: an equally random term that simply means that if you add 1 unit to x (and leave every other input alone), y will change by this many units.
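To make that definition concrete, here's a minimal sketch in plain Python. The numbers are made up (they are not the shop data) and `fit_line` is a hypothetical helper name, so treat it as an illustration rather than the one true way. It fits a straight line to a single x by least squares, then checks that bumping x by 1 moves the prediction by exactly the coefficient.

```python
def fit_line(xs, ys):
    """Least-squares fit for one predictor; returns (coefficient, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    coef = sxy / sxx
    intercept = mean_y - coef * mean_x
    return coef, intercept


# Made-up data that follows y = 3x + 2 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 8.0, 11.0, 14.0, 17.0]

coef, intercept = fit_line(xs, ys)


def predict(x):
    """Prediction from the fitted line."""
    return coef * x + intercept


# Adding 1 unit to x changes the prediction by exactly the coefficient.
change = predict(11.0) - predict(10.0)
print(coef, intercept, change)
```

With more than one input the idea is the same, except each coefficient describes the change when only its own input moves and the others stay put.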

For example, I just ran a job on some website traffic data to see how much $$ people spent depending on how much time they spent on the various platforms:

Feature                 Coefficient
avg. session length     25.957178
time on app             38.697974
time on website          0.039317
length of membership    61.299257

I'm the Boss.

... so in this example, for every extra unit of time a user spends on the app, the model predicts they will spend an additional $38.70 (annually) at the shop, all else being equal.  For every extra unit of time spent on the website, the average user will spend just $0.04 more.  (I can't remember what the TIME unit is in this example, but it doesn't matter; the point is made.)
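If you'd rather see that reading as arithmetic, here's a rough sketch that plugs in the four coefficients exactly as printed above. The helper name `predicted_change` is made up for illustration, and since the output above doesn't include the model's intercept, this can only estimate changes in spend, not a user's total spend.

```python
# Coefficients copied from the output above: dollars per one unit of each input.
coefficients = {
    "avg. session length": 25.957178,
    "time on app": 38.697974,
    "time on website": 0.039317,
    "length of membership": 61.299257,
}


def predicted_change(deltas):
    """Predicted change in spend when the named inputs change by the given
    amounts and every other input stays the same (hypothetical helper)."""
    return sum(coefficients[name] * delta for name, delta in deltas.items())


one_more_app_unit = predicted_change({"time on app": 1})
one_more_web_unit = predicted_change({"time on website": 1})

# Rank inputs from largest to smallest coefficient to see where to dig first.
ranked = sorted(coefficients, key=coefficients.get, reverse=True)

print(round(one_more_app_unit, 2), round(one_more_web_unit, 2))
print(ranked)
```

One caution on the ranking: coefficients are per unit, so comparing them only makes sense if the inputs are measured on comparable scales.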

The coefficient is definitely not the be-all and end-all stat, but it's a great place to start any investigation.  It gives you a pretty good idea of where to look for further trends and which paths are dead ends.

House. In Boston.

For more detail about doing this with Python, check here.  It uses scikit-learn's built-in Boston Housing data from 1970.  It's quite concise and the dude seems like a bit of a punk, so I'm sold.


