So I get a data file (CSV, plain text, whatever), and my usual first step is to stare at it in my Downloads folder for a few minutes. Then maybe change the file name. Then go make some coffee. Then come back and read the file name again. Maybe change it back.
I'll open up some IDE and make a new Python file. Save it. Stare at that. Import some libraries… that name sucks, I should change it.
CNN is on, I should probably see what's happening in the world...
My point is that it's hard to start, and the best way to start is just to start. Here's a good list of things to put in your .py file to at least get a handle on what you're dealing with and, hopefully, get the juices flowing.
You might not need them all, but you can always remove them later. This tactic is probably bad form, but I don't care, it helps...
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
# load_boston was removed in scikit-learn 1.2; fetch_california_housing
# is a built-in alternative if you want a toy regression dataset.
```
Import Your Data
Without importing your data you're bound to have a tough time figuring out what you're dealing with. And for some reason, once I've loaded the data into a pandas DataFrame, I feel like I'm on my way...
```python
customers = pd.read_csv("Ecommerce Customers.csv")
```
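Not every file is a tidy CSV, though. If you get a plain-text file that uses tabs (or pipes, or semicolons) between columns, `read_csv` still works; you just tell it the separator with `sep`. Here's a minimal sketch, assuming a tab-delimited file. The data and the `email`/`amount` column names are made up, and `io.StringIO` stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up tab-delimited text standing in for a real file on disk.
raw_text = "email\tamount\na@example.com\t587.95\nb@example.com\t392.20\n"

# sep="\t" tells pandas the columns are split by tabs instead of commas.
customers_txt = pd.read_csv(io.StringIO(raw_text), sep="\t")

print(customers_txt.shape)   # (2, 2)
print(customers_txt.dtypes)  # amount is parsed as float64
```

With a real file you'd pass the path instead, e.g. `pd.read_csv("some_file.txt", sep="\t")` (hypothetical file name).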
Get Some Summaries Going
I like to start with the very basics. Just these few lines will give you a tonne of information about your data and where you should start probing...
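```python
print(customers.head())      # first five rows
customers.info()             # dtypes and non-null counts; prints on its own
print(customers.describe())  # summary stats for the numeric columns
print(customers.columns)
print(customers.shape)       # (rows, columns)
print(customers.dtypes)
```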
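If you want to poke at these calls before your real file shows up, a tiny throwaway DataFrame does the trick. This is just a sketch with made-up data. I also tacked on `isna().sum()`, which isn't in the list above but is a cheap way to spot missing values early.

```python
import pandas as pd

# Tiny made-up frame so the summary calls can be tried without any file.
df = pd.DataFrame({
    "age": [34, 28, None],          # None becomes NaN, so this column is float64
    "city": ["Oslo", "Lima", "Pune"],
    "spend": [120.5, 80.0, 95.25],
})

df.info()               # prints dtypes and non-null counts, returns None
print(df.describe())    # numeric columns only: age and spend
print(df.shape)         # (3, 3)
print(df.isna().sum())  # missing values per column: age has 1
```

Note that `info()` does its own printing, so wrapping it in `print()` just adds a stray `None` to your output.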
Print Some Nice Plots
Everyone likes a good visualization. It gives you a quick feeling of accomplishment and a head start toward finding gaps, dead ends, etc...
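```python
snsData = sns.load_dataset('tips')  # seaborn's example data (downloaded on first use)
print(snsData.head())

sns.pairplot(snsData)  # scatter plots for every pair of numeric columns
plt.show()

sns.histplot(snsData['total_bill'], kde=True)  # distplot is deprecated; histplot replaces it
plt.show()

sns.heatmap(snsData.corr(numeric_only=True), annot=True)  # correlate numeric columns only
plt.show()
```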
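About that `numeric_only=True` in the heatmap line: the tips data has text columns like `day`, and recent pandas versions raise an error if you ask for correlations across them. Here's a minimal sketch of just the correlation step, on made-up data shaped loosely like tips, so you can see what actually gets handed to `sns.heatmap`.

```python
import pandas as pd

# Made-up data mixing numeric and text columns, loosely like seaborn's "tips".
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.5, 3.0, 4.5],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# numeric_only=True skips the "day" column instead of failing on it.
corr = df.corr(numeric_only=True)
print(corr.round(2))  # a 2x2 matrix with 1.0 on the diagonal
```

That 2x2 matrix is what the heatmap colours in, and `annot=True` writes the numbers on each cell.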
Although none of these things are the answer to your underlying problem, they are a sure-fire way to get the coffee brewed, the TV turned off and your project underway.