Class Imbalance

This is an important concept when performing any kind of predictive analysis.  All it means is that the variable you are attempting to predict should have a decent balance between its classes (for a binary target, between the two values).

So if you’re attempting to predict, let’s say, cancer, your data must have a fair balance between positive cancer results and negative results.  If your data has 10 positive results and a million negatives, you will probably not be able to train a useful model, because it sees almost no positive examples to learn from.
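To make that concrete, here is a small sketch in plain Python using the 10-versus-a-million numbers from above. It looks at a "model" that always predicts the more common class. The two function names are my own, made up for illustration, and are not part of any library.

```python
def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy of a 'model' that always predicts the more common class."""
    total = n_positive + n_negative
    return max(n_positive, n_negative) / total


def positive_recall_of_majority_baseline(n_positive, n_negative):
    """Share of positive cases caught by always predicting the majority class."""
    # If negatives dominate, the baseline never predicts positive.
    return 1.0 if n_positive >= n_negative else 0.0


acc = majority_baseline_accuracy(10, 1_000_000)
recall = positive_recall_of_majority_baseline(10, 1_000_000)
print('accuracy: {0:.4%}, recall on positives: {1:.0%}'.format(acc, recall))
```

The accuracy comes out at roughly 99.999%, yet not a single positive case is found. That is why a high accuracy score on badly imbalanced data tells you very little.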

Luckily, I found this little function that will go through your data and print the percentage of rows that fall into each class.

def print_dx_perc(data_frame, col):
    # Count how many rows fall into each class of the target column
    dx_vals = data_frame[col].value_counts()
    total = dx_vals.sum()
    # Print each class label with its share of all rows, as a percentage
    for label, count in dx_vals.items():
        pct = 100 * count / total
        print('{0} accounts for {1:.2f}% of the diagnosis class'.format(label, pct))

print_dx_perc(breast_cancer, 'diagnosis')
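If you only need the numbers and not the printed sentences, pandas can also return the proportions directly through `value_counts(normalize=True)`. A minimal sketch, assuming a tiny made-up data frame (the `toy` frame and its labels are hypothetical, not the real breast cancer data):

```python
import pandas as pd

# Hypothetical toy data, standing in for the real data set
toy = pd.DataFrame({'diagnosis': ['B', 'B', 'B', 'M']})

# normalize=True returns fractions of the total instead of raw counts
percentages = toy['diagnosis'].value_counts(normalize=True) * 100
print(percentages.round(2))
```

Here three of the four rows are 'B', so 'B' accounts for 75% and 'M' for 25%.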

