After working in data science for a while, there is one concept that I began to take for granted: Vectorization.

I picked up the term Vectorization from R, though the idea is older than R and goes by other names, such as array programming. I like Vectorization because it sounds cool.

In a normal programming language, if you want to multiply two arrays together element by element, it can be quite a grind.

Let’s say you want to do this in regular ol’ Python (or C, or any other ‘normal’ language). You would have to write out a for-loop by hand, like this:

    d = [1, 2, 2, 3, 4]
    e = [4, 5, 4, 6, 4]
    f = []
    for x in range(len(d)):
        f.append(d[x] * e[x])
    print(f)

    [4, 10, 8, 18, 16]

That’s all fine and good, but now imagine doing that with 2D matrices. Or multiple arrays. Or performing even more complex math on any of them.
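To make that concrete, here is a rough sketch of the same element-by-element multiply for two small 2D lists in plain Python. The helper name `multiply_2d` and the sample values are mine, made up purely for illustration; it also assumes both inputs have the same shape.

```python
def multiply_2d(a, b):
    """Element-wise product of two equally shaped lists of lists (hypothetical helper)."""
    result = []
    for i in range(len(a)):            # walk the rows
        row = []
        for j in range(len(a[i])):     # walk the columns in this row
            row.append(a[i][j] * b[i][j])
        result.append(row)
    return result


d = [[1, 2, 2], [3, 2, 8]]
e = [[4, 5, 4], [13, 21, 21]]
print(multiply_2d(d, e))  # prints [[4, 10, 8], [39, 42, 168]]
```

One extra dimension already means one extra level of nesting, and every additional array or operation adds more bookkeeping inside the inner loop.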

In a Vector Based Language, you don’t have to go through that whole rigamarole. Instead you can just do this:

    import numpy as np

    d = np.array([1, 2, 2, 3, 4])
    e = np.array([4, 5, 4, 6, 4])
    print(d * e)

    [ 4 10  8 18 16]

Vector Based Languages let you perform mathematical functions on entire lists or matrices as though they were single objects.

    d = np.array([[1, 2, 2, 3, 4],
                  [3, 2, 8, 7, 12],
                  [11, 21, 26, 3, 43]])
    e = np.array([[4, 5, 4, 6, 4],
                  [13, 21, 21, 31, 24],
                  [51, 12, 22, 31, 46]])
    print(d * e)

    [[   4   10    8   18   16]
     [  39   42  168  217  288]
     [ 561  252  572   93 1978]]

With a vectorized language like R, or Python with NumPy, you can do these types of calculations simply, without worrying about what is going on under the hood.
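The same goes for more involved math. As a small sketch (the particular formula here is arbitrary, something I picked just to have more than one operation), the whole calculation is a single expression on the arrays, and it agrees with the hand-written loop version:

```python
import numpy as np

d = np.array([1, 2, 2, 3, 4])
e = np.array([4, 5, 4, 6, 4])

# One vectorized expression: square d, add the square root of e, halve the result.
vectorized = (d ** 2 + np.sqrt(e)) / 2

# The same calculation spelled out element by element, for comparison.
looped = [(x ** 2 + y ** 0.5) / 2 for x, y in zip(d.tolist(), e.tolist())]

print(np.allclose(vectorized, looped))  # prints True
```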

Thank Thor for this technology. Staring at endless nested for-loops would cause me to pull my eyeballs out.

Again, I had completely lost my appreciation for this important construct, because spending enough time knee-deep in NumPy or R lets you forget it is even there. Just wait until you get back to your C programming! Then you'll appreciate it...