You can tell a language is old when its name is un-googleable: "R", "C#", "Matt Lauer".

 

Once again, I am in deep need of a quick, simple cheat sheet for referencing and attempting to make sense of the R data ecosystem.

Amazingly, in a lot of ways it turns out to be quite a bit easier than in Python.  You just need to twist your brain into the shape of a freshly baked loaf of Vienna bread first.

 

Vectors

Pretty much all data in R is a Vector.  Even a single character or number is a Vector, just one with a length of 1.  Put that into the back of your brain and keep it there forever.
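If you want to convince yourself, a quick check at the console does the trick. This is only a minimal sketch in base R, and the variable names are made up for illustration:

```r
# A lone number and a lone string are both vectors of length 1
singleNumber <- 42
singleString <- "hello"

is.vector(singleNumber)   # TRUE
length(singleNumber)      # 1
is.vector(singleString)   # TRUE
length(singleString)      # 1 (one string, not five letters)

# Indexing a "single value" works just like indexing a longer vector
singleNumber[1]           # 42
```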

However, the most common usage of a Vector is as an object that works much like an array in other languages.

myVector <- c(3,4,5,6)

In the above example, we've just created a new Vector containing 4 numbers.  The "c" in front of the list is actually a function, short for Combine, and you will use it nearly every time you build a Vector by hand.

Retrieving an item from a Vector works the same as in many other languages:

myDigit <- myVector[2]

Some things to note about Vectors:

  • They start at 1.  Not zero, like in nearly every other language on Earth.
  • A Vector can only contain objects of the SAME type.  If you mix types (numbers and characters), R will quietly convert everything to one common type.  Usually that means converting all your numbers to characters.
  • You can optionally "name" the objects in your Vector and then retrieve an item using its name:
myDigit <- myVector["theName"]
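The three notes above can be sketched in a few lines of base R. The values and names here (mixed, namedVector, and so on) are hypothetical, picked only to show the behaviour:

```r
# Hypothetical example values
myVector <- c(3, 4, 5, 6)

# Indexing starts at 1, so this is the FIRST element
myVector[1]                 # 3

# Mixing numbers and characters converts everything to character
mixed <- c(1, "two", 3)
class(mixed)                # "character"
mixed                       # "1" "two" "3"

# Name the elements, then look one up by name
namedVector <- c(first = 3, second = 4, third = 5)
namedVector["second"]       # a length-1 Vector holding 4, still carrying its name
namedVector[["second"]]     # 4, with the name dropped
```

Single brackets keep the name attached to the result; double brackets hand back the bare value, which is often what you want when doing arithmetic.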

Matrices

Matrices are 2 dimensional versions of Vectors.  In fact the 2nd dimension is just there because you're smushing other Vectors together (into a Matrix).

Because a Matrix is 2 dimensional, you retrieve an item from it by giving both dimensions:

myMatrix[3,4]

3 = the Row

4 = the Column

If you want to retrieve an entire row, just leave the column out:

myMatrix[3,]

.. or an entire column:

myMatrix[,4]

One weird thing to note: if you get an entire row, an entire column, or a single object, your result will become a Vector, as described above.  If you want to keep your result as a Matrix (even though it may only have 1 row or 1 column), add the "drop" argument to your call:

myMatrix[3, , drop=FALSE]
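To see the indexing and the "drop" behaviour side by side, here is a small sketch. The 4 x 5 Matrix is a made-up example, filled column by column with the numbers 1 to 20:

```r
# Hypothetical 4 x 5 matrix; matrix() fills column by column by default
myMatrix <- matrix(1:20, nrow = 4, ncol = 5)

singleCell <- myMatrix[3, 4]                   # 15: row 3, column 4

rowAsVector <- myMatrix[3, ]                   # 3 7 11 15 19
dim(rowAsVector)                               # NULL: it is a plain Vector now

rowAsMatrix <- myMatrix[3, , drop = FALSE]
dim(rowAsMatrix)                               # 1 5: still a Matrix, with one row

colAsMatrix <- myMatrix[, 4, drop = FALSE]
dim(colAsMatrix)                               # 4 1: still a Matrix, with one column
```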

Often you may have a bunch of Vectors that you want to smush up to make a Matrix.  There are two ways to do this.

By row:

myMatrix <- rbind(myVector01, myVector02)

Or by column:

myMatrix <- cbind(myVector01, myVector02)

These two functions confuse me every time because I can't remember if I want to bind by row or column.  I think of it this way: if you pile up all your Vectors on top of each other, rbind will glue them together as seen (each Vector becomes a row).  cbind, on the other hand, will turn all your Vectors clockwise, then glue them together side by side (each Vector becomes a column). (Clockwise.  Cbind.  Remember "C").
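Checking the dimensions of each result is the quickest way to see the difference. A minimal sketch, with two made-up Vectors of three numbers each:

```r
# Hypothetical example Vectors
myVector01 <- c(1, 2, 3)
myVector02 <- c(4, 5, 6)

# rbind: each Vector becomes a ROW
byRow <- rbind(myVector01, myVector02)
dim(byRow)        # 2 3 (2 rows, 3 columns)
byRow[2, 1]       # 4: first item of the second Vector

# cbind: each Vector becomes a COLUMN
byCol <- cbind(myVector01, myVector02)
dim(byCol)        # 3 2 (3 rows, 2 columns)
byCol[1, 2]       # 4: same item, now in row 1, column 2
```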

 

Dataframes

On the surface a Dataframe looks a lot like a Matrix.  There are some key differences, though, both in what it can hold and in what you can do with it.

  • It can contain different data types.
  • You will use it whenever you import data.
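  • You can pull out a column by its name using the $ sign, then pick a single cell from it by row number:
newData <- myDataframe$myColumnName[myRowNumber]
  • You can perform operations or comparisons with this notation (on entire columns if wanted):
newData <- myDataframe$myColumnName * myDataframe$myColumnName

newData <- myDataframe$myColumnName < 5

Note that the comparison doesn't filter anything by itself: it gives you a Vector of TRUE/FALSE values, one per row, which you can then use to pick rows out of the Dataframe.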
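Here is a small sketch of those operations on a made-up Dataframe. The column names (name, score) and the values are hypothetical:

```r
# Hypothetical Dataframe with one text column and one numeric column
myDataframe <- data.frame(
  name  = c("a", "b", "c", "d"),
  score = c(2, 7, 4, 9)
)

# Column by name, then a single cell by row number
secondScore <- myDataframe$score[2]                  # 7

# Arithmetic works on the whole column at once
squared <- myDataframe$score * myDataframe$score     # 4 49 16 81

# A comparison returns one TRUE/FALSE per row...
isSmall <- myDataframe$score < 5                     # TRUE FALSE TRUE FALSE

# ...which you can use in the row position to keep only matching rows
smallRows <- myDataframe[isSmall, ]                  # rows "a" and "c"
```

Note the trailing comma in myDataframe[isSmall, ]: just like with a Matrix, the first slot selects rows and leaving the second slot empty keeps every column.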

Factors

When you import data into a dataframe, R can treat columns of categorized data ("red", "blue", "green", etc.) as something called Factors.  Before R 4.0.0 this happened automatically for every text column; since then you have to ask for it (for example with stringsAsFactors = TRUE, or by calling factor() on the column).  A Factor stores the column's unique values once, as its "levels" (one level for "blue", another for "green", etc.), and records which level each row belongs to.  If you call str(myDataframe) it will tell you how many levels each Factor column has.  This will be very useful later on.
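To poke at what a Factor actually holds, you can build one by hand. This is a minimal sketch with made-up colour data; it asks for Factors explicitly, so it should not depend on which default your R version uses:

```r
# Hypothetical categorized data
colours <- c("red", "blue", "green", "blue", "red", "blue")

colourFactor <- factor(colours)

levels(colourFactor)        # "blue" "green" "red": the unique values, sorted
nlevels(colourFactor)       # 3
table(colourFactor)         # counts per level: blue 3, green 1, red 2

# Behind the scenes each value is stored as the number of its level
as.integer(colourFactor)    # 3 1 2 1 3 1

# Asking for Factors explicitly when building a Dataframe
colourData <- data.frame(colour = colours, stringsAsFactors = TRUE)
is.factor(colourData$colour)   # TRUE
str(colourData)                # shows the column as a Factor with 3 levels
```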

 

