R Code: ggplot2 – an introduction to the concepts

I hope that my previous post on this subject was convincing with respect to ggplot2 being the superior choice for graphics in R.  In this post, I will lay out the concepts behind the ggplot2 package.

The reason why I’m making these posts on the ggplot2 package is that when I was learning R, I found a lot of examples, but very few resources for learning the package. Alright, let’s get started…

First off, the “gg” in the name stands for “grammar of graphics”.  The package is based on the book The Grammar of Graphics by Leland Wilkinson (part of Springer’s Statistics and Computing series).  What is the grammar of graphics?  It is a set of terms and concepts that are used to describe and build graphs.  Let’s go through them.

Concept 1: Aesthetics

In all the documentation about ggplot2, you’ll see the word ‘aesthetic’ used all over the place.  An aesthetic is a property that affects how your data is visually displayed.  The location, the shape, the colour are all aesthetic attributes.  It is a very general term.  Why is it important?  Because the fundamental concept behind this package is to think of a plot as the graphic representation of a map that connects your data with various aesthetic attributes.

Let’s illustrate by reproducing a graph we did in the previous post.  Instead of using the qplot() function, which has some things going on under the hood, we’ll use the more generic ggplot() function.  First, let’s set up our sample data as we did before:

library(ggplot2)
# use the 'diamonds' dataset, which comes with the ggplot2 package
rawData <- diamonds
# fix the random seed so the same 200 rows are drawn each time
set.seed(1000)
sampleData <- rawData[sample(nrow(rawData), 200), ]

Then, we create a plot object using the generic ggplot() function:

p <- ggplot(data = sampleData, aes(x = carat, y = price, colour = clarity, shape = cut))

The plot object (p) has two components: the data (note: the data must be a data frame), and the aesthetic mapping.  The data is our sampleData, and the aesthetic mapping is created with the aes() function.  The arguments for the aes() function are the aesthetic elements we want on the graph.  We want the location aesthetic to be given by the carat and price variables, we want the colour aesthetic to be given by the clarity variable, and we want the shape aesthetic to be given by the cut variable.  It’s important to get comfortable with this way of thinking.
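As a quick check of that description, the sketch below rebuilds the plot object and looks at its parts.  Reaching into the object with $ (p$data, p$mapping, p$layers) is an assumption about how the installed ggplot2 version exposes its plot objects, so treat this as an exploratory aid rather than a documented interface.

```r
library(ggplot2)

# same sample as in the post
set.seed(1000)
sampleData <- diamonds[sample(nrow(diamonds), 200), ]

p <- ggplot(data = sampleData,
            aes(x = carat, y = price, colour = clarity, shape = cut))

# the data component is the data frame we passed in
dim(p$data)

# the mapping component lists which aesthetics have a variable attached
names(p$mapping)

# no layers have been added to the plot object yet
length(p$layers)
```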

After entering the above code, you’ll notice that you don’t actually end up with a graph.  This brings us to the next important concept:

Concept 2: Layering

In simple terms, a layer creates what you see.  A layer is made up of data, an aesthetic mapping (aes), a geometric object (geom), a statistical transformation (stat), and a position adjustment.  The stat and the position can both be left as "identity" when the data should be drawn as is.  Since we already have the data and aesthetic mapping in our plot object, all we need to do is add a geometric object to give the plot a layer.  The geom is usually the type of graph you want to make; in our case that is a scatterplot, which uses the point geom.  Let’s add a layer to the plot object:

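# recent ggplot2 releases require stat and position to be given explicitly
p <- p + layer(geom = "point", stat = "identity", position = "identity")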

Notice that we use the ‘+’ symbol to add a layer to the plot.  Printing the plot object (typing p at the console) now draws the graph below, which is the same as the one created in the previous post using the qplot() function (note: I changed the resolution of this image, so it may look slightly different).

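Figure 1: price against carat for the 200 sampled diamonds, with colour showing clarity and shape showing cut.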
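In everyday code you will rarely call layer() yourself.  ggplot2 provides shortcut functions such as geom_point() that fill in the stat and position for you.  The sketch below builds the same scatterplot both ways; the names base, longForm and shortForm are made up for this illustration, and the final check relies on the layer object exposing its geom through $, which is an assumption about the installed version.

```r
library(ggplot2)

set.seed(1000)
sampleData <- diamonds[sample(nrow(diamonds), 200), ]

# plot object without any layers
base <- ggplot(sampleData,
               aes(x = carat, y = price, colour = clarity, shape = cut))

# long form: every part of the layer is spelled out
longForm <- base + layer(geom = "point", stat = "identity", position = "identity")

# short form: geom_point() supplies the identity stat and position itself
shortForm <- base + geom_point()

# both plots end up with a single layer that uses the point geom
class(longForm$layers[[1]]$geom)[1]
class(shortForm$layers[[1]]$geom)[1]
```

Printing either longForm or shortForm should draw the same scatterplot.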

Layers can be stored as variables.  Starting from the original plot object, the code below would work equally well:

q <- layer(geom = "point", stat = "identity", position = "identity")
p <- p + q

Storing layers as variables often helps to keep the code clean and avoid duplication.
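To make the duplication point concrete, here is a small sketch in which one stored layer is reused by two different plots.  The variable names pointLayer, byClarity and byCut are invented for this example.

```r
library(ggplot2)

set.seed(1000)
sampleData <- diamonds[sample(nrow(diamonds), 200), ]

# define the layer once ...
pointLayer <- geom_point()

# ... and add it to as many plot objects as needed
byClarity <- ggplot(sampleData, aes(x = carat, y = price, colour = clarity)) + pointLayer
byCut     <- ggplot(sampleData, aes(x = carat, y = price, colour = cut)) + pointLayer
```

If the point layer later needs a change (a different size, say), it only has to be edited in one place.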

Summary:

With the ggplot2 package, we create plot objects by first specifying the data, then mapping the variables that we’re interested in to aesthetic attributes, and finally by representing the map by adding layers to the plot object.  The layers specify the geometric objects, statistical transformations, and position adjustments used to create the graphic.  Layers can also add/modify the data and the aesthetic map of the plot object.  A plot object can have several layers.  This “layered approach” to graphing allows you to build graphs in an iterative manner that both increases your flexibility and simplifies your coding.
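As a rough sketch of a plot with several layers, the code below puts a straight-line fit on top of the scatterplot.  The choice of geom_smooth() with a linear model is only an illustration and not something covered above.  The second layer changes the aesthetic map for itself with aes(group = 1), so one line is fitted to all points instead of one line per clarity and cut combination.

```r
library(ggplot2)

set.seed(1000)
sampleData <- diamonds[sample(nrow(diamonds), 200), ]

multi <- ggplot(sampleData,
                aes(x = carat, y = price, colour = clarity, shape = cut)) +
  # layer 1: the raw points, using the plot-level aesthetic mapping
  geom_point() +
  # layer 2: a linear fit; group = 1 overrides the grouping for this layer only,
  # and colour = "black" is a fixed setting rather than a mapping
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, colour = "black")

# the plot object now holds two layers
length(multi$layers)
```

Printing multi draws the points and the fitted line together.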

In my next post on the subject, I’ll get into some of the finer concepts with this package, but I hope that this gives you sufficient tools to get going with the ggplot2 package.

Sources:

ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
