How to make graph animations with R

In this post, I will show how to combine a series of plots into a GIF file using the animation package in R.

In this example, I will create a GIF of horizontal bar plots showing how the proportions of men and women have changed in various occupation categories in Canada. Although I will show how to generate the results, I will leave the analysis for another post. This post covers just the data and the R code.

Data:

I got the data from StatsCan (here’s a link). I have also attached an Excel version of the dataset I used.

cansim121209114410230608a

Methodology:

The code below brings in the data, does some cleaning up, and calculates the values to be graphed. If you download the data, the only lines you should have to change are the directory and the file name (including its extension).

library(ggplot2)
library(reshape)
library(animation)

#import data, specify that "x" values contained in csv should be read as NA
directory <-  "C:/Users/Business/Dropbox/Economics Research/Blog/Occupations/National Occupational Classification/"
pData <- read.csv(paste(directory,"cansim121209114410230608a.csv", sep=""), na.strings="x")

#change column names
pData <- rename(pData, c("Ref_Date" = "Date", "GEOGRAPHY" = "Geography", "CHARACTERISTICS" = "Type.Of.Work",
"OCCUPATION" = "Occupation", "SEX" = "Gender"))

#change the order of the factors. we do this so that the faceted plot will start with the national data, and then
#the provinces from east to west
pData$Geography <- factor(pData$Geography, levels = c("Canada", "Newfoundland and Labrador", "Nova Scotia", "Prince Edward Island",
"New Brunswick", "Quebec", "Ontario","Manitoba", "Saskatchewan", "Alberta", "British Columbia"))

#change the order of the occupation factors. we do this so that the faceted plot will start with the total occupations data
pData$Occupation <- factor(pData$Occupation, levels = c("Total, all occupations", "Art, culture, recreation and sport",
"Business, finance and administrative", "Health", "Management", "Natural and applied sciences and related occupations",
"Occupations unique to primary industry","Occupations unique to processing, manufacturing and utilities",
"Sales and service", "Social science, education, government service and religion",
"Trades, transport and equipment operators and related occupations"))

#create 3 subsets of data
bothGendersData<-pData[pData$Gender=="Both sexes",]
maleData<-pData[pData$Gender=="Males",]
femaleData<-pData[pData$Gender=="Females",]

#using the above 3 subsets of data, determine the percent of workforce for men and women
#for the various occupations (this assumes maleData and femaleData have their rows in the same order)
maleData$ValuePercent <- maleData$Value/(maleData$Value + femaleData$Value)
femaleData$ValuePercent <- femaleData$Value/(maleData$Value + femaleData$Value)

#combine the male and female data into a single, new dataset which will be used to create the graphs
pData2 <- rbind(femaleData,maleData)

#I found the code between the two hash lines at: http://ryouready.wordpress.com/2009/02/17/r-good-practice-adding-footnotes-to-graphics/
#it's a lovely bit of code that creates a custom function to add a footer to the plots
##############################################################################################################
source <- "Source: StatsCan"
author <- "PosNorm"
footnote <- paste(source, format(Sys.time(), "%d %b %Y"),
                  author, sep=" / ")

# default footnote is today's date, size (cex) = 1 and color
# is a kind of grey

makeFootnote <- function(footnoteText=
                         format(Sys.time(), "%d %b %Y"),
                         size= 1, color= grey(.5))
{
   require(grid)
   pushViewport(viewport())
   grid.text(label= footnoteText ,
             x = unit(1,"npc") - unit(2, "mm"),
             y= unit(2, "mm"),
             just=c("right", "bottom"),
             gp=gpar(cex= size, col=color))
   popViewport()
}
##############################################################################################################
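
One assumption in the percentage step above is worth checking: dividing maleData$Value by maleData$Value + femaleData$Value only gives the right answer if the two subsets list the same Date, Geography and Occupation combinations in the same row order. Below is a minimal sketch of a more defensive version that joins the two subsets on their key columns with merge() before dividing. The toy data frame and its numbers are made up for illustration and are not from the StatsCan file; only the column names follow the renamed columns above.

```r
# Toy data (hypothetical values, not from the StatsCan dataset).
# The female rows are deliberately listed in a different year order than the male rows.
toy <- data.frame(
  Date       = c(2000, 2001, 2001, 2000),
  Occupation = c("Health", "Health", "Health", "Health"),
  Gender     = c("Males", "Males", "Females", "Females"),
  Value      = c(20, 30, 90, 80),
  stringsAsFactors = FALSE
)

males   <- toy[toy$Gender == "Males",   c("Date", "Occupation", "Value")]
females <- toy[toy$Gender == "Females", c("Date", "Occupation", "Value")]

# Join on the key columns so each row pairs the male and female counts
# for the same year and occupation, regardless of the original row order
shares <- merge(males, females, by = c("Date", "Occupation"),
                suffixes = c(".male", ".female"))

total <- shares$Value.male + shares$Value.female
shares$MalePercent   <- shares$Value.male / total
shares$FemalePercent <- shares$Value.female / total

print(shares)
```

With the real data, Geography would need to be added to the by columns as well.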

This next bit of code creates a function that will help to make the axis labels look less cluttered:

#i got this function off stackoverflow, for the axis labels in the graph, it keeps
#the values to a maximum of 2 decimals.  i used this because the axis labels looked
#cluttered
fmt <- function() {
  function(x) as.character(round(x, 2))
}
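
To see what this helper does, here is a small sketch that calls it directly. The input values are arbitrary examples, and the name labeller is only used for illustration.

```r
# Same label helper as above, restated so this snippet runs on its own
fmt <- function() {
  function(x) as.character(round(x, 2))
}

labeller <- fmt()                      # fmt() returns a function
labels   <- labeller(c(0, 0.25, 1/3))  # round to 2 decimals, then convert to text
print(labels)
```

Because fmt() returns a function, the result can be passed straight to the labels argument of scale_y_continuous(), which is how it is used later in this post.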

Now we’re ready to start building our GIF.  For this code to work, you will need to download and install ImageMagick (link).  We could build our GIF without the animation package. In this case, we would create a loop to generate the desired graphs, and then use ImageMagick separately to complete the GIF.  With the animation package, we’re able to do the whole thing from within the R console. Instead of making a loop, we first create a function that will generate the graph:

#to make the GIF, we first need to create the series of plots that we wish to loop together
#it's possible to make a loop that creates the plots and then use a program like ImageMagick
#to combine them all together.
#with the animation package, we can do both of these steps at once.  before running the below code,
#it is important to install ImageMagick.
#instead of making a loop to create the plots, we create a function that creates the plot for a given year
#we then pass the function to the saveGIF method which will loop through the years and create the GIF.

barplotGIF <- function(t) { #create a function that will create a barplot for a given year t
  graphTitle <- paste("Proportion of men and women in various occupation categories:", t) #generates string for graph title
  p <-  ggplot(data=subset(pData2, Date==t)) +  #create a ggplot object with data for year t
        geom_bar(stat="identity",aes(x=Occupation,y=ValuePercent,fill=Gender))+ #add the bar plot geom, the stat=identity uses the values in the data instead of counting the number of values
        facet_wrap( ~ Geography, ncol=4) + #creates a separate plot for each value of the Geography variable
        geom_hline(yintercept=0.5, linetype="dashed", size=0.7)+ #add a line at the 0.5 mark
        theme(axis.title.x = element_blank()) +   #removes x axis title
        theme(axis.title.y = element_blank()) +    #removes y axis title
        theme(plot.title = element_text(face = "bold")) + #makes the plot title bold
        scale_y_continuous(labels=fmt()) + #specify labels, note that the labels argument accepts a function for a value.
        ggtitle(graphTitle) + #add the graph title
        coord_flip() #flip the coordinates, this changes the plot from a vertical bar plot to a horizontal bar plot
  print(p)
  makeFootnote(footnote)  #adds the footnote through the above custom function
}
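
For comparison, the loop-based alternative mentioned earlier (write the frames yourself, then run ImageMagick separately) might look roughly like the sketch below. It is not the method used in this post. drawFrame is a hypothetical stand-in for barplotGIF that uses base graphics and made-up values, the years are invented, and the command name convert is an assumption: ImageMagick 7 uses magick instead, and on Windows convert can refer to an unrelated system tool, so a full path to the ImageMagick executable may be needed.

```r
# Hypothetical stand-in for barplotGIF(): draws one simple frame per year
drawFrame <- function(t) {
  barplot(c(Females = 0.4, Males = 0.6), horiz = TRUE,
          main = paste("Toy frame for", t))
}

years    <- 2000:2002                       # made-up years
frameDir <- file.path(tempdir(), "frames")  # frames go to a temporary folder
dir.create(frameDir, showWarnings = FALSE)

# Step 1: loop over the years and write one PNG per frame
frameFiles <- character(0)
for (t in years) {
  f <- file.path(frameDir, sprintf("frame_%d.png", t))
  png(f, width = 480, height = 480)
  drawFrame(t)
  dev.off()
  frameFiles <- c(frameFiles, f)
}

# Step 2: hand the frames to ImageMagick.
# -delay is measured in 1/100 s (100 = 1 second); -loop 0 repeats forever
gifFile <- file.path(frameDir, "toy.gif")
args <- c("-delay", "100", "-loop", "0", shQuote(frameFiles), shQuote(gifFile))
if (nzchar(Sys.which("convert"))) {
  system2("convert", args)
} else {
  message("ImageMagick 'convert' not found; frames are in ", frameDir)
}
```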

The above function accepts a variable t (a year), filters the data for that year, and generates the bar plot.  We can then generate the GIF using one line of code:

#this is the line that creates the GIF
#lapply takes the dates, and applies them to our barplotGIF function and returns a list
#interval = 1 sets a 1 second delay between plots in the GIF
#movie.name gives the name of the file. NOTE: I found it very difficult to specify the directory path without changing the working directory, so I left it
#the R output shows the location of the files, usually stored in a temporary folder.
#ani.width and ani.height specify the dimensions of the individual png files
saveGIF(lapply(min(pData2$Date):max(pData2$Date),barplotGIF), interval = 1, movie.name = "barplot.gif", ani.width = 1300, ani.height =750)

The above code applies our custom function to a vector of years.  saveGIF() evaluates the lapply() call as the expression that generates the frames of the animation.  The interval argument sets the delay between plots in the GIF, and ani.width and ani.height set the size of the animation in pixels (the default is 480 for each).
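
On the difficulty with the output directory noted in the code comments: one possible workaround is to change the working directory only for the duration of a call and restore it afterwards. The sketch below uses base R only; withWorkingDir is a hypothetical helper name and is not part of the animation package, and tempdir() stands in for a real output folder.

```r
# Hypothetical helper: evaluate an expression with `dir` as the working directory,
# then restore the previous working directory even if the expression fails
withWorkingDir <- function(dir, expr) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  force(expr)  # expr is evaluated lazily, so it only runs here, after setwd()
}

outDir <- tempdir()                        # stand-in for a real output folder
before <- getwd()
inside <- withWorkingDir(outDir, getwd())  # getwd() is evaluated inside outDir
after  <- getwd()                          # back to the original directory
```

The saveGIF() call could be wrapped the same way, as in withWorkingDir(outDir, saveGIF(...)). Whether the GIF then ends up in that folder depends on how the installed version of the animation package handles its output location, which I have not verified.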

After all that, you should get the below GIF (click for larger version):

ProportionWomenMen

Although I’m not new to R, this is my first time attempting animations. If anyone more seasoned has any advice, feel free to leave a comment.
