R vs Excel

After taking some time away from the blog with our first child, and preparing materials for a course, this blog is starting back up again.

In all the posts I’ve made, I’ve used R for data analysis rather than Excel.  There are a lot of people who don’t like using Excel, but I actually like Excel quite a bit, and I use it all the time for financial models.  In my opinion, for any sort of accounting-like modelling where you want to display several interconnected tables of data simultaneously, Excel is the way to go.  However, the same attributes that make Excel easy-to-use and flexible also make it a very weak tool for analyzing data.

Excel is a workbench, R is a process.  With Excel, you have your tools and data and you manually put everything together.  Each cell is treated independently, logic can flow in any direction, and nobody sees how it was put together – they just see the final product.  Modifications to an Excel project require someone to take it apart, make the modification, and put it back together.  Mistakes are easy to make, and hard to detect.

With R, the user gives the computer a set of instructions, and the computer tries to carry out the instructions.  Unless the analysis is extremely simple, it is much faster to write out a set of instructions rather than doing everything yourself in a spreadsheet.  In addition to being faster, R is also more reliable.  R scripts have a very linear logic – “first do this, and then this, and then this…”.  Well-commented R code will be much more transparent than an Excel sheet, this makes it is easier to detect logic errors.  Finally, with all of R’s libraries (over 10,000), R give you tools to do analysis and problem solving that just is not possible with Excel.

There are other data science programming languages out there – why use R over another language like Python?  That answer depends on many factors and I’ll save that for another post.




How to make graph animations with R

In this post, I will show how to combine a series of plots into a GIF file using the animation package in R.

In this example, I will create a GIF of horizontal bar plots showing how the proportion of men and women have changed in various occupation categories in Canada. Although I will show how to generate the results, I will leave the analysis for another post – this post is just the Data and R code.

Continue reading

R Code: why use ggplot2 instead of plot()?

R is a fairly intuitive language.  It generally does not take long to learn the different variable types, how to import data, the control statements, etc.  However, when it comes to making graphics, there is so much syntax involved that it can be a bit overwhelming.

What is ggplot2? It is a package for R that contains tools for producing various graphics.  I’ll go further into what it is, and how to use it in another post.  The purpose of this post is to show how using the plot() function in R can get unnecessarily complicated, and that the ggplot2 package is a superior alternative.

Continue reading