After taking some time away from the blog with our first child, and preparing materials for a course, this blog is starting back up again.

In all the posts I’ve made, I’ve used R for data analysis rather than Excel. There are a lot of people who don’t like using Excel, but I actually like Excel quite a bit, and I use it all the time for financial models. In my opinion, for any sort of accounting-like modelling where you want to display several interconnected tables of data simultaneously, Excel is the way to go. However, the same attributes that make Excel easy-to-use and flexible also make it a very weak tool for analyzing data.

Excel is a workbench, R is a process. With Excel, you have your tools and data and you manually put everything together. Each cell is treated independently, logic can flow in any direction, and nobody sees how it was put together – they just see the final product. Modifications to an Excel project require someone to take it apart, make the modification, and put it back together. Mistakes are easy to make, and hard to detect.

With R, the user gives the computer a set of instructions, and the computer tries to carry out the instructions. Unless the analysis is extremely simple, it is much faster to write out a set of instructions rather than doing everything yourself in a spreadsheet. In addition to being faster, R is also more reliable. R scripts have a very linear logic – “first do this, and then this, and then this…”. Well-commented R code will be much more transparent than an Excel sheet, this makes it is easier to detect logic errors. Finally, with all of R’s libraries (over 10,000), R give you tools to do analysis and problem solving that just is not possible with Excel.

There are other data science programming languages out there – why use R over another language like Python? That answer depends on many factors and I’ll save that for another post.