Chances are that you have started to hear more about R in the last few years, and you’ve wondered if you should be using it in your work.
In this post I explore the reasons why I now use R for almost all my work with environmental and infrastructure data. We begin with three confessions…
Confession 1: I am not a “natural” programmer, or mathematician. I was never one of those kids who spent the summer locked away in a dark room with a computer. I actually spent lots of time painting, writing, making bows, arrows and swords out of wood, canoeing, taking tourists up and down the river on punts (I lived in Cambridge), or dabbling in various aspects of photography.
Confession 2: I like to do things in the best and easiest way possible.
Confession 3: I am now finding the statistical programming language R very helpful.
This is all my brother’s fault
I’m quite a visual person, so the thought of spending days looking at code on a screen fills me with dread. My default program for numerical analysis was Excel. In fact, (don’t tell anyone) when I was doing my PhD, I used Excel to host some Bayesian models to predict soil type from geology and slope parameters. The models took almost an hour to run each time, and I ran a LOT of models…
This behaviour continued until well into my career. I was working on an infrastructure project, predicting infrastructure failures (again, in Excel). It was, frankly, going nowhere.
In desperation, I picked up the phone and called my brother (a statistician, and a really nice guy to boot) and asked what he would do. Within a week, we had a working model, with some results that blew the client away. And it was done in R. Hmm.
This pattern continued for some time:
So, perhaps reluctantly, I came to realise that R might actually improve my science. More importantly it would help my clients. And I might get home on time…
So, what is R?
R is a statistical programming language that enables you to clean, manipulate and analyse data in a reproducible manner. It is command-line (code) driven by default, although there are some GUIs (graphical user interfaces) out there.
Here is what R (in a GUI called RStudio) looks like on my laptop:
4 reasons you should consider getting to grips with R.
If you handle infrastructure data, I’m going to suggest 4 reasons why you should consider using R.
1) R is fast
OK (disclaimer), it is NOT fast to learn, but once you get going, you will find it much faster than Excel for any sort of intensive data analysis. It's a bit like riding a bike: it takes time to learn, but boy, does it help you go further, faster.
2) R connects well with other data sources
You can read data in directly from databases (including spatial geodatabases for GIS (mapping) data), websites, FTP feeds, CSV files, Excel files, text files and so on. And this can all be done automatically. This really matters these days, when infrastructure and environmental data comes in so many formats, from so many locations.
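To give a flavour of what reading data in looks like, here is a minimal sketch using only base R. The file contents, column names and asset IDs are made up for illustration; the example writes a small CSV to a temporary file first so that it runs on its own. Databases and Excel files are typically handled with add-on packages (DBI and readxl are common choices), which are not shown here.

```r
# Hypothetical example data: pipe records written to a temporary CSV file
# (column names and values are invented for this sketch)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "asset_id,material,install_year,failures",
  "P001,cast iron,1965,4",
  "P002,PVC,1998,0",
  "P003,cast iron,1972,2"
), csv_path)

# Read the file into a data frame
pipes <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick look at the structure of what arrived
str(pipes)

# Total recorded failures per pipe material
failures_by_material <- aggregate(failures ~ material, data = pipes, FUN = sum)
print(failures_by_material)
```

In real use you would point `read.csv()` at your own file path (or a URL) instead of the temporary file.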
3) R code is reproducible
Reproducibility is hugely important in the academic world, but also in industry. If you need to write the same type of report each month, write the code once in R, and then just tweak it each month, if necessary. It also saves looking back through 15 spreadsheets looking for that error that was introduced by a copy/paste/typo 3 days ago. Following on from the point above, if you can script R to read in all your data from all your sources, automatically, this saves you lots of time.
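As a rough illustration of the "write it once, tweak it each month" idea, here is a small base R sketch. The function name `summarise_month()`, the column names and the inspection records are all hypothetical; the point is only that the same code can be re-run for any month by changing one argument.

```r
# Hypothetical monthly summary function (name and columns are assumptions)
summarise_month <- function(records, month) {
  # Keep only the rows whose date falls in the requested "YYYY-MM" month
  this_month <- records[format(records$date, "%Y-%m") == month, ]
  data.frame(
    month         = month,
    n_inspections = nrow(this_month),
    n_failures    = sum(this_month$failed),
    failure_rate  = mean(this_month$failed)
  )
}

# Made-up inspection records for two months
records <- data.frame(
  date   = as.Date(c("2024-01-05", "2024-01-19", "2024-02-02", "2024-02-20")),
  failed = c(TRUE, FALSE, FALSE, FALSE)
)

# Same code, different month: only the argument changes
jan <- summarise_month(records, "2024-01")
feb <- summarise_month(records, "2024-02")
print(rbind(jan, feb))
```

Because every step lives in the script, an error can be traced and fixed in one place rather than hunted down across spreadsheets.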
4) R’s outputs can be beautiful
As I said before, I’m quite a visual person, and I much prefer the outputs that come from R to those from Excel. Some of them are even interactive, which is pretty awesome.
What do you use R for?
I’m always learning new ways to use R to improve my science and work. I’d love to hear from you – What do you use R for? Let us know in the comments below!