Are you an ecologist?
What stats program did you use to analyse your data ten years ago?
How about now?
A recent paper by Justin Touchon and colleagues reviewed changes in the statistics and statistical programs used by ecologists over the past 24 years. It seems that not only are we publishing more, but classic analysis methods are also being replaced by ‘sophisticated modelling frameworks’. Good old ANOVAs and Mann-Whitney U tests are dropping out of fashion, while AIC is being used more than 100 times more frequently in ecological papers now than back in 1990.
Furthermore, the open source program R has gone from nowhere in the early 2000s to being the most widely-cited statistical program – mentioned in a third of all articles published in 2013!
So, if you’re an ecologist aboard the R boat, what’s your favourite R package? QAEco is planning a poll to find out. Currently, our favourites include:
Spatial data – analysis, mapping and modelling
raster – nominated by Nick Golding, raster allows you to analyse spatial data and ‘do all the GIS you want (and more)’ in R. raster also interfaces with a host of other GIS software including sp for vector GIS, GDAL for raster file I/O, and rgeos for everything in between. Most importantly, being able to do all your GIS work in R (rather than point-and-click software) makes it easy to create automated and reproducible analyses – Nick reckons it has saved him more hours than he could possibly count. To get started with raster and the rest of the R GIS ecosystem, check out @frodsan’s tutorial.
dismo – this is Jane Elith’s favourite because it makes it much easier to run species distribution models efficiently. dismo relies on other modelling packages for fitting models, but it supports the typical steps needed for distribution modelling and makes prediction to large rasters efficient. The main author is Robert Hijmans.
ggmap – Saras Windecker’s nomination, ggmap combines the spatial information of mapping programs with the layered grammar of ggplot2. This allows for the production of modular spatial graphics that are easily tweaked to your specifications.
adehabitat (particularly adehabitatLT and adehabitatHR) – Bronwyn Hradsky finds this suite of packages super-useful. Written by Clement Calenge, adehabitat facilitates the analysis of animal movement data, such as relocation data from GPS or VHF collars. The packages make it easy to convert movement data into trajectories, visualise, error-check and manipulate these data, and analyse home ranges and habitat selection. They also come with a set of very readable vignettes, which provide a great introduction to this field.
Data management and wrangling
dplyr – Elise Gould says that dplyr makes wrangling your data frames a breeze. dplyr is a metaphorical set of ‘pliers’ for your data frames, letting you subset rows or columns, conduct group-wise operations on multiple subsets of data, or merge data frames by matching rows on value rather than position. Using dplyr (rather than base R) means that common data manipulation problems take less code and less mental effort to write. Moreover, much of dplyr’s work is implemented behind the scenes in C++ code, making wrangling larger data frames lightning-fast! To get started with dplyr, have a look at the data wrangling cheatsheet. For more detailed explanations, see the ‘wrangle’ section of Hadley Wickham’s forthcoming book, R for Data Science, and read this great explanation of ‘tidy data’.
reshape – Esti Palma finds this package very helpful for data management. It allows the user to summarise, re-configure and re-dimension datasets using only two functions: melt() and cast(). There are heaps of online tutorials about how to use reshape (and its faster reboot reshape2). Quick-R and Sean Anderson provide two simple options to get anyone started. Both reshape and reshape2 were developed by Hadley Wickham.
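To give a flavour of the wrangling described above, here is a minimal sketch that reshapes a small table and then summarises it. The data frames (`wide`, `sites`) and their columns are made up for illustration, and the example uses reshape2, where the casting function for data frames is `dcast()` rather than reshape's `cast()`.

```r
library(reshape2)
library(dplyr)

# Hypothetical wide-format survey data: one row per site, one column per species
wide <- data.frame(
  site  = c("A", "B", "C"),
  frog  = c(3, 7, 0),
  skink = c(1, 0, 4),
  stringsAsFactors = FALSE
)

# Hypothetical site lookup table
sites <- data.frame(
  site    = c("A", "B", "C"),
  habitat = c("wetland", "wetland", "heath"),
  stringsAsFactors = FALSE
)

# melt(): wide to long, one row per site-species combination
long <- melt(wide, id.vars = "site",
             variable.name = "species", value.name = "n")

# dplyr: subset rows, merge by value (the shared 'site' column),
# then compute a group-wise summary
by_habitat <- long %>%
  filter(n > 0) %>%
  left_join(sites, by = "site") %>%
  group_by(habitat, species) %>%
  summarise(total = sum(n)) %>%
  ungroup()

# dcast(): long back to wide, one row per habitat
habitat_wide <- dcast(by_habitat, habitat ~ species,
                      value.var = "total", fill = 0)
```

Each step in the pipe does one job, so it is easy to drop in an extra filter or swap `sum()` for another summary without rewriting the rest.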
For interfacing between R and externally-compiled code
Rcpp – Jian Yen is a fan, because Rcpp makes it easy to run C++ code from R – all you need to do is write a C++ function and run one line of R code. Better still, Rcpp provides extensions to standard C++ syntax, which means that you can write C++ code that looks a lot like R code. This is awesome because many of us will have spent hours staring at a screen waiting for R scripts to finish running. Sometimes we can get around this by writing better R code, but there are times when that won’t be enough. That’s where Rcpp comes in, letting you write and compile C++ functions that target bottlenecks in your code. Hadley Wickham provides an overview and short guide, and the Rcpp website also has a lot of useful information.
jagstools – Gerry Ryan’s favourite. jagstools allows you to work with the Bayesian hierarchical modelling engine JAGS because, as Gerry says, who doesn’t love to Gibbs sample? jagstools takes the results object from JAGS via the package R2jags – which is basically a complex list – and returns a simple matrix of the parameters. This makes your results much easier to work with and plug into graphs or other models. jagstools was made by QAEco-logist (and R-extraordinaire!) John Baumgartner. Here JB runs through simple worked examples of how to use jagstools, and what you might use it for.
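As a rough illustration of the Rcpp workflow Jian describes, the sketch below compiles two small C++ functions from within R using `cppFunction()`. The function names (`sumSquares`, `sumSquaresSugar`) are made up for this example, and you will need a working C++ compiler installed for it to run.

```r
library(Rcpp)

# One call to cppFunction() compiles the C++ source and makes it callable from R.
# This version uses an explicit loop, as you would in plain C++.
cppFunction("
double sumSquares(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}")

# The same calculation written with Rcpp's vectorised syntax,
# which reads much like the R expression sum(x * x).
cppFunction("
double sumSquaresSugar(NumericVector x) {
  return sum(x * x);
}")

x <- c(1, 2, 3)
loop_result  <- sumSquares(x)
sugar_result <- sumSquaresSugar(x)
```

Both functions should agree with `sum(x^2)` in base R. The speed gains only show up when a loop like this is a genuine bottleneck, so it is worth profiling your R code first.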
How about you?
QAEco wants to find out which R packages are most popular among ecologists – but first we need some nominations. Tell us about your favourite R package for ecology or conservation-related data analysis in the comments below, or use #favRpackage and tag @Qaecology.
Nominations close 16 September. Back in 2014, we found that Australia’s favourite eucalypt was the Mountain Ash – this new poll is a wee bit geekier!