Statistics courses tend to focus on teaching statistics. However, as students soon learn when they start analysing their Masters or PhD thesis data, statistical analysis is typically only a minor component of any data analysis project. Data wrangling – importing, cleaning, and reformatting – often takes far longer and needs far more code than the statistical analysis, yet it is rarely covered in courses.
The reason for this, I think, is that there is a relatively small number of commonly used statistical methods that need teaching, but an infinite number of weird ways to format data. While the development of the tidyverse family of R packages has helped greatly by providing a teachable set of tools for manipulating “tidy” data, one still has to make the data “tidy”.
Rklubben is BioCEED-supported supplementary instruction for R. It focuses on reproducible data management and generating publication-quality figures in R, with as little statistics as we can get away with. It encourages peer-to-peer learning: whenever possible I pair participants who have already solved a problem with those facing a similar one. Sometimes I play the role of peer; with 16805 R packages available on CRAN (the main repository) there is always a new package to learn how to use.
Rklubben participants come from across the department and beyond, and undertake a wide variety of data analyses (hence the need for so many of the 16805 packages). This semester, they have, for example, written an R package to import data with automatic selection of the column separator; made a map of fishing trips around East Timor using the relatively new sf package (which I had not used before); processed the output of a fisheries model; and debugged lots of problems.
Of course, the main difference this semester from previous ones is that most of the sessions have been held over Zoom rather than in person. Apart from the inevitable problems with shaky internet connections and poor sound quality, it has worked remarkably well. I think this is because the participants are strongly motivated. They have a problem and want help to find a solution. They cannot lurk. They have to interact, describe the problem and share their screen.
A bonus of working on Zoom is that I don’t need to squint at the participants’ dim and dusty laptop screens to read their code. The downside to working online is that breakout rooms are difficult to manage, and I cannot keep an ear open for problems.
Rklubben is on Zoom on Fridays from 14:00 to 16:00 (or until we’re finished), and we hope to move back to Tunet sometime next semester. Members of BIO are prioritised, but others are welcome. Please email email@example.com for the Zoom link.