A great idea: R Markdown for Undergrads

A recently published paper by Baumer et al. (2014) caught my eye today (HT to Bruce Caron).  I wanted to share it here because I thought it was cool and because I have a few comments on some of the issues the authors raised.

First, a bit about the paper.  Partly in response to all the media attention to the crisis in reproducibility in science (e.g., Nature), Baumer and colleagues made some changes to introductory statistics classes at Duke, Smith, and Amherst.  The primary change was to require the use of R Markdown for all homework.  RStudio was the editor they used, and it appears any cutting and pasting of code, figures, etc. was not allowed.  They surveyed the students early in the class and again after it ended.  The end result was that students preferred using R Markdown over the typical mode of cut and paste.  They may have grumbled a bit about learning R Markdown, but the benefits were obvious to them.

Getting these students using R Markdown and creating reproducible homework assignments is a fantastic thing, in my opinion.  I have worked with younger researchers (although not undergrads) and with older ones.  Convincing younger researchers of the benefits of R Markdown and the general concept of reproducibility is pretty easy.  To put it bluntly, the older researchers are a pain…  There are ALWAYS long conversations (er, arguments) about why their method is no different from a reproducible one, why their method is better, etc.  I suppose “you can't teach an old dog new tricks” is apropos.  The moral of the story is that teaching undergrads reproducibility and Open Science in general will have many long-term benefits, and what Baumer and colleagues have done should be more widely adopted.

Aside from my being a big fan of what they did, I have one response to an issue they raised in the paper.  On pages 16-17 the authors discuss the need to collaborate on R Markdown documents and suggest Dropbox as a possible solution. While that might work, I think a better option is to use Git and GitHub.  This is, I think, a great opportunity to introduce version control to the students early on, and it fits right in line with the open science and reproducibility theme of the authors' efforts.

So in short, what Baumer and colleagues are doing is great. It would be FANTASTIC if they added Git/GitHub to the mix.

SPARROW + Lake Volume + Field Data = PLOS ONE Paper

So this one goes in the category of shameless self-promotion (or shameless promoting of my co-authors).  We have a new paper out in PLOS ONE about how we combined the USGS SPARROW model, modeled lake depth and volume, and field data from the National Lakes Assessment to improve predictions of total nitrogen (TN) and total phosphorus (TP).

There is no need for me to rehash this here; you can check out my blog post on EPA’s It All Starts With Science Blog.

Ref:

Milstead WB, Hollister JW, Moore RB, Walker HA (2013) Estimating Summer Nutrient Concentrations in Northeastern Lakes from SPARROW Load Predictions and Modeled Lake Depth and Volume. PLoS ONE 8(11): e81457. doi:10.1371/journal.pone.0081457