A great idea: R Markdown for Undergrads

A recently published paper by Baumer et al. (2014) caught my eye today (HT to Bruce Caron).  I wanted to share it here because I thought it was cool and because I had a few comments to make about some of the issues the authors raised.

First, a bit about the paper.  Partly in response to all the media attention to the crisis in reproducibility in science (e.g. Nature) Baumer and colleagues made some changes to introductory statistics classes at Duke, Smith, and Amherst.  The primary change was to require the use of R Markdown for all homework.  RStudio was the editor they used and it appears any cutting and pasting of code, figures, etc. was not allowed.  They conducted a survey of the students early in the class and after the class.  The end result was that students preferred using R Markdown over the typical mode of cut and paste.  They may have grumbled a bit about learning R Markdown but the benefits were obvious to them.

Getting these students using R Markdown and creating reproducible homework assignments is a fantastic thing, in my opinion.  I have worked with younger researchers (although not undergrads) and with older ones.  Convincing younger researchers of the benefits of R Markdown and the general concept of reproducibility is pretty easy.  To put it bluntly, the older researchers are a pain…  There are ALWAYS long conversations (er, arguments) about why their method is not any different than a reproducible one, why their method is better, etc.  I suppose “old dog, new tricks” is apropos.  The moral of the story is that teaching undergrads reproducibility and Open Science in general will have many long-term benefits, and what Baumer and colleagues have done should be more widely adopted.

Aside from my being a big fan of what they did, I have one response to an issue they raised in the paper.  On pages 16-17 the authors discuss the need to collaborate on R Markdown documents and suggest Dropbox as a possible solution. While that might work, I think a better option is to use Git and GitHub.  This is, I think, a great opportunity to introduce version control to the students early on, and it fits right in line with the open science and reproducibility theme of the authors' efforts.

So in short, what Baumer and colleagues are doing is great. It would be FANTASTIC if they added Git/GitHub to the mix.

An R function to download shapefiles

This post is a follow up from my latest Things I Forget post on reading in shapefiles.  That post assumed that you already had access to all the relevant files (e.g. .shp, .shx, .prj, .dbf, etc.).  A task that I routinely need to do is locate shapefiles on a website, grab those files, and read them in.  Instead of having to do this manually I wrote a function a while back to take care of this task.  The function simply requires shape_url, a link to the location of the files, and layer, the name of the shapefile. Currently the layer should not contain the .shp extension. There is also an optional parameter outfile that can be used to create a different name for the downloaded files.

I haven’t spent much time on error handling. For instance this function assumes you already have rgdal and sp installed and loaded. If you try to use this and get an error, let me know in the comments and I’ll try to fix it. Or better yet suggest a change and I’ll throw it in!
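As a small step toward that missing error handling, here is a minimal sketch of a package check that could be run before calling the function. The helper name check_packages is hypothetical (it is not part of the original function); it only uses base R's requireNamespace() to see whether each package is installed.

```r
# Hypothetical helper (not part of the original function):
# stop early with a readable message if any needed package is not installed
check_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!ok]
  if (length(absent) > 0) {
    stop("Please install the following package(s): ",
         paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Example with packages that ship with R; for the function in this post
# the call would be something like check_packages(c("RCurl", "rgdal", "sp"))
check_packages(c("stats", "utils"))
```

Note that this only checks that the packages are installed. They would still need to be loaded with library() before using readOGR().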

So here is the function.

download.shapefile<-function(shape_url,layer,outfile=layer)
{
  #written by: jw hollister
  #Oct 10, 2012

  #RCurl is needed to list the files at shape_url
  if(!require(RCurl))
  {
    stop("The RCurl package is required but could not be loaded")
  }

  #set-up/clean-up variables: make sure the URL ends in a slash
  if(length(grep("/$",shape_url))==0)
  {
    shape_url<-paste(shape_url,"/",sep="")
  }
  #creates vector of all possible shapefile extensions
  shapefile_ext<-c(".shp",".shx",".dbf",".prj",".sbn",".sbx",
                   ".shp.xml",".fbn",".fbx",".ain",".aih",".ixs",
                   ".mxs",".atx",".cpg")

  #Check which shapefile files exist in the listing at shape_url
  xurl<-getURL(shape_url)
  xlogic<-NULL
  for(i in paste(layer,shapefile_ext,sep=""))
  {
    #fixed=TRUE so the "." in the file name is matched literally
    xlogic<-c(xlogic,grepl(i,xurl,fixed=TRUE))
  }

  #Set-up list of shapefiles to download
  shapefiles<-paste(shape_url,layer,shapefile_ext,sep="")[xlogic]
  #Set-up output file names
  outfiles<-paste(outfile,shapefile_ext,sep="")[xlogic]

  #Download all shapefiles
  if(sum(xlogic)>0)
  {
    for(i in seq_along(shapefiles))
    {
      download.file(shapefiles[i],outfiles[i],
                    method="auto",mode="wb")
    }
  } else
  {
    stop("An error has occurred with the input URL or name of shapefile")
  }
}
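The step that decides which files to download can be tried on its own, without a network connection. The sketch below is only an illustration: the listing string is made up and stands in for the text that getURL() would return, and only a handful of the extensions are checked.

```r
# Made-up directory listing, standing in for the text getURL() would return
listing <- paste("NHSenateDists2012.dbf", "NHSenateDists2012.prj",
                 "NHSenateDists2012.shp", "NHSenateDists2012.shx",
                 sep = "\n")

# A few of the possible shapefile extensions
exts <- c(".shp", ".shx", ".dbf", ".prj", ".sbn")
candidates <- paste("NHSenateDists2012", exts, sep = "")

# fixed = TRUE makes grepl() treat the "." as a literal dot, not a regex wildcard
found <- vapply(candidates, grepl, logical(1), x = listing, fixed = TRUE)

# Only the files that appear in the listing would be downloaded
to_download <- candidates[found]
print(to_download)
```

Here the .sbn file is not in the made-up listing, so it is dropped and the other four files are kept.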

And now to prove it works I can do something like the following:

#Download the NH State Boundaries
download.shapefile("ftp://ftp.granit.sr.unh.edu/pub/GRANIT_Data/Vector_Data/Administrative_and_Political_Boundaries/d-nhsenatedists/2012",
                   "NHSenateDists2012")
#Read shapefiles in SpatialPolygonsDataFrame
NHBnd<-readOGR(".","NHSenateDists2012")
#Plot it
plot(NHBnd)
New Hampshire State Boundary and Senate District

Lastly, there are some other approaches for tackling related problems listed below.

Things I Forget: Reading a Shapefile in R with readOGR

One of the more common ways that I read vector data into R is via shapefiles.  I tend to use these partly because of my own sordid past with Arc/INFO, ArcView and ArcGIS and partly due to their ubiquity.  In any event I have found the R package, rgdal, indispensable for this.  One of the workhorse functions for pulling in vector data is readOGR().  It has two required parameters, dsn and layer. The part I never remember is how these relate to shapefiles. There is nothing especially tricky about it, I just tend to forget what the dsn is and what the layer is. In short, the dsn is the directory (without a trailing slash) and the layer is the shapefile name without the .shp.

So, here’s the actual code so I don’t have to look it up again.

If the shapefile you are reading is in your current working directory, the dsn refers simply to that directory. So all you need is a “.”. The layer is the name of the shapefile without an extension. So it would look something like:

myShapeInR<-readOGR(".","myShapeFile")

Now, if that file resides elsewhere, the trick is to remember what the heck dsn refers to. Again, it is simply the directory where the shapefile resides. So if I had a shapefile in a place like C:/data I would use a command like:

myShapeInR<-readOGR("C:/data","myShapeFile")
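If all I have is the full path to the .shp file, base R can do the splitting into dsn and layer for me. This is just a sketch: shape_args is a hypothetical helper name, and the path is the same made-up example as above.

```r
# Hypothetical helper: split a full shapefile path into the two
# arguments readOGR() wants (directory and name without extension)
shape_args <- function(path) {
  list(dsn = dirname(path),
       layer = tools::file_path_sans_ext(basename(path)))
}

args <- shape_args("C:/data/myShapeFile.shp")
print(args$dsn)    # the directory, with no trailing slash
print(args$layer)  # the file name, with no .shp

# With rgdal loaded, the read would then be:
# myShapeInR <- readOGR(args$dsn, args$layer)
```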

There, it is really quite simple, yet I always mess it up. Not any more.

I hope.

First post, and it's a doozy!

Well, not really a doozy.  Just something nice and slow to get me going.

So, seeing as I intend to post stuff about R along with the other things, I thought it best to understand how all those great R bloggers embed the highlighted R code into their WordPress blogs.  As it turns out, I am not the first to do so.  Head over to the R Statistics Blog for the details.

So, does it work?

Yes!

helloWorld <- function(x)
{
    print(x)
}
helloWorld("Hello, World!")