Data collection in a research methods class

One of the discussion boards sponsored by the American Statistical Association had a question about data collection exercises in a research methods class. The initial question was how it might be done and how you could detect if a student was cheating by just making up some numbers. Several people raised the issue of IRB approval of research and I decided to chime in with a response.

If you do a Google search on the words:

irb approval of student projects

you will get a wide range of resources that all say pretty much the same thing. A data collection activity that is done solely for pedagogical purposes does not meet the official definition of research and thus does not require IRB review. If you are giving people an opportunity to practice research skills, but you don’t intend to produce generalizable knowledge from that work, it is not research.

For what it is worth, you should still teach students about the problems with intrusive questions of a sensitive nature. You should expect students to brief their “research subjects” about the purpose of the research and respect anyone’s request not to participate. You should ask them to not include any personal identifiers in their report. Work that does not require IRB review should still be collected in a way that respects people and their need for autonomy and privacy.

But a better answer, perhaps, is that sampling of publicly available information is exempt from IRB review, even when it is done with the intention of producing generalizable knowledge. So you could compare the review ratings of different products on Yelp. You could compute the average year of publication in the bibliographies of various peer-reviewed articles. I’ve done this for some of the book reviews I’ve written and concluded in one case that the book’s bibliography was very stale. You could compare word counts for a relatively chatty comic strip like “For Better or For Worse” with those for a terse comic strip like “Lio.” You could estimate the proportion of magazine covers that have a picture of Donald Trump on them over time. You could count the number of tweets that say “Kansas City Chiefs Super Bowl Champions” from two different sports writers. You could compare how often one of the “Twilight” movies appears in the TV listings with how often one of the “Harry Potter” movies does. The good news about sampling from publicly available information is that you could (with a bit of effort) reproduce part or all of the students’ work to detect cheating, which was a concern raised in the original email.
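To make the bibliography example concrete, here is a minimal sketch in Python of how an instructor might recompute a student's reported average publication year. Everything in it is an assumption for illustration: the sample entries are invented, the function names are hypothetical, and the year is taken to be the first four-digit number between 1900 and 2099 in each entry, which will not suit every citation style.

```python
import re
from statistics import mean

# Hypothetical bibliography entries, invented purely for illustration.
SAMPLE_BIBLIOGRAPHY = [
    "Author A. Title one. Journal X. 1998;12:1-10.",
    "Author B. Title two. Journal Y. 2004;7:22-30.",
    "Author C. Title three. Journal Z. 2010;3:5-9.",
]

# Assumption: the publication year is the first standalone
# four-digit number from 1900 to 2099 in the entry.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def publication_year(entry):
    """Return the first plausible year in an entry, or None if there is none."""
    match = YEAR_PATTERN.search(entry)
    return int(match.group(0)) if match else None


def average_publication_year(entries):
    """Average the years found, skipping entries with no recognizable year."""
    years = [y for y in map(publication_year, entries) if y is not None]
    if not years:
        raise ValueError("no publication years found")
    return mean(years)


def matches_reported(entries, reported_average, tolerance=0.5):
    """Check a student's reported average against an independent recomputation."""
    return abs(average_publication_year(entries) - reported_average) <= tolerance


if __name__ == "__main__":
    print(average_publication_year(SAMPLE_BIBLIOGRAPHY))
    print(matches_reported(SAMPLE_BIBLIOGRAPHY, reported_average=2004.2))
```

The tolerance is a judgment call. A small mismatch may only mean the student handled an ambiguous entry differently, so a failed check is a reason to look more closely, not proof of fabrication.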

An earlier version is here.