What is the probability of students cheating?
I recently found myself asking this question quite literally, when 10
of my students in an upper-level undergraduate methods course turned
in identical results for a take-home exercise. Of course, on some
exercises I expect students to produce identical findings, such as
when I ask for the mean and variance of a particular variable. In
this case, however, I had asked students to produce a new random
variable and to summarize its values in a frequency distribution. My
initial reaction was to suspect the students of collaborating on the
exercise (contrary to my syllabus and the assignment's
instructions), though the number of students who produced identical
results—10 out of a class of 30—made me skeptical that so many
students could conspire so effectively. The makeup of the class also
made collaboration unlikely. I teach at a large public-service
university whose students represent a broad variety of backgrounds,
nationalities, interests, and ages.
The course is also cross-listed across disciplines, so the 10
students included both political science and geography majors.
Reviewing the names of the 10 students persuaded me that they were
highly unlikely to have cheated. I have found, furthermore, that
many students will not voluntarily work in groups. So how did this
diverse group of students produce an identical “random”
variable?
My investigation of this question took me well beyond issues of
student conduct. To answer the question to my satisfaction, I found
I had to understand how campus computer networks operate and
ultimately how the statistical software my students use works. My
journey took me into the arcane world of “random” numbers in
computers, and required me to understand how statistical software
generates so-called pseudo-random numbers. When I finally found an
answer to my puzzle, I learned that the problem was not with my
students, but with the software on which we all rely for research
and, increasingly, pedagogy. Many researchers today know there is no
such thing as a truly random computer-generated number. Although
many political scientists know this has profound consequences for
their work—whether for sampling purposes or Monte Carlo
experiments—to my knowledge instructors of quantitative methods
courses have given little thought to its implications in the
classroom. For one, it is easy and tempting to mistake the artifacts
of pseudo-random number generation for student malfeasance. For
another, the issue bears on the students' (and the instructor's)
conceptual grasp of the slippery idea of “randomness.” For these
reasons, I offer my own experience as a cautionary tale.
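
To make the term "pseudo-random" concrete, the following is a minimal
sketch in Python. The language, the function name simulate_student,
and the seed values are all assumptions chosen for illustration; none
of them come from the course or its software. The sketch shows only
the general property that a generator started from the same seed
yields the same sequence, and it does not claim to explain the
identical results described above.

```python
import random
from collections import Counter


def simulate_student(seed, n=10):
    """Return n pseudo-random integers in 1..6 from a generator started at `seed`.

    Hypothetical helper for illustration only: each call stands in for one
    student creating a "random" variable in a fresh software session.
    """
    rng = random.Random(seed)  # independent generator with a fixed starting state
    return [rng.randint(1, 6) for _ in range(n)]


# Two "students" whose sessions happen to start from the same seed (assumed value).
student_a = simulate_student(seed=12345)
student_b = simulate_student(seed=12345)

# Summarize each variable as a frequency distribution.
freq_a = Counter(student_a)
freq_b = Counter(student_b)

print("Student A:", sorted(freq_a.items()))
print("Student B:", sorted(freq_b.items()))
print("Identical values:", student_a == student_b)
```

Because both generators begin in the same state, the two lists and
their frequency distributions match exactly, even though no values
were shared between the two calls.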