Dishonest Science
In Tuesday’s comments we discussed the irony of an ethics professor having her tenure revoked (and being fired) for her work on…wait for it…dishonesty. I commented briefly then, but felt the story deserved a fuller post. CynthiaW called out this New York Post story about it.
A renowned Harvard University professor was stripped of her tenure and fired after an investigation found she fabricated data on multiple studies focused on dishonesty.
Francesca Gino, a celebrated behavioral scientist at Harvard Business School, was let go after the school’s top governing board determined she tweaked observations in four studies so that their findings boosted her hypotheses, GBH News reported.
Harvard administrators notified business faculty that Gino was out of a job in a closed-door meeting this past week, the outlet reported.
How was it discovered? In a nutshell, data nerds (disclosure: I am an amateur data nerd. Deal with it). Most journals require you to post your data files. Scholars at the website “Data Colada” studied Professor Gino’s data files, discovering discrepancies that greatly altered the results. In response, Professor Gino sued them. I think had she not sued, she might have been able to quietly retract the papers without repercussions, simply claiming an analysis error. But the lawsuit kinda forced Harvard to investigate. Womp, Womp!
One of Professor Gino’s “big hits” asked an interesting question: does signing a form at the top improve honesty in responses, relative to signing at the bottom? IMHO I’d expect a modest difference, but with a big enough data set, I wouldn’t be surprised for it to be statistically significant. Instead, the effect size was big. Big enough that a major funder of research (47) might describe it as YUUUUUUUGE. The number of cheaters dropped by more than half on one measure, and by more than 75% on another. IMHO that’s too large a finding, a potential red flag the journal editor overlooked.
How did Data Colada find it? They’re data nerds who know Excel better than I do. And I know it well enough that Microsoft modified Excel for Mac after I documented a flaw in it (long story). The data nerds found eight duplicated responses (faked), all eight in the direction of the proposed hypotheses. In a nutshell, those eight data points changed the study’s outcomes from insignificant to significant. Data Colada looked at four of her papers, all since retracted, and found similar problems in each. In short, Professor Gino appears to have faked data in order to get results.
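For the curious, spotting exact duplicates like that is mechanical once you have the data file. Here's a minimal sketch; the records and field names are invented for illustration and are not the actual study files:

```python
from collections import Counter

# Hypothetical participant-level records (illustrative only):
# (participant_id, condition, miles_reported)
rows = [
    (101, "sign_top",    18500),
    (102, "sign_bottom", 31200),
    (103, "sign_top",    12000),
    (104, "sign_top",    11800),
    (103, "sign_top",    12000),  # exact copy of an earlier row
    (104, "sign_top",    11800),  # exact copy of an earlier row
]

# In participant-level data each person should appear once, so any
# row that appears more than once is worth flagging for review.
counts = Counter(rows)
duplicates = [row for row, n in counts.items() if n > 1]
print(duplicates)
```

A duplicate by itself proves nothing; what made the real case damning was that every duplicated response pointed in the direction the hypothesis needed.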
Unlike Man, not all data are created equal. An extreme value for a given data point is more influential than one closer to the mean, and duplicating it doubles its pull on the results. In my own work on event studies, one or two outliers either way can change the outcome of the overall event study. That’s why, as a journal editor, I like to know the maximum and minimum results in an event study.
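To make that concrete, here's a toy sketch (the return figures are invented): a single large observation can flip the sign of the average, which is exactly the kind of sensitivity that reporting the max and min exposes.

```python
from statistics import mean

# Illustrative cumulative abnormal returns (%) for ten firms; these
# numbers are invented for the sketch, not taken from any real study.
cars = [-0.4, 0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.0, -0.2, 6.0]

print(round(mean(cars), 2))     # 0.54: the average looks positive...

trimmed = sorted(cars)[:-1]     # ...but drop the single largest value
print(round(mean(trimmed), 2))  # -0.07: and the sign flips

# Reporting the max and min makes this sensitivity visible to a reader.
print(max(cars), min(cars))
```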
Most academic cheating is caught because it takes more effort to fake results convincingly than to create real ones. Many of us create our initial data files in Microsoft Office, which records both a “created date” and a “last modified date”. Which means if you intend to fake your data, you must create the fake document at the same time as the real document. Excel files can also be backsolved to the degree that manipulation can often be spotted. So to cheat correctly, create the real and fake files at the same time, do all the creative faking you need to do, then manually input the (now revised) raw data into the fake file so it shows no manipulation. But that means you know up front you plan to cheat. My guess is the professor felt she just needed a little help to get the incredible results she wanted. So she didn’t plan to cheat, which is how she got caught cheating.
An easier way to fake results is simply to discard problematic data. If it is only one or two lines (even eight would not have been a red flag), it is easy for data nerds to overlook, and almost impossible for them to verify that what was deleted would have changed the results.
But if you do need to manipulate data, one solution is to voluntarily deposit it as a text file, which strips most traces of the manipulation from the file. The stated benefit is that text files can be read into multiple software programs for replication, but it also obscures much of the data handling. Of course, as a scholar, if anyone questions your findings they’ll want to see the original files. But if they do, you have time to change your soiled pants, then create a raw-input data fakery file.
As for Professor Gino, her academic career is over. She later claimed a hacker broke into a file and changed the results, such that the files she deposited were the true ones, while the files Data Colada identified as originals were the hacked versions. But I cannot see any reputable university hiring her now. My advice: academia pays well. Maybe you don’t earn as much if you don’t cheat, but you make more than if you get fired for cheating.
Speaking of college, Daughter B phoned yesterday to tell us about the field work program she did in the Bend, OR, area over the weekend. It was a program for about 60 undergraduates to learn the skills they'll need if they're going to be researchers or practitioners in wildlife, forestry, and other kinds of outdoorsy sciences.
They had to set up a camp. (She already knew this stuff from Scouts.) They dissected some deer carcasses - hers died from malnutrition - and learned to take samples to test for Chronic Wasting Disease. They learned to change a tire and do basic maintenance on a truck. They set up trail cameras, did insect-population sampling, practiced for a program about listening for bat sounds, and several other things.
She was really excited about it. This semester has been mostly indoor classes - data studies, programming, more chemistry - but she said she'll have more fun things next term.
Very interesting comments, Jay. I appreciate your going into this in more detail. A phrase that jumped out at me was "Francesca Gino, a celebrated behavioral scientist".
Do you think the fact that Dr. Gino is a "celebrated" "scientist" is a significant factor in the decisions she made when the data nerds raised questions about her research?
What kind of study might be done to evaluate whether a person identifying as a "scientist" is more or less likely to treat accurate data collection as a matter of fiduciary responsibility, rather than in a more loosey-goosey, outcome-oriented sort of way? What about a person who is identified as "celebrated" or "celebrity" or otherwise outside the run-of-the-mill?
The discussion yesterday both here and at the Mothership reminded me of a podcast Jonah Goldberg did with a teacher/writer in the criminology field. The guest kept proudly self-identifying as a "scientist," to the extent that it really caught my attention as a listener. It seemed to me that he was more like a data researcher, which is a worthy occupation ... but is it the same as a SCIENTIST?