AI-text detection tools are really easy to fool
A recent crop of AI systems that claim to detect AI-generated text performs poorly, and it doesn’t take much to get past them.
Within weeks of ChatGPT’s launch, fears arose that students would use the chatbot to spin up passable essays in seconds. In response, startups began building products that promise to spot whether text was written by a human or a machine.
The problem is that it’s relatively simple to trick these tools and avoid detection, according to new research that has not yet been peer reviewed.
Debora Weber-Wulff, a professor of media and computing at the University of Applied Sciences, HTW Berlin, worked with a group of researchers from a variety of universities to assess the ability of 14 tools, including Turnitin, GPT Zero, and Compilatio, to detect text written by OpenAI’s ChatGPT.
Most of these tools work by looking for hallmarks of AI-generated text, including repetition, and then calculating the likelihood that the text was generated by AI. But the team found that all the tools tested struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans or obfuscated by a paraphrasing tool, suggesting that all students need to do is slightly adapt the essays the AI generates to get past the detectors.
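To make the idea of a repetition "hallmark" concrete, here is a minimal toy sketch in Python. It is an illustration only, not how Turnitin, GPT Zero, Compilatio, or any other tested tool works. The function name `repeated_trigram_ratio` and the choice of three-word sequences are assumptions made for this example.

```python
from collections import Counter


def repeated_trigram_ratio(text: str) -> float:
    """Return the fraction of three-word sequences that occur more than once.

    Hypothetical toy signal: a higher value means more repeated phrasing.
    Real detectors are not documented in the article and are likely far
    more sophisticated than this.
    """
    words = text.lower().split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0  # text too short to contain any trigram
    counts = Counter(trigrams)
    # Count every occurrence of a trigram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)


if __name__ == "__main__":
    print(repeated_trigram_ratio("the cat sat on the mat"))
    print(repeated_trigram_ratio("a b c a b c"))
```

A surface signal like this also hints at why light edits can defeat a detector: reordering sentences or swapping a few words changes the word sequences the score depends on.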
“These tools don’t work,” says Weber-Wulff. “They don’t do what they say they do. They’re not detectors of AI.”
The researchers assessed the tools by writing short undergraduate-level essays on a variety of subjects, including civil engineering, computer science, economics, history, linguistics, and literature. They wrote the essays themselves to be certain the text wasn’t already online, which would have meant it might already have been used to train ChatGPT.
Then each researcher wrote an additional text in Bosnian, Czech, German, Latvian, Slovak, Spanish, or Swedish. Those texts were passed through either the AI translation tool DeepL or Google Translate to translate them into English.
The team then used ChatGPT to generate two additional texts each, which they slightly tweaked in an effort to hide that they had been AI-generated. One set was edited manually by the researchers, who reordered sentences and exchanged words, while another was rewritten using an AI paraphrasing tool called Quillbot. In the end, they had 54 documents to test the detection tools on.
They found that while the tools were good at identifying text written by a human (with 96% accuracy, on average), they fared worse when it came to spotting AI-generated text, especially when it had been edited. Although the tools identified ChatGPT text with 74% accuracy, this fell to 42% when the ChatGPT-generated text had been tweaked slightly.
These kinds of studies also highlight how outdated universities’ current methods for assessing student work are, says Vitomir Kovanović, a senior lecturer who builds machine-learning and AI models at the University of South Australia, who was not involved in the project.
Daphne Ippolito, a senior research scientist at Google specializing in natural-language generation, who also did not work on the project, raises another concern.
“If automatic detection systems are to be employed in education settings, it is crucial to understand their rates of false positives, as incorrectly accusing a student of cheating can have dire consequences for their academic career,” she says. “The false-negative rate is also important, because if too many AI-generated texts pass as human written, the detection system is not useful.”
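The two error rates Ippolito describes can be computed from a labeled test set. The sketch below is a minimal illustration. The function name `error_rates` and the eight-document sample are invented for this example and are not data from the study.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    labels:      True if the text really is AI-generated.
    predictions: True if the detector flagged the text as AI-generated.
    """
    false_pos = false_neg = humans = ais = 0
    for is_ai, flagged in zip(labels, predictions):
        if is_ai:
            ais += 1
            if not flagged:
                false_neg += 1  # AI text that passed as human-written
        else:
            humans += 1
            if flagged:
                false_pos += 1  # human text wrongly flagged as AI
    if humans == 0 or ais == 0:
        raise ValueError("need at least one human and one AI text")
    return false_pos / humans, false_neg / ais


if __name__ == "__main__":
    # Made-up sample: four human-written texts, then four AI-generated ones.
    labels = [False, False, False, False, True, True, True, True]
    predictions = [False, False, False, True, True, True, False, False]
    fpr, fnr = error_rates(labels, predictions)
    print(f"false-positive rate: {fpr:.2f}, false-negative rate: {fnr:.2f}")
```

In this invented sample, one of four human texts is wrongly flagged and two of four AI texts slip through. The two rates are measured on different groups of texts, so a tool can score well on one and badly on the other.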
Compilatio, which makes one of the tools tested by the researchers, says it is important to remember that its system just indicates suspect passages, which it classifies as potential plagiarism or content potentially generated by AI.
“It is up to the schools and teachers who mark the documents analyzed to validate or impute the knowledge actually acquired by the author of the document, for example by putting in place additional means of investigation—oral questioning, additional questions in a controlled classroom environment, etc.,” a Compilatio spokesperson said.
“In this way, Compilatio tools are part of a genuine teaching approach that encourages learning about good research, writing, and citation practices. Compilatio software is a correction aid, not a corrector,” the spokesperson added. GPT Zero did not immediately respond to a request for comment.
"Our detection model is based on the notable differences between the more idiosyncratic, unpredictable nature of human writing and the very predictable statistical signatures of AI generated text," Annie Chechitelli, Turnitin's chief product officer, says.
"However, our AI writing detection feature simply alerts the user to the presence of AI writing, highlighting areas where further discussion may be necessary. It does not determine the appropriate or inappropriate use of AI writing tools, or whether that use constitutes cheating or misconduct based on the assessment and the instruction provided by the teacher."
We’ve known for some time that tools meant to detect AI-written text don’t always work the way they’re supposed to. Earlier this year, OpenAI unveiled a tool designed to detect text produced by ChatGPT, admitting that it flagged only 26% of AI-written text as “likely AI-written.” OpenAI pointed MIT Technology Review towards a section on its website for educator considerations, which warns that tools designed to detect AI-generated content are “far from foolproof.”
However, such failures haven’t stopped companies from rushing out products that promise to do the job, says Tom Goldstein, an assistant professor at the University of Maryland, who was not involved in the research.
“Many of them are not highly accurate, but they are not all a complete disaster either,” he adds, pointing out that Turnitin managed to achieve some detection accuracy with a fairly low false-positive rate. And while studies that shine a light on the shortcomings of so-called AI-text detection systems are very important, it would have been helpful to expand the study’s remit to AI tools beyond ChatGPT, says Sasha Luccioni, a researcher at AI startup Hugging Face.
For Kovanović, the whole idea of trying to spot AI-written text is flawed.
“Don’t try to detect AI—make it so that the use of AI is not the problem,” he says.
Update: this story has been updated to include comments from Turnitin received post-publication.