AI-text detection tools are really easy to fool

A recent crop of AI systems claiming to detect AI-generated text perform poorly—and it doesn’t take much to get past them.

Within weeks of ChatGPT’s launch, there were fears that students would be using the chatbot to spin up passable essays in seconds. In response, startups began selling products that promise to spot whether text was written by a human or a machine.

The problem is that it’s relatively simple to trick these tools and avoid detection, according to new research that has not yet been peer reviewed. 

Debora Weber-Wulff, a professor of media and computing at the University of Applied Sciences, HTW Berlin, worked with a group of researchers from a variety of universities to assess the ability of 14 tools, including Turnitin, GPT Zero, and Compilatio, to detect text written by OpenAI’s ChatGPT. 

Most of these tools work by looking for hallmarks of AI-generated text, including repetition, and then calculating the likelihood that the text was generated by AI. But the team found that all the tools tested struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans or obfuscated by a paraphrasing tool, suggesting that all students need to do is slightly adapt the essays the AI generates to get past the detectors.
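The study does not spell out any vendor’s algorithm, but the general idea can be sketched with a toy heuristic: score a passage for telltale surface patterns such as repeated phrasing, then turn that score into a verdict. The function names and the threshold below are invented for illustration and are not taken from any of the tools tested.

```python
# Toy sketch of a repetition-based heuristic, not any vendor's actual detector.
from collections import Counter


def repetition_score(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that occur more than once in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


def looks_ai_generated(text: str, threshold: float = 0.05) -> bool:
    """Flag text whose repetition score exceeds an arbitrary cutoff."""
    return repetition_score(text) > threshold
```

Heuristics like this lean on surface regularities, which is exactly why light rewording can defeat them, as the study’s results suggest.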

“These tools don’t work,” says Weber-Wulff. “They don’t do what they say they do. They’re not detectors of AI.”

The researchers assessed the tools by writing short undergraduate-level essays on a variety of subjects, including civil engineering, computer science, economics, history, linguistics, and literature. They wrote the essays themselves to be certain the text wasn’t already online, because online text might already have been used to train ChatGPT.

Then each researcher wrote an additional text in Bosnian, Czech, German, Latvian, Slovak, Spanish, or Swedish. Those texts were passed through either the AI translation tool DeepL or Google Translate to translate them into English. 

The team then used ChatGPT to generate two additional texts each, which they slightly tweaked in an effort to hide that they had been AI-generated. One set was edited manually by the researchers, who reordered sentences and exchanged words, while the other was rewritten using an AI paraphrasing tool called Quillbot. In the end, they had 54 documents to test the detection tools on.
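To give a sense of how light that editing step can be, here is a toy sketch of the kind of manipulation described above: reordering sentences and swapping a few words. The synonym table is made up, and this is not the researchers’ actual procedure or the Quillbot tool.

```python
# Toy sketch of light obfuscation (sentence reordering plus word swaps);
# the synonym table is invented and this is not the study's procedure.
import random

SYNONYMS = {"important": "crucial", "shows": "demonstrates", "use": "employ"}


def lightly_obfuscate(text: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)  # reorder the sentences
    reworded = [
        " ".join(SYNONYMS.get(word.lower(), word) for word in sentence.split())
        for sentence in sentences
    ]  # exchange a few words
    return ". ".join(reworded) + "."
```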

They found that while the tools were good at identifying text written by a human (with 96% accuracy, on average), they fared more poorly when it came to spotting AI-generated text, especially when it had been edited. Although the tools identified ChatGPT text with 74% accuracy, this fell to 42% when the ChatGPT-generated text had been tweaked slightly.

These kinds of studies also highlight how outdated universities’ current methods for assessing student work are, says Vitomir Kovanović, a senior lecturer who builds machine-learning and AI models at the University of South Australia, who was not involved in the project.

Daphne Ippolito, a senior research scientist at Google specializing in natural-language generation, who also did not work on the project, raises another concern.

“If automatic detection systems are to be employed in education settings, it is crucial to understand their rates of false positives, as incorrectly accusing a student of cheating can have dire consequences for their academic career,” she says. “The false-negative rate is also important, because if too many AI-generated texts pass as human written, the detection system is not useful.” 
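As a minimal illustration of the two error rates Ippolito mentions, the sketch below computes them from a set of labeled verdicts. The data and function are hypothetical, not part of the study.

```python
# Minimal sketch: false-positive and false-negative rates for a detector.
# True means "AI-generated" for both the gold labels and the predictions.
def error_rates(labels: list[bool], predictions: list[bool]) -> tuple[float, float]:
    human_verdicts = [p for label, p in zip(labels, predictions) if not label]
    ai_verdicts = [p for label, p in zip(labels, predictions) if label]
    false_positive_rate = sum(human_verdicts) / len(human_verdicts)  # humans flagged as AI
    false_negative_rate = sum(not p for p in ai_verdicts) / len(ai_verdicts)  # AI passing as human
    return false_positive_rate, false_negative_rate


# Example: 1 of 4 human texts wrongly flagged; 2 of 4 AI texts missed.
labels = [False, False, False, False, True, True, True, True]
preds = [False, True, False, False, True, True, False, False]
print(error_rates(labels, preds))  # (0.25, 0.5)
```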

Compilatio, which makes one of the tools tested by the researchers, says it is important to remember that its system just indicates suspect passages, which it classifies as potential plagiarism or content potentially generated by AI.

“It is up to the schools and teachers who mark the documents analyzed to validate or impute the knowledge actually acquired by the author of the document, for example by putting in place additional means of investigation—oral questioning, additional questions in a controlled classroom environment, etc.,” a Compilatio spokesperson said.

“In this way, Compilatio tools are part of a genuine teaching approach that encourages learning about good research, writing, and citation practices. Compilatio software is a correction aid, not a corrector,” the spokesperson added. GPT Zero did not immediately respond to a request for comment.

“Our detection model is based on the notable differences between the more idiosyncratic, unpredictable nature of human writing and the very predictable statistical signatures of AI-generated text,” Annie Chechitelli, Turnitin’s chief product officer, says.

“However, our AI writing detection feature simply alerts the user to the presence of AI writing, highlighting areas where further discussion may be necessary. It does not determine the appropriate or inappropriate use of AI writing tools, or whether that use constitutes cheating or misconduct based on the assessment and the instruction provided by the teacher.”

We’ve known for some time that tools meant to detect AI-written text don’t always work the way they’re supposed to. Earlier this year, OpenAI unveiled a tool designed to detect text produced by ChatGPT, admitting that it flagged only 26% of AI-written text as “likely AI-written.” OpenAI pointed MIT Technology Review toward a section on its website outlining considerations for educators, which warns that tools designed to detect AI-generated content are “far from foolproof.”

However, such failures haven’t stopped companies from rushing out products that promise to do the job, says Tom Goldstein, an assistant professor at the University of Maryland, who was not involved in the research. 

“Many of them are not highly accurate, but they are not all a complete disaster either,” he adds, pointing out that Turnitin managed to achieve some detection accuracy with a fairly low false-positive rate. And while studies that shine a light on the shortcomings of so-called AI-text detection systems are very important, it would have been helpful to expand the study’s remit to AI tools beyond ChatGPT, says Sasha Luccioni, a researcher at AI startup Hugging Face.

For Kovanović, the whole idea of trying to spot AI-written text is flawed.

“Don’t try to detect AI—make it so that the use of AI is not the problem,” he says.

Update: This story has been updated to include comments from Turnitin received post-publication.
