
‘ChatGPT Detector’ Catches AI-Generated Papers with Unprecedented Accuracy

A new tool based on machine learning uses features of writing style to distinguish between human and AI authors


A new AI detection tool can accurately identify chemistry papers written by ChatGPT.

A machine-learning tool can easily spot when chemistry papers are written using the chatbot ChatGPT, according to a study published on 6 November 2023 in Cell Reports Physical Science. The specialized classifier, which outperformed two existing artificial intelligence (AI) detectors, could help academic publishers to identify papers created by AI text generators.

“Most of the field of text analysis wants a really general detector that will work on anything,” says co-author Heather Desaire, a chemist at the University of Kansas in Lawrence. But by making a tool that focuses on a particular type of paper, “we were really going after accuracy.”

The findings suggest that efforts to develop AI detectors could be boosted by tailoring software to specific types of writing, Desaire says. “If you can build something quickly and easily, then it’s not that hard to build something for different domains.”



The elements of style

Desaire and her colleagues first described their ChatGPT detector in June, when they applied it to Perspective articles from the journal Science. Using machine learning, the detector examines 20 features of writing style, including variation in sentence length and the frequency of certain words and punctuation marks, to determine whether an academic scientist or ChatGPT wrote a piece of text. The findings show that “you could use a small set of features to get a high level of accuracy,” Desaire says.
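The study’s full 20-feature list isn’t enumerated in this article, but the general recipe of such stylometric detectors is to reduce each text to a small vector of style statistics before classifying it. The Python sketch below illustrates that idea with a few hypothetical features of the kind mentioned above; it is not the study’s actual feature set.

```python
import re
import statistics

def style_features(text: str) -> list[float]:
    """Reduce a text to a handful of stylometric statistics.

    These particular features are illustrative stand-ins for the
    20 features used in the study, which are not listed here.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences] or [0]
    words = text.split() or [""]
    return [
        statistics.mean(lengths),          # average sentence length (in words)
        statistics.pstdev(lengths),        # variation in sentence length
        text.count(",") / len(words),      # comma frequency per word
        text.count("(") / len(words),      # parenthesis frequency per word
        sum(w.lower().strip(".,;") in {"however", "although", "which"}
            for w in words) / len(words),  # frequency of a few marker words
    ]
```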

In the latest study, the detector was trained on the introductory sections of papers from ten chemistry journals published by the American Chemical Society (ACS). The team chose the introduction because this section of a paper is fairly easy for ChatGPT to write if it has access to background literature, Desaire says. The researchers used 100 published introductions as human-written training text, then asked ChatGPT-3.5 to write 200 introductions in ACS journal style. For 100 of these, the chatbot was given only the papers’ titles; for the other 100, it was given their abstracts.
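With the two labelled sets of introductions in hand, fitting a classifier of this kind is routine. A minimal sketch using scikit-learn, reusing the hypothetical style_features helper above (the file layout and the choice of logistic regression are assumptions for illustration, not details from the study):

```python
from pathlib import Path
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical layout: one introduction per .txt file in each folder.
human = [p.read_text() for p in Path("human_intros").glob("*.txt")]      # 100 texts
chatgpt = [p.read_text() for p in Path("chatgpt_intros").glob("*.txt")]  # 200 texts

X = [style_features(t) for t in human + chatgpt]  # helper from the sketch above
y = [0] * len(human) + [1] * len(chatgpt)         # 0 = human, 1 = ChatGPT

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(cross_val_score(clf, X, y, cv=5).mean())    # rough in-domain accuracy estimate
```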

When tested on introductions written by people and those generated by AI from the same journals, the tool identified ChatGPT-3.5-written sections based on titles with 100% accuracy. For the ChatGPT-generated introductions based on abstracts, the accuracy was slightly lower, at 98%. The tool worked just as well with text written by ChatGPT-4, the latest version of the chatbot. By contrast, the AI detector ZeroGPT identified AI-written introductions with an accuracy of only about 35–65%, depending on the version of ChatGPT used and whether the introduction had been generated from the title or the abstract of the paper. A text-classifier tool produced by OpenAI, the maker of ChatGPT, also performed poorly — it was able to spot AI-written introductions with an accuracy of around 10–55%.

The new ChatGPT catcher even performed well on introductions from journals it wasn’t trained on, and it caught AI text created from a variety of prompts, including one designed to confuse AI detectors. However, the system is highly specialized for scientific journal articles. When presented with real articles from university newspapers, it failed to recognize them as being written by humans.

Wider issues

What the authors are doing is “something fascinating,” says Debora Weber-Wulff, a computer scientist who studies academic plagiarism at the HTW Berlin University of Applied Sciences. Many existing tools try to determine authorship by searching for the predictive text patterns of AI-generated writing rather than by looking at features of writing style, she says. “I’d never thought of using stylometrics on ChatGPT.”

But Weber-Wulff points out that there are other issues driving the use of ChatGPT in academia. Many researchers are under pressure to quickly churn out papers, she notes, or they might not see the process of writing a paper as an important part of science. AI-detection tools will not address these issues, and should not be seen as “a magic software solution to a social problem.”

This article is reproduced with permission and was first published on November 6, 2023.