A group of researchers has developed a novel method to identify text generated by large language models (LLMs). They found that the use of certain “excess words” increased significantly during the LLM era, specifically in 2023 and 2024. By their estimate, at least 10% of scientific abstracts published in 2024 were processed with LLMs.
The researchers were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared to the recent past. They applied a similar approach to “excess word usage” after LLM writing tools became widely available in late 2022. They found an abrupt increase in the frequency of certain style words, which was unprecedented in both quality and quantity.
To measure these vocabulary changes, the researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024. They tracked the relative frequency of each word year by year, then compared the frequency expected from the pre-2023 trendline with the frequency actually observed in abstracts from 2023 and 2024.
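The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual code: the function names `yearly_frequency` and `expected_frequency` are hypothetical, "frequency" is taken to mean the share of abstracts containing the word, tokenization is a naive whitespace split, and the trendline is assumed to be a least-squares straight line fitted to the years before 2023.

```python
def yearly_frequency(abstracts_by_year, word):
    """Share of abstracts in each year that contain `word` at least once.

    `abstracts_by_year` maps a year to a list of abstract strings.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    freqs = {}
    for year, abstracts in abstracts_by_year.items():
        hits = sum(1 for text in abstracts if word in text.lower().split())
        freqs[year] = hits / len(abstracts)
    return freqs


def expected_frequency(freqs, target_year, cutoff_year=2023):
    """Extrapolate a straight-line trend fitted to years before `cutoff_year`.

    The linear fit is an assumed stand-in for "the pre-2023 trendline".
    """
    points = [(year, f) for year, f in freqs.items() if year < cutoff_year]
    if len(points) < 2:
        raise ValueError("need at least two pre-cutoff years to fit a trend")
    n = len(points)
    mean_x = sum(year for year, _ in points) / n
    mean_y = sum(f for _, f in points) / n
    sxx = sum((year - mean_x) ** 2 for year, _ in points)
    sxy = sum((year - mean_x) * (f - mean_y) for year, f in points)
    slope = sxy / sxx
    # Value of the fitted line at the target year.
    return mean_y + slope * (target_year - mean_x)
```

Calling `expected_frequency(freqs, 2024)` on a word's per-year frequencies gives the counterfactual value that the observed 2024 frequency would then be compared against.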
The analysis turned up a number of words that were extremely uncommon in these scientific abstracts before 2023 but suddenly surged in popularity after LLMs were introduced. For instance, the word “delves” shows up in 25 times as many 2024 papers as the pre-LLM trend would predict, and words like “showcasing” and “underscores” increased ninefold. Other words that were already common became notably more common in post-LLM abstracts: the frequency of “potential” increased by 4.1 percentage points, “findings” by 2.7 percentage points, and “crucial” by 2.6 percentage points.
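The figures above mix two different measures: a ratio (how many times more often a word appears than expected) and a gap in percentage points (how much of the corpus is affected). A minimal sketch of the distinction follows; `excess_usage` is a hypothetical helper, and the input frequencies are invented for illustration rather than taken from the study.

```python
def excess_usage(observed, expected):
    """Return (ratio, gap) between an observed and an expected frequency.

    Both inputs are fractions of abstracts containing the word.
    The gap is also a fraction; multiply by 100 for percentage points.
    """
    if expected <= 0:
        raise ValueError("expected frequency must be positive")
    return observed / expected, observed - expected


# Invented example values, not data from the study.
rare_ratio, rare_gap = excess_usage(observed=0.005, expected=0.0002)
common_ratio, common_gap = excess_usage(observed=0.45, expected=0.40)
```

With these invented inputs, the rare word has by far the larger ratio while the common word has the larger gap, which is why a rare word such as “delves” stands out on the first measure and a common word such as “potential” on the second.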
These kinds of changes in word use could happen independently of LLM usage, of course, since the natural evolution of language means words sometimes go in and out of style. However, the researchers found that, in the pre-LLM era, such massive and sudden year-over-year increases were only seen for words related to major world health events. This suggests that the sudden surge in the popularity of certain words is indeed linked to the widespread use of LLMs. The researchers speculate that future large language models might analyze word frequencies themselves, lowering the weight of marker words to make their output look more human-written.
Read more: arstechnica.com