When Research Becomes a Business: How Fake Articles Are Contaminating Cancer Science

Feb 10

A new study finds that the cancer literature is increasingly flooded with fake scientific articles produced by organizations that fabricate research for money. Using computational tools to analyze titles and abstracts, the researchers found that almost 10% of cancer articles resemble studies already known to be fraudulent. The problem is growing over time and reaches even highly prestigious scientific journals, underscoring the urgency of protecting the integrity of science.

In recent years, scientists have been sounding the alarm about a serious problem that is silently growing within science, especially in cancer research: the publication of fake studies. These are not innocent errors. Many are the product of organized schemes run by "scientific article factories," better known in English as paper mills.

These factories operate like clandestine companies that produce scientific articles on demand for people who want to publish quickly, gain academic prestige, or advance their careers, even without having done the actual research.

These organizations write entire texts, invent data, create fake graphs, and even manipulate images of experiments. In many cases, the authors listed in these articles never worked together or even participated in the research.

The main objective of these "factories" is profit: the more articles they produce, the more money they earn. It is estimated that hundreds of thousands of suspicious articles have been published in the last two decades, generating millions of dollars for this illegal market.

To produce so many articles in a short time, these factories use ready-made text templates, changing only a few technical words to make the papers look different. The result is generic, poorly written articles, with parts that don't make much sense together.

Often, the methods described wouldn't work in practice, the data cannot be reproduced, and the same images reappear in different studies. Even so, these papers still manage to get published in scientific journals.

One of the reasons for this is the enormous pressure on researchers to publish more and more. In some areas, such as cancer research, this pressure is even greater. Furthermore, many studies in this area use complex techniques that are difficult to verify, which facilitates falsification. The peer-review system, which should filter out these problems, cannot always handle the volume of submitted articles.

Another worrying factor is the advancement of artificial intelligence, which has made it even easier to generate scientific texts that appear legitimate at first glance. Today, it is possible to create fake texts and images automatically, which makes identifying fraud even more difficult.

To try to contain this problem, some publishers have started using automatic screening tools, and experts in scientific integrity have developed methods to detect suspicious patterns, such as strange phrases, scientific terms used incorrectly, or missing ethical information.

Previous research has shown that many of these fake articles follow very similar patterns: almost identical structure, repeated phrases, limited vocabulary, and superficial explanations.
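To make the idea of "repeated phrases" concrete, here is a minimal sketch of one simple way to measure how much wording two texts share: the Jaccard overlap of their three-word sequences. This is only an illustration, not the method used by publishers or by the study discussed below, and the three sample sentences are invented for the example.

```python
def shingles(text, size=3):
    """Return the set of consecutive word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Share of word sequences the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented sample sentences (hypothetical, not taken from any real paper).
template_1 = "gene a promotes the proliferation and migration of gastric cancer cells"
template_2 = "gene b promotes the proliferation and migration of liver cancer cells"
unrelated = "we followed patients for five years after surgery to record survival"

print(jaccard(template_1, template_2))  # clearly above zero: shared template wording
print(jaccard(template_1, unrelated))   # zero: no three-word sequence in common
```

Two sentences built from the same template with only a couple of technical words swapped keep a noticeable overlap, while independently written text typically shares almost none.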

Recent studies have shown that computers can be trained to recognize these patterns by analyzing only the text of articles, especially titles and abstracts, which are usually publicly available.

In this study, researchers trained a computer system to recognize common characteristics in demonstrably fraudulent articles. Instead of analyzing entire studies, the system examined only titles and abstracts, which are easier to access. The idea was to check if new articles presented similarities to research that had already been retracted due to fraud.
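The general idea of training on labeled texts and then scoring new ones can be sketched in a few lines. The study itself used a BERT-based model, so the word-count (naive Bayes) classifier below is a much simpler stand-in chosen for readability, not the authors' method. The label names and the four training titles are hypothetical toy data invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label whose word statistics best match the text."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior plus log likelihood with add-one smoothing.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy titles, invented for illustration only.
TOY_EXAMPLES = [
    ("lncrna promotes proliferation and invasion of gastric cancer cells by sponging mirna", "retracted_mill"),
    ("mirna inhibits proliferation and invasion of liver cancer cells by targeting gene", "retracted_mill"),
    ("randomized trial of adjuvant chemotherapy in early breast cancer patients", "other"),
    ("cohort study of screening outcomes in colorectal cancer patients", "other"),
]

word_counts, label_counts = train(TOY_EXAMPLES)
print(classify("circrna promotes proliferation and invasion of bone cancer cells",
               word_counts, label_counts))
```

As in the study, a label produced this way only says that a text resembles the training examples. It is a prompt for closer inspection, not proof of fraud.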

When the system was applied to about 2.6 million cancer articles published between 1999 and 2024, the results were worrying. Almost 10% of the studies analyzed were flagged as suspicious, and the share grew over time. Most alarmingly, flagged articles also appeared in journals considered highly prestigious, not just in smaller publications.

The study authors emphasize that these results do not constitute direct accusations, but serve as a warning. The growth of fraudulent research can harm treatments, delay important discoveries, and undermine trust in science. Therefore, they advocate for collective action, with more editorial care, less pressure for productivity, and greater awareness to protect the integrity of cancer research.

Machine learning based screening of potential paper mill publications in cancer research: methodological and cross sectional study

Baptiste Scancar, Jennifer A Byrne, David Causeur, and Adrian G Barnett

BMJ 2026;392:e087581

doi.org/10.1136/bmj-2025-087581

Abstract:

Objective: To train and validate a machine learning model to distinguish paper mill publications from genuine cancer research articles, and to screen the cancer research literature to assess the prevalence of papers that have textual similarities to paper mill papers.

Design: Methodological and cross sectional study applying a BERT (bidirectional encoder representations from transformers) based, text classification model to article titles and abstracts.

Data sources: Retracted paper mill publications listed in the Retraction Watch database were used for model training. The cancer research corpus was screened by the model using the PubMed database restricted to original cancer research articles published between 1999 and 2024. The model was trained on 2202 retracted paper mill papers and validated on independent data collected by image integrity experts. 2.6 million cancer research papers were screened.

Main outcome measures: Prevalence of papers flagged as similar to retracted paper mill publications with 95% confidence intervals and their distribution over time, by country, publisher, cancer type, research area, and within high impact journals (top 10%).

Results: The model achieved an accuracy of 0.91. When applied to the cancer research literature, it flagged 261 245 of 2 647 471 papers (9.87%, 95% confidence interval 9.83 to 9.90) and revealed a large increase in flagged papers from 1999 to 2024, both across the entire corpus and in the top 10% of journals by impact factor. More than 170 000 papers affiliated with Chinese institutions were flagged, accounting for 36% of Chinese cancer research articles. Most publishers had published substantial numbers of flagged papers. Flagged papers were overrepresented in fundamental research and in gastric, bone, and liver cancer.

Conclusions: Paper mills are a large and growing problem in the cancer literature and are not restricted to low impact journals. Collective awareness and action will be crucial to address the problem of paper mill publications.
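The headline figure in the abstract can be checked with a little arithmetic. The sketch below recomputes the flagged share and a 95% confidence interval from the two counts reported above. It uses the normal (Wald) approximation, which is an assumption made here: the abstract does not say which interval method the authors used.

```python
import math

flagged = 261_245      # papers flagged by the model (from the abstract)
screened = 2_647_471   # papers screened (from the abstract)


def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation confidence interval."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of the proportion
    return p, p - z * se, p + z * se


share, lower, upper = wald_interval(flagged, screened)
print(f"{share:.2%} (95% CI {lower:.2%} to {upper:.2%})")
# prints "9.87% (95% CI 9.83% to 9.90%)"
```

With so many papers screened, the interval is very narrow. It describes the statistical precision of the flagged share, not how often the model's individual flags are correct.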

MOL.
