
In this study, researchers developed a new method called the Genetic Progression Score (GPS) to predict whether autoimmune diseases such as rheumatoid arthritis and systemic lupus erythematosus will progress from their early stages, before full symptoms appear. GPS was more accurate than other methods, and it also provided more relevant information about the underlying causes of these diseases.
Autoimmune diseases often have a preclinical phase, in which the first signs, such as the presence of autoantibodies, appear before the full development of the disease.
For example, in patients with rheumatoid arthritis (RA), autoantibodies such as anti-citrullinated protein antibodies and rheumatoid factor (RF) can be detected up to five years before clinical symptoms.
Symptoms such as joint pain and swelling are also observed during this early phase. Similarly, individuals who develop systemic lupus erythematosus (SLE) may develop antinuclear antibodies (ANA) and other antibodies, such as antiphospholipid, anti-Ro, and anti-La, during the preclinical phase.
Several autoimmune diseases can affect the nervous system, causing a wide range of neurological symptoms. Multiple sclerosis attacks myelin, leading to fatigue, vision problems, and motor difficulties.
Neuromyelitis optica affects the optic nerve and spinal cord, causing vision loss and weakness. Lupus can cause headaches, seizures, and cognitive problems. Guillain-Barré syndrome causes progressive muscle weakness and, in severe cases, breathing problems.
Autoimmune encephalitis and Hashimoto's encephalopathy can lead to seizures, personality changes, and cognitive difficulties. Antiphospholipid syndrome can cause blood clots in the brain, resulting in strokes and memory loss. Treatments include immunosuppressants, corticosteroids, and symptomatic care.

Not all individuals with preclinical signs will progress to full-blown disease; some will remain in a stable phase or remit without developing significant symptoms.
Therefore, identifying biomarkers that can predict disease progression from the preclinical phase is essential for early interventions that can reduce symptoms and improve quality of life.
Biobanks using electronic health records (EHRs) provide a rich database of genetic information, laboratory tests, and clinical diagnoses, and are useful for identifying individuals at risk of disease progression.
Germline genetic variants, because they remain constant throughout life, offer a valuable resource for early diagnosis. Previous studies have shown that combining polygenic risk scores (PRS) for systemic lupus erythematosus with autoantibody tests, such as ANA and anti-dsDNA, improves diagnostic accuracy and helps stratify patients by risk of disease progression.
The progression of autoimmune diseases, such as systemic lupus erythematosus, can be influenced by genetic factors that are also studied in analyses comparing people with and without the disease, known as case-control studies.
However, genetic risk scores, which are calculations based on these studies to predict the likelihood of someone developing the disease, may not be accurate enough to predict whether a person who is in the early (preclinical) stage of the disease will actually progress to the full-blown form.
To improve this prediction, researchers suggest combining data from biobanks that contain electronic health record information with large case-control studies. On their own, biobanks tend to include relatively few disease cases and few people in the preclinical stage, which can limit the accuracy of predictions made with these data alone.
However, when these data are combined with the larger, more detailed case-control studies, the accuracy in predicting who will progress to the full-blown form of the disease can improve significantly.

There are several methods for combining data, including cross-trait meta-analysis, which seeks more accurate estimates of genetic effects; transfer learning, which adjusts models based on case-control studies to better predict disease progression; and weighted combinations of different genetic risk models.
While all of these methods show promise, they face challenges, such as not being flexible enough to handle different genetic patterns or not working well in certain scenarios when integrating data from case-control studies with biobanks.
Researchers at Penn State University have created a novel method called the Genetic Progression Score (GPS) to predict how autoimmune diseases such as rheumatoid arthritis and systemic lupus erythematosus may progress from their early stages, before full symptoms appear.
The method takes polygenic risk score weights derived from case-control studies (i.e., comparisons between people with and without the disease) as a starting point and uses penalized regression on biobank data to adjust them, keeping the model close to those weights only when doing so improves the accuracy of the prediction.
This means that GPS can consider both genetic similarities and differences between those who already have the disease and those who are only in the preclinical phase.
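To make the idea of shrinking a model toward prior weights concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a simple ridge-style penalty, a continuous outcome instead of a yes/no progression label, and hypothetical names (`fit_with_prior`, `choose_penalty`, `beta_prior`). The penalty strength is picked on held-out data, so the prior is followed closely only when it helps prediction.

```python
import numpy as np

def fit_with_prior(X, y, beta_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_prior||^2.

    lam = 0 ignores the prior (ordinary least squares);
    a very large lam returns essentially beta_prior.
    """
    n_features = X.shape[1]
    lhs = X.T @ X + lam * np.eye(n_features)
    rhs = X.T @ y + lam * beta_prior
    return np.linalg.solve(lhs, rhs)

def choose_penalty(X_train, y_train, X_val, y_val, beta_prior, lams):
    """Return (lam, beta) with the lowest validation mean squared error."""
    best_lam, best_beta, best_mse = None, None, np.inf
    for lam in lams:
        beta = fit_with_prior(X_train, y_train, beta_prior, lam)
        mse = np.mean((y_val - X_val @ beta) ** 2)
        if mse < best_mse:
            best_lam, best_beta, best_mse = lam, beta, mse
    return best_lam, best_beta
```

In this sketch, `X` would hold genotype dosages for preclinical individuals in the biobank and `beta_prior` the weights from a case-control PRS. A useful prior leads the validation step to choose a large penalty, and a misleading one leads it to choose a small penalty.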

Tests and simulations of GPS have shown that it is as accurate as or more accurate than other methods currently in use.
It particularly excels when the genetic correlation between disease progression and case-control status is low, or when there is limited data available in biobanks.
In practical applications, such as studies conducted at the Vanderbilt University biobank, GPS has been shown to be effective in predicting whether people with early signs of rheumatoid arthritis or lupus will progress to more severe forms of the disease.
These results were validated in a second large biobank, All of Us. Not only did GPS outperform other methods in terms of accuracy, it also provided more relevant information about the underlying causes of these diseases.
People with high GPS scores have a significantly higher risk of seeing their conditions worsen and therefore can be monitored more closely or receive earlier treatments to avoid complications.
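As a rough illustration of how a genetic score could be used to flag people for closer monitoring, the sketch below computes a score as a weighted sum of genotype dosages and marks the top fraction of scores. The function names and the 10% default cutoff are hypothetical choices made for this example and are not taken from the study.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    return np.asarray(dosages) @ np.asarray(weights)

def flag_high_risk(scores, top_fraction=0.1):
    """Boolean mask of individuals whose score is in the top fraction."""
    cutoff = np.quantile(scores, 1.0 - top_fraction)
    return np.asarray(scores) >= cutoff
```

In practice, any cutoff would need to be calibrated against observed progression rates in an independent cohort before it could guide clinical decisions.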
READ MORE:
Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages
Chen Wang, Havell Markus, Avantika R. Diwadkar, Chachrit Khunsriraksakul, Laura Carrel, Bingshan Li, Xue Zhong, Xingyan Wang, Xiaowei Zhan, Galen T. Foulke, Nancy J. Olsen, Dajiang J. Liu & Bibo Jiang
Nature Communications, volume 16, Article number: 180 (2025)
Abstract:
Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR) based-biobanks contain genetic data and diagnostic information, which can identify preclinical individuals at risk for progression. Biobanks typically have small numbers of cases, which are not sufficient to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may have shared genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method Genetic Progression Score (GPS) that integrates biobank and case-control study to predict the disease progression risk. Via penalized regression, GPS incorporates PRS weights for case-control studies as prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies relying on biobank or case-control samples only and those combining biobank and case-control samples. The improvement is particularly evident when biobank sample is smaller or the genetic correlation is lower. We derive PRS for the progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction and the resulting PRS yields the strongest correlation with progression prevalence.