An AI-powered tool from Carnegie Mellon University and collaborators is helping uncover genetic clues to rare diseases, potentially accelerating diagnoses and treatments for conditions that affect only a fraction of the population.

Researchers typically need data from tens of thousands of patients to study how genetic variants relate to diseases. For rare conditions, those affecting fewer than 0.01% of the population, such large datasets are much harder to gather.

To address this challenge, a team including researchers from CMU’s School of Computer Science has developed KGWAS, a deep-learning method that enhances traditional genome-wide association studies. By integrating vast amounts of functional genomics data, KGWAS improves the ability to detect genetic links in small patient cohorts, which could speed discoveries for rare diseases and lead to new drugs or treatments.

What is GWAS?

The new method builds on the genome-wide association study, or GWAS, a method of scanning the genomes of large groups of people to identify genetic variants that are associated with particular diseases or other traits.

“GWAS is vital to the entire ecosystem for drug discovery,” said Martin Zhang, an assistant professor in SCS’s Ray and Stephanie Lane Computational Biology Department. “By design, this works by collecting the genetic information for a bunch of people, and then you correlate the genetic mutations with the disease status.

“But, by definition, you need to see a lot of people with the disease in order to do the correlation. If you only see one person with the disease, then the correlation is going to be very low, and you don’t have a lot of statistical power to detect the associations faithfully. For rare diseases, where only 0.1% or even 0.01% of people in the population have it, those are cases where GWAS is fundamentally limited.”

A GWAS analysis might draw on genetic information from 100,000 to a million people. If about 10,000 of them have a certain trait or disease, a researcher can correlate a mutation with that disease. For rarer diseases, however, that number could be somewhere between 300 and 1,000.
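
The dependence on case count can be illustrated with a small sketch. The Python below runs a textbook allelic chi-square test on a 2x2 table of allele counts; it is not the pipeline used in the study. The allele frequencies (22% in cases, 20% in controls), the 100,000 controls and the function names are assumptions chosen for illustration, while the case counts of 10,000 and 300 echo the figures above. The 5e-8 cutoff is the conventional genome-wide significance threshold.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (statistic, p_value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def counts(n_people, alt_freq):
    """Expected (alt, ref) allele counts; each person carries two alleles."""
    alleles = 2 * n_people
    alt = round(alleles * alt_freq)
    return alt, alleles - alt


# Hypothetical variant: 22% allele frequency in cases, 20% in controls.
ctrl_alt, ctrl_ref = counts(100_000, 0.20)

results = {}
for n_cases in (10_000, 300):
    case_alt, case_ref = counts(n_cases, 0.22)
    stat, p = allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref)
    results[n_cases] = p
    verdict = "significant" if p < GENOME_WIDE_ALPHA else "not significant"
    print(f"{n_cases:>6} cases: chi2 = {stat:.2f}, p = {p:.2e} ({verdict})")
```

With the same assumed effect size, the 10,000-case cohort clears the threshold and the 300-case cohort does not, which is the limitation Zhang describes.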

Researchers can also go to hospitals and specifically seek out people with certain diseases, an approach known as a case-control study. This works well for some diseases, like Alzheimer’s disease, where there are more resources and people available to study. But it’s much harder for rare diseases like myasthenia gravis, an autoimmune disorder that causes weakness in the muscles that control bodily function and movement. There just aren’t enough patients in one place, and it takes a lot of time and effort to find them.

What is KGWAS?

In this study, available on the medRxiv preprint server, researchers developed a method called Knowledge Graph GWAS (KGWAS), which combines a variety of genetic information to make associations between gene variants and specific traits for rarer diseases. The knowledge graph takes information from GWAS and combines it with comprehensive functional genomics data, which is information about a gene’s function and interaction.

“There are so many different technologies to measure the same thing,” said Kexin Huang, a doctoral student in Stanford University’s Computer Science Department. “And all of these different measurements capture some part of the biology of the gene.

“Since we wanted to improve the power of GWAS, we decided to bring in as much information as possible to the process. So, we needed a framework to unify all these different measurement technologies. The knowledge graph is a very natural way to just bridge everything together.”

In this work, the knowledge graph links the functions and interactions between genetic variants, genes and gene programs, which are predefined groups of genes with shared functions. This KGWAS knowledge graph is one of the largest to date, with 11 million links between genetic variants, genes and gene programs.
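
One way to picture such a graph is as a set of typed nodes joined by typed links. The sketch below is a toy illustration in plain Python, not the KGWAS data model: the node identifiers (`rs0001`, `GENE_A`, `program_immune`), the relation names and the `KnowledgeGraph` class are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny heterogeneous graph: typed nodes, typed undirected links."""

    def __init__(self):
        self.node_type = {}            # node id -> "variant" | "gene" | "program"
        self.edges = defaultdict(set)  # node id -> {(relation, neighbor id)}

    def add_node(self, node, kind):
        self.node_type[node] = kind

    def add_edge(self, src, relation, dst):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("add both nodes before linking them")
        self.edges[src].add((relation, dst))
        self.edges[dst].add((relation, src))

    def neighbors(self, node, kind=None):
        """Sorted, de-duplicated neighbors of `node`, optionally filtered by type."""
        return sorted({
            nbr for _, nbr in self.edges.get(node, ())
            if kind is None or self.node_type[nbr] == kind
        })


kg = KnowledgeGraph()
for variant in ("rs0001", "rs0002", "rs0003"):  # hypothetical variant ids
    kg.add_node(variant, "variant")
for gene in ("GENE_A", "GENE_B"):               # hypothetical genes
    kg.add_node(gene, "gene")
kg.add_node("program_immune", "program")        # hypothetical gene program

kg.add_edge("rs0001", "regulates", "GENE_A")
kg.add_edge("rs0002", "regulates", "GENE_A")
kg.add_edge("rs0003", "located_in", "GENE_B")
kg.add_edge("GENE_A", "interacts_with", "GENE_B")
kg.add_edge("GENE_A", "member_of", "program_immune")
kg.add_edge("GENE_B", "member_of", "program_immune")

print(kg.neighbors("GENE_A", kind="variant"))  # ['rs0001', 'rs0002']
print(kg.neighbors("program_immune"))          # ['GENE_A', 'GENE_B']
```

Because variants, genes and gene programs live in one structure, two variants that never co-occur in a study can still be related through a shared gene or program.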

Then, for a given disease, KGWAS trains an AI model to use the knowledge graph to predict the likelihood or strength of an association between each genetic variant and that disease based on aggregate GWAS evidence. Along with predicting associations, this method also cuts through the noise in the data, which improves its ability to distinguish true disease-associated variants from false positives.
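
The article does not describe the model's architecture, so the following is only a conceptual sketch of the underlying idea: a variant with weak evidence of its own can borrow strength from related variants. It replaces the trained AI model with one round of plain neighbor averaging, and the variant names, evidence scores, links and the `mix` weight are all made-up assumptions.

```python
def smooth_scores(scores, neighbors, mix=0.5):
    """Blend each variant's own score with the mean score of its graph neighbors.

    scores:    variant id -> raw association evidence (hypothetical units)
    neighbors: variant id -> list of related variant ids
    mix:       weight on the neighborhood mean, between 0 and 1
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    smoothed = {}
    for variant, own in scores.items():
        nbrs = [n for n in neighbors.get(variant, []) if n in scores]
        if not nbrs:
            smoothed[variant] = own  # isolated variant: nothing to borrow from
            continue
        nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs)
        smoothed[variant] = (1 - mix) * own + mix * nbr_mean
    return smoothed


# Hypothetical evidence from a small, noisy cohort.
raw = {"rs0001": 4.0, "rs0002": 1.0, "rs0003": 0.5}
# Hypothetical links: variants that share a gene in the knowledge graph.
links = {"rs0001": ["rs0002"], "rs0002": ["rs0001"], "rs0003": []}

smoothed = smooth_scores(raw, links)
for variant in sorted(smoothed):
    print(variant, raw[variant], "->", smoothed[variant])
```

Here `rs0002` rises from 1.0 to 2.5 because its neighbor carries strong evidence, while the unlinked `rs0003` is unchanged. A trained model would presumably learn how much each kind of link should count instead of relying on one fixed `mix`.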

When applied to a rare disease with limited data, KGWAS can make better predictions of which genetic variants are linked to that disease. Researchers found that KGWAS identified up to 100% more statistically significant associations than state-of-the-art GWAS methods, or achieved the same detection power with about 2.7 times fewer samples.

“KGWAS’s applications are pretty diverse, ranging from helping in rare disease diagnosis to drug discovery,” said Huang. “On the more technical side, it’s also a change to the fundamental algorithm of human genetics (GWAS). By making a better GWAS, we can unlock a variety of different downstream tasks. For rare diseases, the KGWAS method has the potential to make real improvements.”

When researchers are better able to make these connections between genetic variants and certain diseases, more targeted treatment applications could be developed.

“With KGWAS, we are trying to put everything together,” Zhang said. “It’s like a framework that can automatically transform the functional data we have into discoveries.”

More information:
Kexin Huang et al, Small-cohort GWAS discovery with AI over massive functional genomics knowledge graph, medRxiv (2024). DOI: 10.1101/2024.12.03.24318375

Provided by
Carnegie Mellon University

Citation:
Finding answers faster: AI method brings hope to rare disease research (2025, August 5)
retrieved 5 August 2025
from https://medicalxpress.com/news/2025-08-faster-ai-method-rare-disease.html
