
For most research studies and clinical trials, securing a large sample size is a perennial challenge. A huge amount of time and resource goes into recruiting willing participants to ensure an adequate data set. 

While a typical trial may use hundreds of thousands of samples, reaching these numbers is often unfeasible for rare diseases, where eligible participants are few and far between.

“We don’t have a good way of identifying genes associated with rarer disorders because they are so rare and the sample numbers small,” says Dr Maryam Shoai, a Senior Research Fellow in Professor Sir John Hardy’s lab at the UCL Queen Square Institute of Neurology. 

Dr Shoai specialises in statistical modelling with a focus on genetic and clinical data in neurodegenerative disorders. For the past 18 months, Dr Shoai’s team, in collaboration with the University of Surrey’s Nature Inspired Computing and Engineering Research Group, has been using machine learning to determine whether it’s possible to reduce sample numbers while still achieving results comparable to those of traditional genome-wide association studies (GWAS).

GWAS analysis scans the genomes of many individuals to identify genetic variations associated with specific diseases or traits. These studies typically require thousands of samples; the latest GWAS of Alzheimer’s disease used over a million participants.
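To make the role of sample size concrete, here is a minimal sketch of the kind of per-variant test a textbook GWAS repeats across the genome. It is an illustration only, not the method used by Dr Shoai's team: it assumes a simple allelic chi-square test on a 2x2 table of allele counts, the allele counts are invented toy numbers, and the function names are hypothetical. Real studies generally use regression models with covariates. The threshold of 5e-8 is the conventional genome-wide significance level.

```python
import math

# Conventional genome-wide significance threshold for a single variant.
GENOME_WIDE_THRESHOLD = 5e-8


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def scan(variants):
    """Return the names of variants whose p-value passes the threshold."""
    hits = []
    for name, counts in variants.items():
        _, p_value = allelic_chi_square(*counts)
        if p_value < GENOME_WIDE_THRESHOLD:
            hits.append(name)
    return hits


# Toy data (invented): the same allele frequencies, 30% in cases and 20% in
# controls, observed in a small cohort and in a cohort twenty times larger.
toy_variants = {
    "small_cohort": (150, 350, 100, 400),      # 250 cases, 250 controls
    "large_cohort": (3000, 7000, 2000, 8000),  # 5,000 cases, 5,000 controls
}

print(scan(toy_variants))
```

With identical allele frequencies, only the larger cohort clears the threshold in this toy example, which is why standard GWAS struggles when eligible participants are scarce.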

Big discoveries with smaller samples using AI

To explore whether it is possible to reduce sample sizes and still retain power to detect effects, the team use logic programming, a subset of artificial intelligence. Unlike traditional AI approaches that rely on vast datasets and statistical models, logic programming uses known rules and relationships to identify genetic patterns.
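As a rough illustration of what reasoning from "known rules and relationships" can look like, the sketch below applies simple if-then rules to a set of facts until no new conclusions appear (forward chaining). This is an assumption-laden toy, not the team's system: the article does not describe their rules or software, every fact and rule name here is hypothetical, and real logic programming languages such as Prolog also support variables and unification, which this sketch omits.

```python
def forward_chain(facts, rules):
    """Apply rules of the form (premises, conclusion) until nothing new is derived.

    facts: iterable of strings that are taken as known to be true.
    rules: list of (tuple_of_premises, conclusion) pairs.
    Returns the set of all facts, both given and derived.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(p in derived for p in premises):
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical background knowledge written as if-then rules.
rules = [
    (("variant_in_gene_A", "gene_A_in_pathway_X"), "variant_affects_pathway_X"),
    (("variant_affects_pathway_X", "pathway_X_linked_to_trait"), "variant_is_candidate"),
]

# Hypothetical observations for one variant.
facts = {"variant_in_gene_A", "gene_A_in_pathway_X", "pathway_X_linked_to_trait"}

result = forward_chain(facts, rules)
print("variant_is_candidate" in result)
```

The point of the toy is that conclusions come from chaining prior knowledge rather than from estimating effects across a very large sample.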

Primary analysis on pilot data suggests that even with as few as 250 samples, the genetic regions identified for Alzheimer’s disease are similar to those found by a standard GWAS, which typically uses thousands of cases and controls.

“This is huge, because it means that it could be used for rarer dementias or disorders. At the moment we struggle to do GWAS for rarer diseases as the sample numbers are often lacking.”

Recently, Dr Shoai and her team investigated Pick’s disease, a rare form of frontotemporal dementia that can only be definitively diagnosed post-mortem by examining brain tissue. Despite using almost all the brains with confirmed Pick’s disease that have been donated to research, the numbers only reached 300 to 400, which is classed as a small sample size for GWAS.

This highlighted the need to explore other methods to investigate the genetics of Pick’s disease. 

“Using this logic programming method seems to show that standard statistical tests like genome-wide association studies could be replaced with more complex methodologies and get similar answers for smaller data sets.”

Methods like logic programming could also be revolutionary for countries where economic or geographical hindrances make it difficult to genotype thousands of people, obtain good-quality phenotyping data, or track participants over long periods for longitudinal studies. Even for a disease as common as Alzheimer’s disease, data sets can still be small.

“It’s something that’s always on our minds,” says Dr Shoai. “We don’t have the background knowledge of what most neurodegenerative diseases look like in most non-Caucasian populations, and we need to bridge this gap to enable globally suitable therapies.”


Collaboration is key to developing our understanding of neurogenetics

Dr Shoai believes that the possibilities for expanding our knowledge of genetics using artificial intelligence are vast, but that collaboration and an interdisciplinary approach are crucial: “The real test is marrying the knowledge between artificial intelligence models with the expert information we have from genetic, clinical, and disease biology.”

“If we get enough people trained in both fields simultaneously, it could propel this area exponentially over the next few years.”

This collaborative ethos is at the heart of UCL’s new neuroscience centre. Due to open in 2027, the building will make it easier for researchers working in different areas and in different ways to cross paths.

The impact of this research for people who are living with neurodegenerative diseases could be transformative. 

The future of AI and neurodegenerative diseases

By harnessing machine learning, researchers will be able to quickly recognise patterns in data. This opens doors to improved and more personalised neurodegenerative diagnoses, allowing researchers to understand how someone’s genetic background can affect the cause and course of the disease.

Predictive models could also help enable earlier and alternative treatments for people with Alzheimer’s disease.

“Pharmaceutical clinical trials in Alzheimer’s disease predominantly focus on the removal of amyloid or tau proteins in the brain, the hallmarks of Alzheimer’s disease. While some of these have demonstrated significant promise, questions arise regarding cohorts where the disease presentation deviates from typical Alzheimer’s. This often pertains to rarer forms of Alzheimer’s disease or populations with genetic profiles markedly distinct from the Western and Northern European cohorts studied to date. This is an opportunity for unconventional methods to influence the future of therapeutic target discovery for globally relevant and efficient treatment.”
