The data analysis center at Fred Hutch/UW is co-led by Sun; Li Hsu, PhD, a biostatistician who also works in Fred Hutch’s Public Health Sciences Division; and Ali Shojaie, PhD, a professor of biostatistics and statistics at UW.
A new method to analyze genes in batches rather than one at a time
To understand how genes work together, Fred Hutch and the other data validation centers will use existing computer tools, and possibly new ones they invent, to study patterns of gene activity and how genes regulate each other in key processes such as cell signaling, the cell cycle, metabolism, the immune system, and the development and specialization of cells.
“We want to not just understand the gene functions, but also how it connects with human disease,” Sun said.
The data validation centers may also develop computer models trained on MorPhiC data that find causal relationships between genes that can be generalized to other cell types or tissue microenvironments.
“The causal part is really interesting, and this is something people do not have in the traditional gene expression studies,” Sun said. “You have the expression of two genes, and you find that their expression goes up or down at the same time, but you do not know whether they are causally related.”
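Sun’s point can be illustrated with a toy simulation. The sketch below is not MorPhiC code or data: it assumes a hypothetical hidden regulator that drives two made-up genes, A and B, so their expression rises and falls together even though neither affects the other. Simulating a knockout of gene A then leaves gene B essentially unchanged, which is the kind of evidence that co-expression alone cannot provide.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def simulate(rng, n_cells, knock_out_a=False):
    """Simulate expression of genes A and B, both driven by a hidden regulator."""
    gene_a, gene_b = [], []
    for _ in range(n_cells):
        regulator = rng.gauss(0.0, 1.0)  # hypothetical shared driver
        # A follows the regulator unless it is knocked out (forced to zero).
        a = 0.0 if knock_out_a else regulator + rng.gauss(0.0, 0.5)
        # B follows the regulator only; it never depends on A.
        b = regulator + rng.gauss(0.0, 0.5)
        gene_a.append(a)
        gene_b.append(b)
    return gene_a, gene_b

rng = random.Random(0)
wild_a, wild_b = simulate(rng, 5000)
_, knockout_b = simulate(rng, 5000, knock_out_a=True)

r_wild = pearson(wild_a, wild_b)
shift_in_b = statistics.fmean(knockout_b) - statistics.fmean(wild_b)
print(f"correlation of A and B without knockout: {r_wild:.2f}")
print(f"change in mean of B after knocking out A: {shift_in_b:.3f}")
```

In this made-up setup the two genes are strongly correlated, yet removing A barely moves B, because the correlation comes entirely from the shared regulator.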
Fred Hutch and the other validation centers may develop new computer tools as well.
Sun and Zhexiao Lin, a former graduate student at UW, developed a deep learning method to find gene sets that can be used to distinguish cells by condition (for example, immune cells from patients with mild versus severe COVID-19).
Their method, explained in a recent preprint, helps analyze data produced by single-cell RNA sequencing, which measures gene expression in each cell individually to see how different types of cells behave.
The first release of data reported in the Nature study comprises 11 studies in which 71 genes or proteins were knocked out and the effects were measured using techniques that include single-cell RNA sequencing.
Typically, researchers probe that data to see which genes are expressed differently in different groups of cells. Then they look for biological processes that are especially common among the genes that stick out.
The goal is to find meaningful genes that can act like markers to classify which cell is which.
But trudging through the data gene by gene looking for meaningful differences produces long lists of genes that are statistically significant but not useful, because the differences are tiny and have little impact.
Narrowing that list to meaningful genes takes time and money, making it impractical for a project as big as MorPhiC.
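A small simulation shows how a difference can be statistically significant yet tiny. This is a minimal sketch on made-up numbers, not the analysis used in the study: it assumes one hypothetical gene measured in 20,000 cells per condition, with a true shift of only a tenth of a standard deviation, and compares a large-sample z-test p-value with the effect size (Cohen's d).

```python
import math
import random
import statistics

def compare_groups(a, b):
    """Return (cohens_d, two_sided_p) from a large-sample z-test on two groups."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Standard error of the difference between the two group means.
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_b - mean_a) / se
    # Two-sided p-value under the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    # Cohen's d: the difference in means measured in pooled standard deviations.
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    d = (mean_b - mean_a) / pooled_sd
    return d, p

rng = random.Random(0)
n_cells = 20000  # assumed number of cells per condition
mild = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
severe = [rng.gauss(0.1, 1.0) for _ in range(n_cells)]  # tiny true shift

effect_size, p_value = compare_groups(mild, severe)
print(f"effect size (Cohen's d): {effect_size:.3f}")
print(f"p-value: {p_value:.2e}")
```

With this many cells, the p-value is very small even though the two conditions overlap almost completely, which is why a gene-by-gene scan of a large data set can flag many genes that make little practical difference.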
Sun’s method simplifies the process by identifying sets of genes that turn on and work together and then probing the data set by set instead of gene by gene.
With this approach, Sun’s team trained a computer model to accurately classify cells by analyzing batches of genes instead of individual genes.
They used the method to identify gene sets associated with severe COVID-19, dementia, and cancer patients’ responses to immunotherapy.
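The intuition behind working set by set can be sketched with simulated data. The example below is not Sun’s deep learning method; it swaps in a much simpler stand-in, averaging the genes in one set into a single score and classifying cells with a threshold. It assumes a hypothetical set of 20 genes that each shift modestly, with independent noise, between two made-up conditions labeled mild and severe, and it assumes severe cells have the higher expression.

```python
import random
import statistics

def simulate_cells(rng, n_cells, n_genes, shift):
    """Each cell is a list of expression values for the genes in one set."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_genes)]
            for _ in range(n_cells)]

def midpoint_accuracy(scores_a, scores_b):
    """Classify by thresholding at the midpoint of the two group means.

    Assumes group b has the higher mean. Accuracy is measured on the same
    cells used to set the threshold, which is slightly optimistic.
    """
    threshold = (statistics.fmean(scores_a) + statistics.fmean(scores_b)) / 2
    correct = sum(s < threshold for s in scores_a)
    correct += sum(s >= threshold for s in scores_b)
    return correct / (len(scores_a) + len(scores_b))

rng = random.Random(1)
mild = simulate_cells(rng, 1000, 20, shift=0.0)
severe = simulate_cells(rng, 1000, 20, shift=0.5)

# Gene by gene: use only the first gene in the set.
single_gene_acc = midpoint_accuracy([cell[0] for cell in mild],
                                    [cell[0] for cell in severe])
# Set by set: average all 20 genes into one score per cell.
gene_set_acc = midpoint_accuracy([statistics.fmean(cell) for cell in mild],
                                 [statistics.fmean(cell) for cell in severe])

print(f"accuracy using one gene: {single_gene_acc:.2f}")
print(f"accuracy using the gene set: {gene_set_acc:.2f}")
```

Averaging across the set cancels much of the noise in each individual gene, so the set-level score separates the two conditions far better than any single gene does in this toy setup.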
Former Fred Hutch senior staff scientist heads institute managing MorPhiC
The specific institute at NIH managing the MorPhiC project, the National Human Genome Research Institute, has gone through a significant shakeup since President Trump’s administration took over in January.
Visitors to MorPhiC’s web page are informed in a banner across the top: “Due to reduction in workforce efforts, the information on this website may not be up to date, transactions submitted via the website may not be processed, and the agency may not be able to respond to inquiries.”
In March, the director who had served 16 years abruptly left, and his replacement was soon placed unexpectedly on administrative leave.
However, NIH announced in April that Carolyn Hutter, PhD, had been named the new acting director of NHGRI. Hutter earned her doctorate in epidemiology at the University of Washington and was a senior staff scientist at Fred Hutch before joining NIH in 2012.
An NIH council meeting concerning the next phase of MorPhiC has been scheduled for September, which the participants hope will be, as Humphrey Bogart says at the end of Casablanca, “the beginning of a beautiful friendship” to continue the work.