For millennia, evolution has intrigued many great thinkers, prompting questions about how new traits emerge as species adapt over time. Attention later shifted to natural selection and the inheritance of traits, leading to the discovery of genes, the hereditary units that underpin biological traits. Genetics soon became a cornerstone of evolutionary biology, offering insight into how variation within genes arises. Still, key questions remained: How do entirely new genes arise, and how might they contribute to adaptive evolution?
Many assumed that these new genes arose from existing ones. Researchers believed that one of several known biological mechanisms drove this process: duplication, where an extra copy of a gene evolves into something new; chimerism, where fragments of different genes are stitched together like patchwork; or horizontal gene transfer, where genetic material leaps from one organism to another.
In the 1950s, plant biologist Stanley Stephens at North Carolina State College of Agriculture and Engineering (later North Carolina State University) suggested that new genetic loci could potentially emerge de novo.1 However, the technology of the time wasn’t advanced enough to prove it. Two decades later, biologist and Nobel laureate François Jacob at the Pasteur Institute, known for his work on gene regulation, particularly the discovery of the lac operon, brushed off the possibility. He described evolution as a tinkerer of existing parts and remarked that the probability of de novo gene evolution was practically zero.2 His view carried significant weight within the field: It wasn’t until the early 2000s that researchers could really begin to challenge this notion.
When evolutionary geneticist Li Zhao began her doctoral studies at the Chinese Academy of Sciences in 2006, she was keenly interested in gene novelty and in how novel genes evolve, the process behind nature’s unique creatures. The birth of new genes with novel functions, such as those related to reproduction or growth, has been considered a major contributor to adaptive evolution. Traditionally, scientists believed these genes arose from modifications to existing coding DNA. Zhao pointed out that “most scientists didn’t think that genes could originate de novo.”
But around the same time Zhao began her research, new evidence challenged this longstanding view with an alternative path. Population geneticist David Begun at the University of California, Davis (UC Davis) identified several de novo genes, which originate from scratch out of non-coding DNA, in Drosophila melanogaster, the common fruit fly.3,4 Of the five genes identified, four occurred on the X chromosome and were predominantly expressed in the testes, possibly under sexual selection pressures.
The discovery of these evolutionarily young genes launched the emerging field of de novo gene origination and raised the five Ws: who, what, where, why, and when. The work initially met with both believers and skeptics, and Zhao counted herself among the believers. She recalled reading Begun’s paper and being deeply inspired by it. The potential of de novo genes struck a chord with her, sparking a passion that would steer her research in that direction.
Mapping De Novo Genes in Fruit Flies to Study Evolution
In 2011, while exploring options for her postdoctoral work, Zhao sent Begun a cold email. Following a Skype call with him, she received an offer and moved to UC Davis in September of that year. “Li was immersed in this environment where understanding how new genes evolve was really the focus of the research,” explained Begun. “But no one had ever suggested that they could evolve out of nothing. Basically, it’s a pretty weird and exciting idea if it’s true. So, she definitely got excited about it.”
Previous studies of de novo gene evolution used comparative rather than population genomics approaches. The comparative approach looks for lineage-specific genes unique to certain species and traces their origins by comparing them with related genomes; such studies suggested that some de novo genes originated from ancestrally non-genic sequences. To better understand how de novo genes originate within a population, Zhao homed in on D. melanogaster and its close relatives, combining population genomics and transcriptomic data in one of her projects.5
By characterizing the testis transcriptomes of six previously sequenced D. melanogaster strains, Zhao and her colleagues uncovered potential de novo candidates. Of these, they identified 142 that were polymorphic (still segregating within the population and evolving under selection) and 106 that were fixed (remaining consistent since the split from a common ancestor). Most of these candidates were regulated by cis elements, with expression driven by regulatory sequences just upstream of the new transcripts. The vast majority contained open reading frames (ORFs), sections marked by start and stop codons that could potentially produce proteins, of at least 150 base pairs. When the team compared these sequences with ancestral genomes and non-expressing Drosophila strains, the same ORFs appeared, suggesting that the gene expression was driven primarily by regulatory changes.
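For readers curious about what an ORF of at least 150 base pairs looks like in practice, the idea can be sketched in a few lines of Python. This is a simplified illustration and not the pipeline used in the study: the function name `find_orfs` is hypothetical, the scan covers only the forward strand, and the length is assumed to include the stop codon.

```python
# Minimal ORF scan: a stretch that starts with ATG and ends at the first
# in-frame stop codon. Illustrative only; not the method used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=150):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start/end are 0-based, end-exclusive positions; the length counted
    against min_len includes the stop codon (an assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):            # three forward reading frames
        start = None                  # position of the currently open ATG
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i             # open a new candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None          # close it and keep scanning
    return orfs


# Toy sequence: ATG, fifty alanine codons, then a stop (156 bp in total).
toy = "ATG" + "GCT" * 50 + "TAA"
print(find_orfs(toy))  # → [(0, 156, 0)]
```

A real analysis would also scan the reverse strand and, as the study did, check whether the same ORF is present in strains or species where the sequence is not expressed.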
Zhao and her colleagues proposed that these de novo genes may have undergone natural selection, as highly expressed genes were generally longer and more complex than those expressed at lower levels. However, whether these sequences were translated into proteins or served other functions remained unclear at the time. “Biology is more complex than what we imagine,” said Zhao.
As Zhao delved deeper into evolutionary questions, being surrounded by fellow evolutionary geneticists at UC Davis broadened her focus from comparative to population genomics, allowing her to explore nature from a new perspective. She also worked on collaborative projects with evolutionary geneticist Nicolas Svetec, a fellow postdoctoral researcher in Begun’s lab. While Svetec focused on experimental work, Zhao brought her computational expertise to the table. Their shared passion for science eventually blossomed into both professional and personal connections, and they later married, continuing to complement each other’s work. In 2017, Zhao went on to establish her own research group at The Rockefeller University, diving headfirst into the mysteries of de novo genes with unyielding scientific curiosity.
“One of the things that’s good in a young field is that…there’s plenty of room for [early career researchers] to run because there aren’t that many big labs working on the problem,” said Begun.
Deciphering the Origin of De Novo Genes
When Zhao moved to The Rockefeller University, she noted that her building neighbors came from different scientific fields and had a variety of techniques at their disposal, techniques that could help her answer her own scientific questions.
“Then, we didn’t really know how those kinds of genes are expressed, even though we knew that they tended to be tissue-specific or have tissue bias,” said Zhao. It wasn’t clear whether these genes were tightly regulated or if they were just noise. So, she turned to using single-cell RNA sequencing (scRNA-seq) to get a clearer picture.
However, many people doubted that scRNA-seq could be applied effectively because of the shape of the cells during sperm development in the Drosophila testes. Describing what they look like under the microscope, Zhao remarked, “There are basically a bunch of cells that are surrounded by the somatic cyst, and people didn’t even know if you could separate them.” In addition, while scRNA-seq could pick up highly expressed genes, there were concerns that it would miss lowly expressed ones. Zhao and her team pressed on undeterred and found de novo genes expressed at various stages of sperm development.6
“These genes are tightly regulated, even when they are very young. They are not really expressed as noise,” said Zhao. When the team examined these de novo genes, they found complex expression patterns—some appeared only in specific cell types, while others were active much earlier, including in the stem cell stage. The most active window, however, was during the spermatocyte phase of sperm development. To make these findings widely accessible, the team curated their data into a searchable database that allows users to explore gene expression across Drosophila testes at single-cell resolution.
Image: De novo genes, once thought to be rare, have been identified by the hundreds in fruit flies as well as other species. Credit: istock.com, nechaev-kon
Manyuan Long, an evolutionary biologist and geneticist at the University of Chicago, who has followed Zhao’s work over the years, remarked, “What I was most impressed [about recently] is that she developed…a single-cell RNA germline transcriptome database.” He highlighted the significance of this work, noting that the ability to monitor biological processes at the single-cell level represents a major advance for the field.
Long has studied how genes with novel functions originate since the early 1990s.7,8 He expressed enthusiasm for the new generation of scientists—like Zhao—who are pushing the boundaries of evolutionary genetics. He praised their willingness to challenge longstanding assumptions and to embrace diverse techniques and perspectives to address fundamental questions. Zhao described him as her “academic grandfather,” due to his UC Davis ties through evolutionary biologist Charles Langley, who oversaw his doctoral work and was also Begun’s postdoctoral advisor.
Li is definitely viewed as one of the movers and shakers [in the field].
—David Begun, University of California, Davis
Building on foundational insights from Long, Begun, and others, Zhao’s work now leverages cutting-edge tools like single-cell transcriptomics to probe where and how new genes are expressed. The testes remain a key site for studying de novo genes, likely because of strong sexual selection pressures, and many young genes show a clear bias toward expression in spermatocytes. While much of the research so far has centered on fly testes, scientists are expanding the search for de novo genes to species beyond the common fruit fly, including humans and other primates, mice, yeast, and rice. What was once considered impossible has now been found across different species.9-13 One study identified 74 potential human de novo genes linked to brain development, but the landscape is far more complex.14 As Zhao noted, “The effective population size in fruit flies is very large, [where] natural selection is more efficient in acting on the genome in fruit flies than in humans. In primates, especially humans, the effective population size is very small.”
Filling the Gaps in Gene Regulation: De Novo Genes, Proteins, and Transcription Factors
Among the many research avenues ripe for exploration, Zhao is particularly interested in understanding how these young genes code for proteins. Many de novo genes have been proposed to be protein coding, and a few have been experimentally shown to yield protein products.
Zhao and her team took a mass spectrometry-first, ORF-focused computational approach to hunt for evidence of protein-coding potential in unannotated ORFs in Drosophila. The result: nearly 1,000 possible protein products encoded by previously overlooked regions of the genome. To strengthen their findings, they compared the results with ribosome profiling data, adding another layer of confidence.15
While some studies have suggested that proteins from de novo genes are highly disordered—and potentially neurotoxic—others propose they may be more structured and even conserved.16-18 Zhao wanted to dig deeper.
Image: Evolutionary geneticist Li Zhao from The Rockefeller University studies de novo genes and their role in adaptive evolution. Credit: Claire Holt, courtesy of The Rockefeller University
“We wanted to understand how [these proteins] are folded. How can this fold become useful for this kind of regulatory network, because when they are folding, very often, they’re interacting with something, either a protein or RNA or some other molecule,” said Zhao. To investigate this, she and her team applied protein structure predictors such as AlphaFold2 and evolutionary scale modeling (ESMFold), as well as molecular dynamics simulations. Their analysis revealed that a number of de novo proteins appeared to be well folded and potentially functional.
Since founding her lab, Zhao has also been increasingly focused on the regulatory mechanisms that control the expression of these novel genes and proteins. How are they turned on? What transcription factors are involved? It’s a complex puzzle. Much like the 1950s speculation by Stephens that new technologies would someday enable de novo gene research, Zhao’s work brings together a wide array of tools, techniques, and perspectives to build a more complete picture.
She hopes this ongoing research will help illuminate the evolutionary history of gene origination. “It’s an important question, not just about biology, but about ourselves,” Zhao said. “There’s a possibility these genes may have implications for disease and other key areas of biology.” Working with a supportive community of scientists, she’s energized by the patience and persistence it takes to answer hard questions—and the excitement that comes from chasing them.
References
1. Stephens SG. Possible significance of duplication in evolution. Adv Genet. 1951;4:247-265.
2. Jacob F. Evolution and tinkering. Science. 1977;196(4295):1161-1166.
3. Levine MT, et al. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci USA. 2006;103(26):9935-9939.
4. Begun DJ, et al. Evidence for de novo evolution of testis-expressed genes in the Drosophila yakuba/Drosophila erecta clade. Genetics. 2007;176(2):1131-1137.
5. Zhao L, et al. Origin and spread of de novo genes in Drosophila melanogaster populations. Science. 2014;343(6172):769-772.
6. Witt E, et al. Testis single-cell RNA-seq reveals the dynamics of de novo gene transcription and germline mutational bias in Drosophila. eLife. 2019;8:e47138.
7. Long M, Langley CH. Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science. 1993;260(5104):91-95.
8. Zhang L, et al. Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol. 2019;3(4):679-690.
9. Toll-Riera M, et al. Origin of primate orphan genes: A comparative genomics approach. Mol Biol Evol. 2009;26(3):603-612.
10. Knowles DG, McLysaght A. Recent de novo origin of human protein-coding genes. Genome Res. 2009;19(10):1752-1759.
11. Xie C, et al. A de novo evolved gene in the house mouse regulates female pregnancy cycles. eLife. 2019;8:e44392.
12. Cai J, et al. De novo origination of a new protein-coding gene in Saccharomyces cerevisiae. Genetics. 2008;179(1):487-496.
13. Xiao W, et al. A rice gene of de novo origin negatively regulates pathogen-induced defense response. PLoS One. 2009;4(2):e4603.
14. An NA, et al. De novo genes with an lncRNA origin encode unique human brain developmental functionality. Nat Ecol Evol. 2023;7(2):264-278.
15. Zheng EB, Zhao L. Protein evidence of unannotated ORFs in Drosophila reveals diversity in the evolution and properties of young proteins. eLife. 2022;11:e78772.
16. Wilson BA, et al. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 2017;1(6):0146.
17. Bungard D, et al. Foldability of a natural de novo evolved protein. Structure. 2017;25(11):1687-1696.e4.
18. Lange A, et al. Structural and functional characterization of a putative de novo gene in Drosophila. Nat Commun. 2021;12(1):1667.