Blommaert J. Genome size evolution: towards new model systems for old questions. Proc Biol Sci. 2020;287(1933):20201441.
Elliott TA, Gregory TR. What’s in a genome? The C-value enigma and the evolution of eukaryotic genome content. Philos Trans R Soc Lond B Biol Sci. 2015;370(1678):20140331.
Lefebure T, Morvan C, Malard F, Francois C, Konecny-Dupre L, Gueguen L, Weiss-Gayet M, Seguin-Orlando A, Ermini L, Sarkissian C, et al. Less effective selection leads to larger genomes. Genome Res. 2017;27(6):1016–28.
Gregory TR. Synergy between sequence and size in large-scale genomics. Nat Rev Genet. 2005;6(9):699–708.
Pellicer J, Hidalgo O, Dodsworth S, Leitch IJ. Genome size diversity and its impact on the evolution of land plants. Genes. 2018;9(2):88. https://doi.org/10.3390/genes9020088
Hidalgo O, Pellicer J, Christenhusz M, Schneider H, Leitch AR, Leitch IJ. Is there an upper limit to genome size? Trends Plant Sci. 2017;22(7):567–73.
Yin D, Schwarz EM, Thomas CG, Felde RL, Korf IF, Cutter AD, Schartner CM, Ralston EJ, Meyer BJ, Haag ES. Rapid genome shrinkage in a self-fertile nematode reveals sperm competition proteins. Science. 2018;359(6371):55–61.
Adams PE, Eggers VK, Millwood JD, Sutton JM, Pienaar J, Fierst JL. Genome size changes by duplication, divergence, and insertion in Caenorhabditis worms. Mol Biol Evol. 2023;40(3):msad039. https://doi.org/10.1093/molbev/msad039
Vitales D, Álvarez I, Garcia S, Hidalgo O, Nieto Feliner G, Pellicer J, Vallès J, Garnatje T. Genome size variation at constant chromosome number is not correlated with repetitive DNA dynamism in Anacyclus (Asteraceae). Ann Bot. 2019;125(4):611–23.
Agudo AB, Torices R, Loureiro J, Castro S, Castro M, Alvarez I. Genome size variation in a hybridizing diploid species complex in Anacyclus (Asteraceae: Anthemideae). Int J Plant Sci. 2019;180(5):374–85.
Stein JC, Yu Y, Copetti D, Zwickl DJ, Zhang L, Zhang C, Chougule K, Gao D, Iwata A, Goicoechea JL, et al. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat Genet. 2018;50(2):285–96.
Bozan I, Achakkagari SR, Anglin NL, Ellis D, Tai HH, Stromvik MV. Pangenome analyses reveal impact of transposable elements and ploidy on the evolution of potato species. Proc Natl Acad Sci U S A. 2023;120(31):e2211117120.
Kress WJ, Soltis DE, Kersey PJ, Wegrzyn JL, Leebens-Mack JH, Gostel MR, Liu X, Soltis PS. Green plant genomes: what we know in an era of rapidly expanding opportunities. Proc Natl Acad Sci U S A. 2022;119(4):e2115640118. https://doi.org/10.1073/pnas.2115640118
He W, Li X, Qian Q, Shang L. The developments and prospects of plant super pangenomes: demands, approaches and applications. Plant Commun. 2024;6(2):101230.
Gregory TR, Nicol JA, Tamm H, Kullman B, Kullman K, Leitch IJ, Murray BG, Kapraun DF, Greilhuber J, Bennett MD. Eukaryotic genome size databases. Nucleic Acids Res. 2007;35(Database issue):D332–8.
Pflug JM, Holmes VR, Burrus C, Johnston JS, Maddison DR. Measuring genome sizes using read-depth, k-mers, and flow cytometry: methodological comparisons in beetles (Coleoptera). G3: Genes|Genomes|Genetics. 2020;10(9):3047–60.
Pfenninger M, Schonnenbeck P, Schell T. ModEst: accurate estimation of genome size from next generation sequencing data. Mol Ecol Resour. 2022;22(4):1454–64.
Guenzi-Tiberi P, Istace B, Alsos IG, Coissac E, Lavergne S, Aury JM, Denoeud F. LocoGSE, a sequence-based genome size estimator for plants. Front Plant Sci. 2024;15:1328966.
Natarajan S, Gehrke J, Pucker B. Mapping-based genome size estimation. BMC Genomics. 2025;26(1):482.
Moeckel C, Mareboina M, Konnaris MA, Chan CSY, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J. 2024;23:2289–303.
Hesse U. K-mer-based genome size estimation in theory and practice. Methods Mol Biol. 2023;2672:79–113.
Hao F, Liu X, Zhou BT, Tian ZZ, Zhou LN, Zong H, Qi JY, He J, Zhang YT, Zeng P, et al. Chromosome-level genomes of three key Allium crops and their trait evolution. Nat Genet. 2023;55:1976–1986. https://doi.org/10.1038/s41588-023-01546-0
Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016;17(6):333–51.
Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37(10):1155–62.
Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432. https://doi.org/10.1038/s41467-020-14998-3
Scarano C, Veneruso I, De Simone RR, Di Bonito G, Secondino A, D’Argenio V. The third-generation sequencing challenge: novel insights for the omic sciences. Biomolecules. 2024;14(5):568. https://doi.org/10.3390/biom14050568
Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics. 2024;116(3):110842.
Zhao Z, Ng YK, Fang X, Li S. Eliminating heterozygosity from reads through coverage normalization. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2016:174–177. https://doi.org/10.1109/BIBM.2016.7822514
Sun J, Zhang YF, Wang MH, Guan Q, Yang XJ, Ou JX, Yan MC, Wang CR, Zhang Y, Li ZH, et al. The biological significance of multi-copy regions and their impact on variant discovery. Genomics Proteomics Bioinformatics. 2020;18(5):516–24.
Makino T, McLysaght A. Ohnologs in the human genome are dosage balanced and frequently associated with disease. Proc Natl Acad Sci U S A. 2010;107(20):9270–4.
Nakatani Y, Takeda H, Kohara Y, Morishita S. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 2007;17(9):1254–65.
Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3(10):e314.
McLysaght A, Hokamp K, Wolfe KH. Extensive genomic duplication during early chordate evolution. Nat Genet. 2002;31(2):200–4.
Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422(6930):433–8.
Qiao X, Zhang SL, Paterson AH. Pervasive genome duplications across the plant tree of life and their links to major evolutionary innovations and transitions. Comput Struct Biotechnol J. 2022;20:3248–56.
Li FW, Nishiyama T, Waller M, Frangedakis E, Keller J, Li Z, Fernandez-Pozo N, Barker MS, Bennett T, Blazquez MA, et al. Anthoceros genomes illuminate the origin of land plants and the unique biology of hornworts. Nat Plants. 2020;6(3):259–72.
Lemieux C, Turmel M, Otis C, Pombert JF. A streamlined and predominantly diploid genome in the tiny marine green alga Chloropicon primus. Nat Commun. 2019;10(1):4061. https://doi.org/10.1038/s41467-019-12014-x
Sun P, Jiao B, Yang Y, Shan L, Li T, Li X, Xi Z, Wang X, Liu J. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol Plant. 2022;15(12):1841–51.
Rabier CE, Ta T, Ane C. Detecting and locating whole genome duplications on a phylogeny: a probabilistic approach. Mol Biol Evol. 2014;31(3):750–62.
Kellis M, Birren BW, Lander ES. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004;428(6983):617–24.
Vollger MR, Guitart X, Dishuck PC, Mercuri L, Harvey WT, Gershman A, Diekhans M, Sulovari A, Munson KM, Lewis AP, et al. Segmental duplications and their variation in a complete human genome. Science. 2022;376(6588):55–.
Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol. 2022;40(9):1332–5.
Cheng H, Asri M, Lucas J, Koren S, Li H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods. 2024;21(6):967–70.
Chor B, Horn D, Goldman N, Levy Y, Massingham T. Genomic DNA k-mer spectra: models and modalities. Genome Biol. 2009;10(10):R108.
Cheng L, Wang N, Bao Z, Zhou Q, Guarracino A, Yang Y, Wang P, Zhang Z, Tang D, Zhang P, et al. Leveraging a phased pangenome for haplotype design of hybrid potato. Nature. 2025;640:408–417. https://doi.org/10.1038/s41586-024-08476-9
Hardigan MA, Laimbeer FPE, Newton L, Crisovan E, Hamilton JP, Vaillancourt B, Wiegert-Rininger K, Wood JC, Douches DS, Farre EM, et al. Genome diversity of tuber-bearing Solanum uncovers complex evolutionary history and targets of domestication in the cultivated potato. Proc Natl Acad Sci U S A. 2017;114(46):E9999–10008.
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, et al. The complete sequence of a human genome. Science. 2022;376(6588):44–53. https://doi.org/10.1126/science.abj6987
Wang B, Yang X, Jia Y, Xu Y, Jia P, Dang N, Wang S, Xu T, Zhao X, Gao S, et al. High-quality Arabidopsis thaliana genome assembly with nanopore and HiFi long reads. Genomics Proteomics Bioinformatics. 2022;20(1):4–13.
Shang LG, He WC, Wang TY, Yang YX, Xu Q, Zhao XJ, Yang LB, Zhang H, Li XX, Lv Y, et al. A complete assembly of the rice Nipponbare reference genome. Mol Plant. 2023;16(8):1232–6.
Xu S, Chen R, Zhang X, Wu Y, Yang L, Sun Z, Zhu Z, Song A, Wu Z, Li T, et al. The evolutionary tale of lilies: giant genomes derived from transposon insertions and polyploidization. Innovation (Camb). 2024;5(6):100726.
Healey AL, Garsmeur O, Lovell JT, Shengquiang S, Sreedasyam A, Jenkins J, Plott CB, Piperidis N, Pompidor N, Llaca V, et al. The complex polyploid genome architecture of sugarcane. Nature. 2024;628(8009):804–10.
Schartl M, Woltering JM, Irisarri I, Du K, Kneitz S, Pippel M, Brown T, Franchini P, Li J, Li M, et al. The genomes of all lungfish inform on genome expansion and tetrapod evolution. Nature. 2024;634(8032):96–103. https://doi.org/10.1038/s41586-024-07830-1
Shao C, Sun S, Liu K, Wang J, Li S, Liu Q, Deagle BE, Seim I, Biscontin A, Wang Q, et al. The enormous repetitive Antarctic Krill genome reveals environmental adaptations and population insights. Cell. 2023;186(6):1279–94.e1219. https://doi.org/10.1016/j.cell.2023.02.005
Peng Y, Yan H, Guo L, Deng C, Wang C, Wang Y, Kang L, Zhou P, Yu K, Dong X, et al. Reference genome assemblies reveal the origin and evolution of allohexaploid oat. Nat Genet. 2022;54(8):1248–58.
Chen W, Yan M, Chen S, Sun J, Wang J, Meng D, Li J, Zhang L, Guo L. The complete genome assembly of Nicotiana benthamiana reveals the genetic and epigenetic landscape of centromeres. Nat Plants. 2024;10(12):1928–43.
Zhang J, Qi Y, Hua X, Wang Y, Wang B, Qi Y, Huang Y, Yu Z, Gao R, Zhang Y, et al. The highly allo-autopolyploid modern sugarcane genome and very recent allopolyploidization in Saccharum. Nat Genet. 2025;57:242–253. https://doi.org/10.1038/s41588-024-02033-w
Huang HR, Liu X, Arshad R, Wang X, Li WM, Zhou Y, Ge XJ. Telomere-to-telomere haplotype-resolved reference genome reveals subgenome divergence and disease resistance in triploid Cavendish banana. Hortic Res. 2023;10(9):uhad153.
Bao Z, Li C, Li G, Wang P, Peng Z, Cheng L, Li H, Zhang Z, Li Y, Huang W, et al. Genome architecture and tetrasomic inheritance of autotetraploid potato. Mol Plant. 2022;15(7):1211–26.
Fernandez P, Amice R, Bruy D, Christenhusz MJM, Leitch IJ, Leitch AR, Pokorny L, Hidalgo O, Pellicer J. A 160 Gbp fork fern genome shatters size record for eukaryotes. iScience. 2024;27(6):109889.
Meyers LA, Levin DA. On the abundance of polyploids in flowering plants. Evolution. 2006;60(6):1198–206.
Reis AC, Franco AL, Campos VR, Souza FR, Zorzatto C, Viccini LF, Sousa SM. rDNA mapping, heterochromatin characterization and AT/GC content of Agapanthus africanus (L.) Hoffmanns (Agapanthaceae). An Acad Bras Cienc. 2016;88(3 Suppl):1727–34.
Ohri D, Fritsch RM, Hanelt P. Evolution of genome size in Allium (Alliaceae). Plant Syst Evol. 1998;210(1):57–86.
Ricroch A, Yockteng R, Brown SC, Nadot S. Evolution of genome size across some cultivated Allium species. Genome. 2005;48(3):511–20.
Greilhuber J, Dolezel J, Lysak MA, Bennett MD. The origin, evolution and proposed stabilization of the terms ‘genome size’ and ‘C-value’ to describe nuclear DNA contents. Ann Bot. 2005;95(1):255–260. https://doi.org/10.1093/aob/mci019
Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nat Biotechnol. 2018;36:1174–1182. https://doi.org/10.1038/nbt.4277
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170–5.
Jia KH, Wang ZX, Wang LX, Li GY, Zhang W, Wang XL, Xu FJ, Jiao SQ, Zhou SS, Liu H, et al. SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome-specific k-mers. New Phytol. 2022;235(2):801–9.
Wendel JF. Genome evolution in polyploids. Plant Mol Biol. 2000;42(1):225–49.
Otto SP, Whitton J. Polyploid incidence and evolution. Annu Rev Genet. 2000;34(1):401–37.
Soltis PS, Soltis DE. The role of hybridization in plant speciation. Annu Rev Plant Biol. 2009;60:561–88.
del Pozo JC, Ramirez-Parra E. Whole genome duplications in plants: an overview from Arabidopsis. J Exp Bot. 2015;66(22):6991–7003.
Eckardt NA. Two genomes are better than one: widespread paleopolyploidy in plants and evolutionary effects. Plant Cell. 2004;16(7):1647–1649. https://doi.org/10.1105/tpc.160710
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet. 2024;25(9):658–670. https://doi.org/10.1038/s41576-024-00718-w
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, et al. Technology dictates algorithms: recent developments in read alignment. Genome Biol. 2021. https://doi.org/10.1186/s13059-021-02443-7
Bates S, Dessimoz C, Nevers Y. OMAnnotator: a novel approach to building an annotated consensus genome sequence. bioRxiv. 2024;626846.
Zeng XF, Yi ZL, Zhang XT, Du YH, Li Y, Zhou ZQ, Chen SJ, Zhao HJ, Yang S, Wang YB, et al. Chromosome-level scaffolding of haplotype-resolved assemblies using Hi-C data without reference genomes. Nat Plants. 2024;10:1184–1200. https://doi.org/10.1038/s41477-024-01755-3
Liu G, Chen L, Wu Y, Han Y, Bao Y, Zhang T. PDLLMs: a group of tailored DNA large language models for analyzing plant genomes. Mol Plant. 2024;18(2):175–178. https://doi.org/10.1016/j.molp.2024.12.006
Behera S, Catreux S, Rossi M, Truong S, Huang ZY, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, et al. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat Biotechnol. 2024. https://doi.org/10.1038/s41587-024-02382-1
Chen Y, Huang JH, Sun Y, Zhang Y, Li Y, Xu X. Haplotype-resolved assembly of diploid and polyploid genomes using quantum computing. Cell Rep Methods. 2024;4(5):100754.
Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020;30(9):1291–305.
Nie F, Ni P, Huang N, Zhang J, Wang Z, Xiao C, Luo F, Wang J. De novo diploid genome assembly using long noisy reads. Nat Commun. 2024;15(1):2964. https://doi.org/10.1038/s41467-024-47349-7
Song L, Florea L, Langmead B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol. 2014;15:509. https://doi.org/10.1186/s13059-014-0509-9
Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70.
Kokot M, Dlugosz M, Deorowicz S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics. 2017;33(17):2759–2761. https://doi.org/10.1093/bioinformatics/btx304
Martayan I, Robidou L, Shibuya Y, Limasset A. Hyper-k-mers: efficient streaming k-mers representation. bioRxiv. 2024:2024.11.06.620789. https://doi.org/10.1101/2024.11.06.620789
Chikhi R, Medvedev P. Informed and automated k-mer size selection for genome assembly. Bioinformatics. 2014;30(1):31–7.
Sun H, Ding J, Piednoel M, Schneeberger K. FindGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies. Bioinformatics. 2018;34(4):550–7.
Sarmashghi S, Balaban M, Rachtman E, Touri B, Mirarab S, Bafna V. Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT. PLoS Comput Biol. 2021;17(11):e1009449.
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–90.
Said SE, Dickey DA. Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika. 1984;71(3):599–607. https://doi.org/10.1093/biomet/71.3.599
Banerjee A, Dolado JJ, Galbraith JW, Hendry D. Co-integration, error correction, and the econometric analysis of non-stationary data. Oxford University Press; 1993.
Trapletti A, Hornik K. tseries: time series analysis and computational finance. R package version 0.10-58. 2024. https://CRAN.R-project.org/package=tseries
Tang H, Krishnakumar V, Zeng X, Xu Z, Taranto A, Lomas JS, Zhang Y, Huang Y, Wang Y, Yim WC, et al. JCVI: a versatile toolkit for comparative genomics analysis. iMeta. 2024;3(4):e211.
Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21(3):487–93.