{"id":126856,"date":"2025-09-10T05:13:14","date_gmt":"2025-09-10T05:13:14","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/126856\/"},"modified":"2025-09-10T05:13:14","modified_gmt":"2025-09-10T05:13:14","slug":"robust-and-accurate-bayesian-inference-of-genome-wide-genealogies-for-hundreds-of-genomes","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/126856\/","title":{"rendered":"Robust and accurate Bayesian inference of genome-wide genealogies for hundreds of genomes"},"content":{"rendered":"<p>An overview of the SINGER algorithm<\/p>\n<p>SINGER takes in phased WGS data and samples ARGs iteratively by adding one haplotype at a time through an operation called threading<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Rasmussen, M. D., Hubisz, M. J., Gronau, I. &amp; Siepel, A. Genome-wide inference of ancestral recombination graphs. PLoS Genet. 10, e1004342 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR15\" id=\"ref-link-section-d296768096e562\" rel=\"nofollow noopener\" target=\"_blank\">15<\/a>. Conditioned on a partial ARG for the first n\u2009\u2212\u20091 haplotypes, the threading operation samples the points at which the lineage for the nth haplotype joins the partial ARG. SINGER solves this by first building an HMM with branches as hidden states and sampling a sequence of joining branches along the genome from the posterior, using stochastic traceback (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1a<\/a>). Then, SINGER builds another HMM with joining times as hidden states, conditioned on these sampled joining branches (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1b<\/a>). We refer to these two steps as \u2018branch sampling\u2019 and \u2018time sampling\u2019, respectively (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Sec18\" rel=\"nofollow noopener\" target=\"_blank\">Methods<\/a>). Although this two-step threading algorithm is approximate, it substantially reduces the number of hidden states and is therefore much faster than ARGweaver\u2019s HMM, which treats every joining point in the tree as a hidden state.<\/p>\n<p>Fig. 1: Method overview.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig1_HTML.png\" alt=\"figure 1\" loading=\"lazy\" width=\"685\" height=\"521\"\/><\/a><\/p>\n<p>a\u2013d, The gray lines represent haplotypes, and the circles indicate the allelic states of nodes in coalescent trees. Hollow circles correspond to ancestral alleles, and solid circles are derived alleles. In a, b and c, a partial ARG for the first three haplotypes has already been constructed, and a fourth haplotype is about to be threaded onto this partial ARG. a, The initial step in threading the fourth haplotype involves sampling the joining branch (highlighted in blue) in each marginal coalescent tree of the partial ARG, a process we call \u2018branch sampling\u2019. 
b, Following the determination of the joining branches, the next step is to sample the joining time for each of these joining branches. This step is referred to as \u2018time sampling\u2019. c,d, To propose an update to an ARG in MCMC, we first introduce cuts (illustrated by red scissors) to a sequence of marginal coalescent trees to prune subtrees. Then we re-graft them by solving the threading problem for the sub-ARG above the cuts. The branch length in the first two marginal trees and the topology of the third tree are altered after \u2018sub-graph pruning and re-grafting\u2019.<\/p>\n<p>To explore the space of ARG topology and branch lengths according to the posterior distribution, SINGER uses an MCMC proposal called \u2018sub-graph pruning and re-grafting\u2019 (SGPR). In brief, an SGPR operation first prunes a sub-graph by introducing a cut and then extends it leftwards and rightwards (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1c<\/a>); in Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">B.4<\/a>, we show that the pruning step is equivalent to the removal step in the so-called \u2018Kuhner move\u2019<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Kuhner, M. K., Yamato, J. &amp; Felsenstein, J. Maximum likelihood estimation of recombination rates from population data. 
Genetics 156, 1393&#x2013;1401 (2000).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR41\" id=\"ref-link-section-d296768096e639\" rel=\"nofollow noopener\" target=\"_blank\">41<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Mahmoudi, A., Koskela, J., Kelleher, J., Chan, Y. B. &amp; Balding, D. Bayesian inference of ancestral recombination graphs. PLoS Comput. Biol. 18, e1009960 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR42\" id=\"ref-link-section-d296768096e642\" rel=\"nofollow noopener\" target=\"_blank\">42<\/a>. However, our re-graft step (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1d<\/a>) differs substantially from the Kuhner move; the latter samples from the prior by simulation, whereas SGPR uses the threading algorithm to sample from the posterior. Given that the Kuhner move ignores data during re-grafting, it rarely improves likelihood, whereas SGPR favors data-compatible updates. Compared to the Kuhner move and ARGweaver, SGPR introduces large updates to the ARG with higher acceptance rates (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">B.4<\/a> and Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>), yielding a better convergence rate and mixing of the MCMC.<\/p>\n<p>Lastly, to mitigate biases introduced by algorithmic approximations, SINGER performs \u2018ARG re-scaling\u2019 through a monotonic transformation of node times that aligns the inferred mutation density with branch lengths (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Sec18\" rel=\"nofollow noopener\" target=\"_blank\">Methods<\/a>). This is conceptually similar to the \u2018ARG normalization\u2019 procedure introduced in ARG-Needle<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Zhang, B. C., Biddanda, A., Gunnarsson, &#xC1;. F., Cooper, F. &amp; Palamara, P. F. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat. Genet. 55, 768&#x2013;776 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR18\" id=\"ref-link-section-d296768096e662\" rel=\"nofollow noopener\" target=\"_blank\">18<\/a>, but ARG-Needle uses a provided demographic prior, whereas SINGER learns the transformation from the inferred ARG without external information. As long as the relative ordering of node ages is accurate, ARG re-scaling can calibrate the overall time distribution. In simulation benchmarks, we show that this greatly improves robustness against model misspecification (for example, population size changes) even though the HMMs assume a constant population size. 
This approach parallels site frequency spectra-based demography methods<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Kamm, J. A., Terhorst, J. &amp; Song, Y. S. Efficient computation of the joint sample frequency spectra for multiple populations. J. Comput. Graph. Stat. 26, 182&#x2013;194 (2017).\" href=\"#ref-CR43\" id=\"ref-link-section-d296768096e666\">43<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Liu, X. &amp; Fu, Y. X. Exploring population size changes using SNP frequency spectra. Nat. Genet. 47, 555&#x2013;559 (2015).\" href=\"#ref-CR44\" id=\"ref-link-section-d296768096e666_1\">44<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Liu, X. &amp; Fu, Y. X. Stairway plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol. 21, 280 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR45\" id=\"ref-link-section-d296768096e669\" rel=\"nofollow noopener\" target=\"_blank\">45<\/a> but incorporates explicit tree topologies inferred from SINGER, offering greater robustness to changing population sizes.<\/p>\n<p>Performance benchmarks on simulated data<\/p>\n<p>We first benchmarked the performance of several ARG inference methods (SINGER, ARGweaver, Relate, tsinfer\u2009+\u2009tsdate, ARG-Needle) using data simulated with msprime<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Kelleher, J., Etheridge, A. M. &amp; McVean, G. Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comput. Biol. 
12, e1004842 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR46\" id=\"ref-link-section-d296768096e681\" rel=\"nofollow noopener\" target=\"_blank\">46<\/a>. The simulation setup and benchmarking procedures are detailed in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"section anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Sec18\" rel=\"nofollow noopener\" target=\"_blank\">Methods<\/a>.<\/p>\n<p>Coalescence time accuracy<\/p>\n<p>To evaluate coalescence time estimation, we compared the ground truth and the inferred pairwise coalescence times for 100 randomly chosen leaf-node pairs, following a previous publication<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"YC Brandt, D., Wei, X., Deng, Y., Vaughn, A. H. &amp; Nielsen, R. Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics 221, iyac044 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR36\" id=\"ref-link-section-d296768096e695\" rel=\"nofollow noopener\" target=\"_blank\">36<\/a>. Pairwise coalescence times are important for applications such as demography inference<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Speidel, L., Forest, M., Shi, S. &amp; Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321&#x2013;1329 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR16\" id=\"ref-link-section-d296768096e699\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>, genome-wide association studies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Zhang, B. 
C., Biddanda, A., Gunnarsson, &#xC1;. F., Cooper, F. &amp; Palamara, P. F. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat. Genet. 55, 768&#x2013;776 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR18\" id=\"ref-link-section-d296768096e703\" rel=\"nofollow noopener\" target=\"_blank\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 25\" title=\"Fan, C., Mancuso, N. &amp; Chiang, C. W. A genealogical estimate of genetic relationships. Am. J. Hum. Genet. 109, 812&#x2013;824 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR25\" id=\"ref-link-section-d296768096e706\" rel=\"nofollow noopener\" target=\"_blank\">25<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 47\" title=\"Link, V. et al. Tree-based QTL mapping with expected local genetic relatedness matrices. Am. J. Hum. Genet. 110, 2077&#x2013;2091 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR47\" id=\"ref-link-section-d296768096e709\" rel=\"nofollow noopener\" target=\"_blank\">47<\/a> and evolutionary studies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Wang, S. &amp; Coop, G. A complex evolutionary history of genetic barriers to gene flow in hybridizing warblers. Preprint at bioRxiv &#010;                https:\/\/doi.org\/10.1101\/2022.11.14.516535&#010;                &#010;               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR35\" id=\"ref-link-section-d296768096e713\" rel=\"nofollow noopener\" target=\"_blank\">35<\/a>. 
For 50 haplotypes, SINGER was the most accurate; ARGweaver and Relate performed similarly, while tsinfer\u2009+\u2009tsdate was the least accurate (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2a<\/a>). For 300 haplotypes, we compared only SINGER, Relate, tsinfer\u2009+\u2009tsdate and ARG-Needle, as this sample size is too large for ARGweaver. SINGER again performed best; Relate and ARG-Needle performed similarly and tsinfer\u2009+\u2009tsdate remained the least accurate (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig8\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>). SINGER\u2019s improved performance over ARGweaver might reflect better MCMC mixing efficiency and more flexible time discretization.<\/p>\n<p>Fig. 2: Performance benchmarks on coalescence time and topology inference.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig2_HTML.png\" alt=\"figure 2\" loading=\"lazy\" width=\"685\" height=\"683\"\/><\/a><\/p>\n<p>a, Inferred pairwise coalescence times compared with the ground truth in simulations involving 50 sequences under a constant population size scenario. b, Similar to a, but for data simulated under an inferred population size history for the CEU population. c, Inferred distribution of pairwise coalescence times (colored) compared with the ground truth (dark gray) from simulations under the same CEU demography as in b. 
d, Genome-wide average of the number of lineages as a function of time for 50 sequences under a constant population size history, compared with the ground truth in simulations. e, The proportion of triplet topologies that are incorrectly inferred for 50 and 300 sequences under a constant population size history. Owing to runtime constraints, ARGweaver is not benchmarked for 300 sequences.<\/p>\n<p>We also compared these methods with pairwise-coalescent methods, which analyze each sequence pair independently, focusing on the recently proposed Gamma-SMC<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\" title=\"Schweiger, R. &amp; Durbin, R. Ultrafast genome-wide inference of pairwise coalescence times. Genome Res. 33, 1023&#x2013;1031 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR48\" id=\"ref-link-section-d296768096e768\" rel=\"nofollow noopener\" target=\"_blank\">48<\/a>. SINGER substantially outperforms Gamma-SMC (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2a<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a>), whereas Relate and tsinfer\u2009+\u2009tsdate show no improvement over Gamma-SMC in either mean squared error or correlation.<\/p>\n<p>We also evaluated the genome-wide average number of lineages as a function of time in marginal trees, a statistic relevant to demography and selection inference. ARGweaver underestimates many recent coalescence times, causing the number of lineages to drop too quickly (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2d<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>); this aligns with Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2a<\/a> and corroborates a previous finding<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Speidel, L., Forest, M., Shi, S. &amp; Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321&#x2013;1329 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR16\" id=\"ref-link-section-d296768096e790\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a> that ARGweaver tends to underestimate times. On the other hand, tsdate substantially overestimates coalescence times (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2d<\/a>). By contrast, Relate and SINGER agree well with the expectation (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2d<\/a>).<\/p>\n<p>Tree topology accuracy<\/p>\n<p>To assess topology inference accuracy, we used the triplet distance, defined as the fraction of three-leaved subtrees with different topologies in a given pair of trees. This metric is relevant for applications such as imputation and local ancestry, which depend on the accuracy of local topologies. On average, SINGER achieved the lowest triplet distances to the ground truth (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2e<\/a>). Again, ARGweaver was less accurate than SINGER, potentially owing to ARGweaver\u2019s less efficient MCMC and the presence of polytomies in its inferred trees. We also considered an evaluation metric related to the total variation distance introduced in a previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Zhang, B. C., Biddanda, A., Gunnarsson, &#xC1;. F., Cooper, F. &amp; Palamara, P. F. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat. Genet. 55, 768&#x2013;776 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR18\" id=\"ref-link-section-d296768096e812\" rel=\"nofollow noopener\" target=\"_blank\">18<\/a>, and the results similarly favored SINGER over other methods (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">C.2<\/a> and Supplementary Figs. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">4<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">5<\/a>).<\/p>\n<p>Robustness to model misspecification<\/p>\n<p>One advantage of SINGER is its robustness to model misspecification; specifically, it is less sensitive to incorrect effective population sizes (\\({N}_{e}\\)) and unmodeled population size changes. When using an \\({N}_{e}\\) that is off by a factor of five, the coalescence times inferred by SINGER were less biased than those inferred by Relate and tsinfer\u2009+\u2009tsdate, which showed systematic underestimation (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">6<\/a>).<\/p>\n<p>We simulated data under an inferred CEU population size history<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Palamara, P. F., Terhorst, J., Song, Y. S. &amp; Price, A. L. High-throughput inference of pair-wise coalescence times identifies signals of selection and enriched disease heritability. Nat. Genet. 50, 1311&#x2013;1317 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR49\" id=\"ref-link-section-d296768096e904\" rel=\"nofollow noopener\" target=\"_blank\">49<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Terhorst, J., Kamm, J. A. &amp; Song, Y. S. 
Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303&#x2013;309 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR50\" id=\"ref-link-section-d296768096e907\" rel=\"nofollow noopener\" target=\"_blank\">50<\/a>, which contains a bottleneck and a recent expansion. On these data, SINGER not only inferred the coalescence times more accurately than ARGweaver, Relate and tsinfer\u2009+\u2009tsdate (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2b<\/a>) but also accurately captured the bi-modality in the pairwise coalescence time distribution caused by the bottleneck (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig2\" rel=\"nofollow noopener\" target=\"_blank\">2c<\/a>). Although Relate can incorporate population size changes, it requires running a separate module for estimating branch lengths and coalescent rates, which takes even longer than running Relate itself. ARG-Needle requires a user-specified size history to adjust its coalescence times and cannot handle an unknown size history. By contrast, SINGER automatically adjusts branch lengths through ARG re-scaling, with little computational overhead.<\/p>\n<p>Accuracy of mutation and recombination inferences<\/p>\n<p>We also benchmarked allele age estimation using inferred ARGs, excluding ARG-Needle and ARGweaver because ARG-Needle does not map mutations to branches, and ARGweaver\u2019s output is difficult to parse for this task. On simulated data with 50 sequences, SINGER noticeably outperformed Relate and tsinfer\u2009+\u2009tsdate (Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig3\" rel=\"nofollow noopener\" target=\"_blank\">3a<\/a>). For 300 sequences, SINGER remained more accurate than Relate and tsinfer\u2009+\u2009tsdate (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig9\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a>).<\/p>\n<p>Fig. 3: Benchmarks on mutation and recombination inference for data simulated with 50 sequences and constant population size.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig3_HTML.png\" alt=\"figure 3\" loading=\"lazy\" width=\"685\" height=\"657\"\/><\/a><\/p>\n<p>a, Inferred allele ages compared with the ground truth. b, Inferred number of recombination breakpoints in 5\u2009kb genomic windows compared with the ground truth. c, The length distribution of pairwise IBD in the inferred ARGs compared with the ground truth.<\/p>\n<p>We also compared the number of recombination breakpoints in 5\u2009kb windows. Only ARGweaver and SINGER produced accurate estimates (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig3\" rel=\"nofollow noopener\" target=\"_blank\">3b<\/a>). 
Both Relate and tsinfer missed many recombination events, consistent with earlier studies<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Deng, Y., Song, Y. S. &amp; Nielsen, R. The distribution of waiting distances in ancestral recombination graphs. Theor. Popul. Biol. 141, 34&#x2013;43 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR37\" id=\"ref-link-section-d296768096e966\" rel=\"nofollow noopener\" target=\"_blank\">37<\/a>.<\/p>\n<p>Finally, we assessed the accuracy of recombination inference by the distribution of pairwise identity-by-descent (IBD) lengths, which are shaped by recombination. ARGweaver and Relate were excluded from this analysis; the former because of difficulties extracting IBD information from its output, and the latter owing to a lack of node persistence across marginal trees. As illustrated in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig3\" rel=\"nofollow noopener\" target=\"_blank\">3c<\/a>, SINGER accurately captured the distribution of pairwise IBD lengths, while tsinfer substantially overestimated IBD lengths, consistent with previous findings<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Deng, Y., Song, Y. S. &amp; Nielsen, R. The distribution of waiting distances in ancestral recombination graphs. Theor. Popul. Biol. 
141, 34&#x2013;43 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR37\" id=\"ref-link-section-d296768096e976\" rel=\"nofollow noopener\" target=\"_blank\">37<\/a>.<\/p>\n<p>Comparison of MCMC convergence<\/p>\n<p>In humans and many other organisms, the genome-wide average rates of recombination and mutation are similar, which leads to substantial uncertainty in ARG inference. Therefore, it is important to obtain samples from the posterior distribution and characterize uncertainties, rather than relying on point estimates. To assess MCMC convergence, we obtained 100 posterior MCMC samples from ARGweaver, Relate and SINGER, using the same burn-in and thinning intervals.<\/p>\n<p>To assess the posterior sampling effectiveness, we used the same benchmark as in previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"YC Brandt, D., Wei, X., Deng, Y., Vaughn, A. H. &amp; Nielsen, R. Evaluation of methods for estimating coalescence times using ancestral recombination graphs. Genetics 221, iyac044 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR36\" id=\"ref-link-section-d296768096e991\" rel=\"nofollow noopener\" target=\"_blank\">36<\/a>. This involved analyzing rank plots of pairwise coalescence times; a uniform distribution would be achieved by a perfect sampler from the posterior distribution<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Cook, S. R., Gelman, A. &amp; Rubin, D. B. Validation of software for Bayesian models using posterior quantiles. J. Comput. Graph. Stat. 
15, 675&#x2013;692 (2006).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR51\" id=\"ref-link-section-d296768096e995\" rel=\"nofollow noopener\" target=\"_blank\">51<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Talts, S., Betancourt, M., Simpson, D., Vehtari, A. &amp; Gelman, A. Validating Bayesian inference algorithms with simulation-based calibration. Preprint at &#010;                https:\/\/arxiv.org\/abs\/1804.06788&#010;                &#010;               (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR52\" id=\"ref-link-section-d296768096e998\" rel=\"nofollow noopener\" target=\"_blank\">52<\/a>. A rank plot is a histogram of the rank of a parameter sampled from the prior relative to the posterior sample. Ideally, a converged and well-mixed MCMC should yield uniformly distributed ranks. By contrast, a U-shaped rank plot suggests sampling from an under-dispersed distribution<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Cook, S. R., Gelman, A. &amp; Rubin, D. B. Validation of software for Bayesian models using posterior quantiles. J. Comput. Graph. Stat. 15, 675&#x2013;692 (2006).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR51\" id=\"ref-link-section-d296768096e1002\" rel=\"nofollow noopener\" target=\"_blank\">51<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Talts, S., Betancourt, M., Simpson, D., Vehtari, A. &amp; Gelman, A. Validating Bayesian inference algorithms with simulation-based calibration. 
Preprint at &#010;                https:\/\/arxiv.org\/abs\/1804.06788&#010;                &#010;               (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR52\" id=\"ref-link-section-d296768096e1005\" rel=\"nofollow noopener\" target=\"_blank\">52<\/a>. Compared to ARGweaver and Relate, SINGER\u2019s rank plots are much closer to the uniform distribution (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4b<\/a>).<\/p>\n<p>Fig. 4: Properties of ARG samples and runtimes.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/4\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig4\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig4_HTML.png\" alt=\"figure 4\" loading=\"lazy\" width=\"685\" height=\"612\"\/><\/a><\/p>\n<p>a, Empirical 90% CIs for pairwise coalescence times as inferred by SINGER, ARGweaver and Relate. b, Rank plots of pairwise coalescence times in MCMC samples. A perfect sampler from the posterior distribution would achieve the flat dashed line, corresponding to the uniform distribution. The Kullback\u2013Leibler (KL) divergence is used to quantify deviation from a uniform distribution. c, The empirical coverage of the ground truth pairwise coalescence time by the CI for different nominal levels. d, The runtime of the threading algorithm as a function of the partial ARG size (measured by the number of leaves), for ARGweaver and SINGER.<\/p>\n<p>The rank plot is closely related to the coverage property of empirical credible intervals (CIs). 
For each genomic position and pair of haplotypes, the empirical 90% CI is defined as the interval from the 5th to the 95th percentile of the sampled coalescence times (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4a<\/a>). The same approach was applied to the 70% and 50% CIs. The 90% CI covered the ground truth in only 44% of instances for Relate and 54% for ARGweaver. By contrast, the coverage was substantially better for SINGER, at 85% (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4c<\/a>). SINGER also compared favorably at other CI levels (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4c<\/a>).<\/p>\n<p>Furthermore, even with thinning intervals 40 times longer than SINGER\u2019s, ARGweaver still underperforms in pairwise coalescence time inference and CI coverage (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig10\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>). Combined with our faster threading algorithm, this suggests that ARGweaver would require hundreds to thousands of times longer to match SINGER\u2019s performance. For Relate, even with long thinning intervals, CI coverage remains substantially below nominal levels (Extended Data Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig10\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a> and Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">C.4<\/a>), probably because Relate samples only coalescence times under a fixed topology, whereas SINGER samples both topologies and coalescence times.<\/p>\n<p>Runtime comparison<\/p>\n<p>Given that both SINGER and ARGweaver use threading algorithms, we compared their threading runtimes as a function of the number of leaves in the partial ARG. SINGER\u2019s threading is approximately 10\u00d7 faster than ARGweaver\u2019s (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig4\" rel=\"nofollow noopener\" target=\"_blank\">4d<\/a>).<\/p>\n<p>Other benchmarks<\/p>\n<p>We performed additional benchmarks<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Ralph, P., Thornton, K. &amp; Kelleher, J. Efficiently summarizing relationships in large samples: a general duality between statistics of genealogies and genomes. 
Genetics 215, 779&#x2013;797 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR53\" id=\"ref-link-section-d296768096e1090\" rel=\"nofollow noopener\" target=\"_blank\">53<\/a> (Supplementary Sections <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">C.2<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">C.5<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">7<\/a>) and observed that SINGER outperforms other methods. We also note that a recent independent benchmarking study found that SINGER outperforms other ARG inference methods in reconstructing allele frequency trajectories and polygenic score histories<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Peng, D., Mulder, O. J. &amp; Edge, M. D. Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories. Genetics 229, iyaf033 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR54\" id=\"ref-link-section-d296768096e1103\" rel=\"nofollow noopener\" target=\"_blank\">54<\/a>.<\/p>\n<p>SINGER supports using an input recombination map to account for the recombination rate variation along the genome<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Myers, S., Bottolo, L., Freeman, C., McVean, G. 
&amp; Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321&#x2013;324 (2005).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR55\" id=\"ref-link-section-d296768096e1110\" rel=\"nofollow noopener\" target=\"_blank\">55<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Spence, J. P. &amp; Song, Y. S. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci. Adv. 5, eaaw9206 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR56\" id=\"ref-link-section-d296768096e1113\" rel=\"nofollow noopener\" target=\"_blank\">56<\/a>, which improves inference accuracy (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">C.6<\/a> and Supplementary Figs. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">8<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">9<\/a>).<\/p>\n<p>Applications to WGS data from the 1000 Genomes Project<\/p>\n<p>We applied SINGER to 200 whole-genome sequences from five African indigenous populations (GWD, YRI, ESN, LWK and MSL) in the 1000 Genomes Project<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Byrska-Bishop, M. et al. 
High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426&#x2013;3440 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR57\" id=\"ref-link-section-d296768096e1135\" rel=\"nofollow noopener\" target=\"_blank\">57<\/a>, with 40 genomes randomly sampled per population (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.1<\/a>). To demonstrate the utility of SINGER, we analyzed population differentiation in coalescence times, trans-species polymorphism and archaic introgression. We also ran SINGER and Relate on the British (GBR) population data from the 1000 Genomes Project (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.1<\/a>) and used tsinfer\u2009+\u2009tsdate ARG from a previous publication<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Wohns, A. W. et al. A unified genealogy of modern and ancient genomes. 
Science 375, eabi8264 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR3\" id=\"ref-link-section-d296768096e1145\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a> for comparison.<\/p>\n<p>Diagnostics of the ARGs sampled by SINGER<\/p>\n<p>We examined the sampled ARGs to check MCMC convergence and to ensure that sampling for inference occurred past proper burn-in (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.4<\/a>). The chains generally converged well (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">10<\/a>). Additionally, we validated the accuracy of the sampled ARGs by comparing the inferred average pairwise coalescence times (scaled by \\(4{N}_{e}\\mu\\)) with empirical single-nucleotide polymorphism (SNP)-based nucleotide diversities in 1\u2009Mb windows, which showed high concordance. By contrast, Relate and tsinfer\u2009+\u2009tsdate<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Wohns, A. W. et al. A unified genealogy of modern and ancient genomes. Science 375, eabi8264 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR3\" id=\"ref-link-section-d296768096e1197\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a> underestimated the genome-wide variation of diversity (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig11\" rel=\"nofollow noopener\" target=\"_blank\">4a<\/a>). 
This is possibly because \\({N}_{e}\\) varies along the genome as a result of background selection<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Charlesworth, B., Morgan, M. &amp; Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289&#x2013;1303 (1993).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR58\" id=\"ref-link-section-d296768096e1233\" rel=\"nofollow noopener\" target=\"_blank\">58<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Hudson, R. R. &amp; Kaplan, N. L. Deleterious background selection with recombination. Genetics 141, 1605&#x2013;1617 (1995).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR59\" id=\"ref-link-section-d296768096e1236\" rel=\"nofollow noopener\" target=\"_blank\">59<\/a>. As shown earlier, SINGER is more robust to \\({N}_{e}\\) misspecification. Additionally, the variant density predicted from the tsinfer\u2009+\u2009tsdate ARG is strongly biased relative to the observed data (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig11\" rel=\"nofollow noopener\" target=\"_blank\">4b<\/a>). This is probably caused by polytomies, which distort the total branch length. 
In addition, Relate and tsinfer\u2009+\u2009tsdate require allele polarization (that is, distinguishing ancestral from derived alleles), which is difficult at the HLA locus because of its high levels of trans-species polymorphism.<\/p>\n<p>Population differentiation in coalescence times<\/p>\n<p>Population-level differentiation in coalescence times at the same genomic locus is often used to identify sites that warrant further evolutionary analysis<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Akey, J. M. et al. Tracking footprints of artificial selection in the dog genome. Proc. Natl Acad. Sci. USA 107, 1160&#x2013;1165 (2010).\" href=\"#ref-CR60\" id=\"ref-link-section-d296768096e1279\">60<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Fan, S. et al. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell 186, 923&#x2013;939 (2023).\" href=\"#ref-CR61\" id=\"ref-link-section-d296768096e1279_1\">61<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 62\" title=\"Whitlock, M. C. &amp; Lotterhos, K. E. Reliable detection of loci responsible for local adaptation: inference of a null model through trimming the distribution of FST. Am. Nat. 186, S24&#x2013;S36 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR62\" id=\"ref-link-section-d296768096e1282\" rel=\"nofollow noopener\" target=\"_blank\">62<\/a>. Such differentiation could be a result of evolutionary forces such as local adaptation, which reduces diversity in the population experiencing selective sweeps. However, SNP-based diversity can be noisy at fine scales (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">11<\/a>). On the other hand, with accurately inferred ARGs, fine-scale diversity can be estimated more accurately. We observed that SINGER produces more accurate estimates of fine-scale diversity than Relate and tsinfer\u2009+\u2009tsdate (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.5<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">11<\/a>). This improvement facilitates studying population-specific fine-scale differentiation in coalescence times. Many previously reported loci under positive selection in Europeans<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 63\" title=\"Mathieson, I. &amp; Terhorst, J. Direct detection of natural selection in Bronze Age Britain. Genome Res. 32, 2057&#x2013;2067 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR63\" id=\"ref-link-section-d296768096e1295\" rel=\"nofollow noopener\" target=\"_blank\">63<\/a> appear as outliers when comparing the 1\u2009kb-scale average pairwise time to the most recent common ancestor (TMRCA) between GWD and GBR (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">12<\/a>), probably reflecting European-specific selection. They also exhibit long segments of reduced pairwise TMRCA among target allele carriers compared to the overall sample (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.6<\/a> and Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig12\" rel=\"nofollow noopener\" target=\"_blank\">5<\/a>). Here, we focus on population-specific reduction in local diversity in African populations.<\/p>\n<p>To find population-specific reduction in local diversity, we partitioned the genome into 1\u2009kb windows and computed the ratio of the ARG-based diversity estimate for the combined sample to that for each of the five populations; reductions in local diversity would show up as peaks when these ratios are plotted along the genome. The full list of regions with elevated ratios for each population is available in the Data Availability section. We highlight a few interesting findings in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig5\" rel=\"nofollow noopener\" target=\"_blank\">5<\/a> and Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">13<\/a>. 
For example, we found that the gene MITF has experienced a reduction in diversity in GWD relative to other populations (see also Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">14<\/a>); this gene has been reported to be related to skin pigmentation<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 64\" title=\"Levy, C., Khaled, M. &amp; Fisher, D. E. MITF: master regulator of melanocyte development and melanoma oncogene. Trends Mol. Med. 12, 406&#x2013;414 (2006).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR64\" id=\"ref-link-section-d296768096e1324\" rel=\"nofollow noopener\" target=\"_blank\">64<\/a>. Around MITF, we observed substantial differences in local diversity across the five populations, consistent with pigmentation variation within Africa<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 61\" title=\"Fan, S. et al. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell 186, 923&#x2013;939 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR61\" id=\"ref-link-section-d296768096e1332\" rel=\"nofollow noopener\" target=\"_blank\">61<\/a>. We also found that SPCS3 has reduced diversity in YRI compared to other populations; this gene encodes an immune-related protein believed to impact virion production of flaviviruses such as yellow fever virus<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 65\" title=\"Zhang, R. et al. A CRISPR screen defines a signal peptide processing pathway required by flaviviruses. 
Nature 535, 164&#x2013;168 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR65\" id=\"ref-link-section-d296768096e1339\" rel=\"nofollow noopener\" target=\"_blank\">65<\/a>. This is concordant with the report of the spread of these diseases in Nigeria<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 66\" title=\"Adogo, L. &amp; Ogoh, M. Yellow fever in Nigeria: a review of the current situation. Afr. J. Clin. Exp. Microbiol. 21, 1&#x2013;13 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR66\" id=\"ref-link-section-d296768096e1343\" rel=\"nofollow noopener\" target=\"_blank\">66<\/a>. Lastly, we found that SCN9A, which encodes a voltage-gated sodium channel involved in the perception of pain<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 67\" title=\"Reimann, F. et al. Pain perception is altered by a nucleotide polymorphism in SCN9A. Proc. Natl Acad. Sci. USA 107, 5148&#x2013;5153 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR67\" id=\"ref-link-section-d296768096e1351\" rel=\"nofollow noopener\" target=\"_blank\">67<\/a>, has substantially reduced diversity. These examples illustrate the utility of SINGER for exploratory evolutionary analysis, but additional studies are needed to investigate if, and how, selection is acting in these loci.<\/p>\n<p>Fig. 
5: ARG-based detection of differentiated coalescence times in African populations.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/5\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig5\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig5_HTML.png\" alt=\"figure 5\" loading=\"lazy\" width=\"685\" height=\"555\"\/><\/a><\/p>\n<p>a, The ratio of the average pairwise coalescence time in the pooled sample, Tpooled (combining all five populations), to the average population-specific pairwise coalescence time, Twithin, for every 1\u2009kb window. In each plot, the horizontal black dashed line denotes the genome-wide 99.99% quantile, and the gray shaded area corresponds to a 50\u2009kb window surrounding the peak. The positions of these peaks are marked by vertical dashed lines, and the genes overlapping with these signals are indicated. b, The average Twithin for each population, zoomed into the gray regions highlighted in a.<\/p>\n<p>Archaic introgression<\/p>\n<p>Evidence shows that modern humans carry DNA segments from Neanderthals and Denisovans<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710&#x2013;722 (2010).\" href=\"#ref-CR68\" id=\"ref-link-section-d296768096e1404\">68<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Reich, D. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. 
Nature 468, 1053&#x2013;1060 (2010).\" href=\"#ref-CR69\" id=\"ref-link-section-d296768096e1404_1\">69<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 70\" title=\"Sankararaman, S. et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354&#x2013;357 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR70\" id=\"ref-link-section-d296768096e1407\" rel=\"nofollow noopener\" target=\"_blank\">70<\/a> as well as unidentified hominid groups<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 71\" title=\"Hammer, M. F., Woerner, A. E., Mendez, F. L., Watkins, J. C. &amp; Wall, J. D. Genetic evidence for archaic admixture in Africa. Proc. Natl Acad. Sci. USA 108, 15123&#x2013;15128 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR71\" id=\"ref-link-section-d296768096e1411\" rel=\"nofollow noopener\" target=\"_blank\">71<\/a>. Identification of these introgressed genomic tracts is a challenging task, especially when there is little or no known genome of the source hominids. However, ARGs can facilitate this task by the following observation: for an introgressed tract in a given haplotype, its coalescence with other haplotypes will be depleted in the interval between the introgression time and the split time of modern humans from the \u2018ghost\u2019 population (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6a<\/a>). 
This is similar to the \u2018long branch\u2019 signals described in previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Speidel, L., Forest, M., Shi, S. &amp; Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321&#x2013;1329 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR16\" id=\"ref-link-section-d296768096e1418\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>, but expressed in the pairwise coalescence space.<\/p>\n<p>Fig. 6: ARG-based detection of archaic introgression tracts.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/6\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig6\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig6_HTML.png\" alt=\"figure 6\" loading=\"lazy\" width=\"685\" height=\"817\"\/><\/a><\/p>\n<p>a, The demographic model involving introgression and the un-introgressed (red) and introgressed (blue) lineages under the model. The time interval from the introgression time to the divergence time of the two populations is called the \u2018introgression window\u2019. b, The area under the receiver operating characteristic curve (AUROC) when using the coalescence ratio (CR) and long branch proportion (LB) from Relate and SINGER to differentiate the inferred introgressed and un-introgressed tracts from IBDmix. c, Identification of potential archaic introgression tracts. For a given leaf node, its pairwise coalescence times with every other leaf node in the marginal tree are summarized as a distribution. In the plot, each column represents such a distribution from marginal trees within a 10\u2009kb window. 
The two white horizontal lines delineate the interval between the introgression time and the split time. A tract indicative of introgression should exhibit a depletion of coalescence events within this interval and an enrichment of coalescence events above the split time. Regions shaded in red denote putative introgression tracts. d, The ratio of pairwise coalescence density above the split time to that within the interval between the introgression time and the split time.<\/p>\n<p>However, the \u2018long branch\u2019 signals can be sensitive to topology inference errors (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">15<\/a>); specifically, the introgressed lineage can group incorrectly with the ancestral lineages of non-introgressed sequences, thereby destroying the long branch (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.7<\/a>). To mitigate this issue, we provide a technique based on the coalescence distribution heatmap. For each sequence, we plot the distribution of its pairwise coalescence time with the remaining sequences in 10\u2009kb windows (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6c<\/a>), in which each column corresponds to a 10\u2009kb window. We found that using posterior samples of ARGs is helpful, as the coalescence distribution from a single ARG can be noisy (Supplementary Fig. 
<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>). ARG samples with different topologies help smooth the heatmap (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>). This is related to the visualization shown in previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\" title=\"Schweiger, R. &amp; Durbin, R. Ultrafast genome-wide inference of pairwise coalescence times. Genome Res. 33, 1023&#x2013;1031 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR48\" id=\"ref-link-section-d296768096e1472\" rel=\"nofollow noopener\" target=\"_blank\">48<\/a>.<\/p>\n<p>To detect introgression tracts, we look for a depletion of probability mass in the aforementioned interval and an enrichment of mass above the interval. This is more robust than long branches because slight mis-grouping would still lead to probabilistic depletion in the interval, whereas the long branch would be disrupted completely. We demonstrated the feasibility of this approach by showing that the ARGs inferred by SINGER can recover the Neanderthal introgression tracts inferred by IBDmix<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 72\" title=\"Chen, L., Wolf, A. B., Fu, W., Li, L. &amp; Akey, J. M. Identifying and interpreting apparent Neanderthal ancestry in African individuals. 
Cell 180, 677&#x2013;687 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR72\" id=\"ref-link-section-d296768096e1479\" rel=\"nofollow noopener\" target=\"_blank\">72<\/a>, a reference-based method directly comparing Neanderthal and modern genomes (Supplementary Section <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">D.7<\/a>). The coalescence ratio slightly outperforms long branch signals for detecting archaic introgression in Relate, and SINGER compares favorably with Relate on this task (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6b<\/a>).<\/p>\n<p>Here, we highlight a potential 200\u2009kb Neanderthal introgression tract in GBR (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6c<\/a>), supported by IBDmix and the coalescence depletion signal from SINGER-inferred ARGs. We use 60\u2009kya and 500\u2009kya for introgression and split times, respectively, following previous work<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 73\" title=\"Skoglund, P. &amp; Mathieson, I. Ancient genomics of modern humans: the first decade. Annu. Rev. Genomics Hum. Genet. 19, 381&#x2013;404 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR73\" id=\"ref-link-section-d296768096e1495\" rel=\"nofollow noopener\" target=\"_blank\">73<\/a>, and plot the ratio of coalescence probability above the split time to that in the interval between introgression and split times. 
This tract appears as distinctive peaks in the ARG-based analysis (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig6\" rel=\"nofollow noopener\" target=\"_blank\">6d<\/a>).<\/p>\n<p>Trans-species polymorphism in the HLA locus<\/p>\n<p>The HLA locus comprises a cluster of genes that encode transmembrane proteins that present antigen peptides to T\u2009cells. This region is known to be the most diverse region in the human genome, and it has been hypothesized to be under extreme balancing selection to maintain high diversity to handle various immune challenges<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 74\" title=\"Fortier, A. L. &amp; Pritchard, J. K. Ancient trans-species polymorphism at the major histocompatibility complex in primates. Preprint at Elife &#010;                https:\/\/doi.org\/10.7554\/eLife.103547.2&#010;                &#010;               (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR74\" id=\"ref-link-section-d296768096e1511\" rel=\"nofollow noopener\" target=\"_blank\">74<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Liu, B., Shao, Y. &amp; Fu, R. Current research status of HLA in immune-related diseases. Immun. Inflamm. Dis. 9, 340&#x2013;350 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR75\" id=\"ref-link-section-d296768096e1514\" rel=\"nofollow noopener\" target=\"_blank\">75<\/a>. There has been evidence of trans-species polymorphism for some alleles across primates, which otherwise is very rare<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 74\" title=\"Fortier, A. L. 
&amp; Pritchard, J. K. Ancient trans-species polymorphism at the major histocompatibility complex in primates. Preprint at Elife &#010;                https:\/\/doi.org\/10.7554\/eLife.103547.2&#010;                &#010;               (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR74\" id=\"ref-link-section-d296768096e1518\" rel=\"nofollow noopener\" target=\"_blank\">74<\/a>.<\/p>\n<p>The ARGs inferred by SINGER show extremely ancient pairwise coalescence times in the HLA locus, with many regions harboring coalescence times older than the human\u2013chimpanzee divergence time (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig7\" rel=\"nofollow noopener\" target=\"_blank\">7a,b<\/a>). In African individuals, we computed the average TMRCA in 1\u2009kb windows on chromosome 6 and found that HLA is the only region with the average TMRCA above 10\u2009million years (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">17<\/a>), making erroneous ARG inference an unlikely explanation for these ancient coalescence times. This is consistent with the hypothesis of strong balancing selection in this locus and the known trans-species polymorphisms. The human\u2013chimpanzee divergence time is estimated to be 5\u201312\u2009Mya<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 76\" title=\"Moorjani, P., Amorim, C. E. G., Arndt, P. F. &amp; Przeworski, M. Variation in the molecular clock of primates. Proc. Natl Acad. Sci. 
USA 113, 10607&#x2013;10612 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR76\" id=\"ref-link-section-d296768096e1531\" rel=\"nofollow noopener\" target=\"_blank\">76<\/a>. Although many genes in the HLA region do not show strong evidence of coalescence times older than the human\u2013chimpanzee split (for example, TAP1, TAP2 and TAPBP), many do, including HLA-A, HLA-DRB1 and HLA-DRB6. Unsurprisingly, there are no noticeable differences across the five populations, as the polymorphism has been maintained since ancient times. By contrast, in GBR, Relate and tsinfer\u2009+\u2009tsdate<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 3\" title=\"Wohns, A. W. et al. A unified genealogy of modern and ancient genomes. Science 375, eabi8264 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR3\" id=\"ref-link-section-d296768096e1551\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a> do not recover such extreme coalescence times (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig7\" rel=\"nofollow noopener\" target=\"_blank\">7b<\/a>), probably owing to poor allele polarization in the HLA locus and model misspecification arising from deviations from selective neutrality. To validate our results, we compared the mutation densities in 10\u2009kb windows from real data with predictions from the inferred ARG; SINGER provides a good fit, whereas Relate and tsinfer\u2009+\u2009tsdate substantially underestimate the mutation density (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#Fig7\" rel=\"nofollow noopener\" target=\"_blank\">7c<\/a>).<\/p>\n<p>Fig. 
7: The fine-scale diversity landscape in the HLA region in Africans and GBR, and comparison between observed and inferred mutation density in GBR.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41588-025-02317-9\/figures\/7\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig7\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2025\/09\/41588_2025_2317_Fig7_HTML.png\" alt=\"figure 7\" loading=\"lazy\" width=\"685\" height=\"836\"\/><\/a><\/p>\n<p>a, The average pairwise coalescence time in the HLA in the African sample, with a few genes highlighted (vertical bar colors have no meaning aside from denoting different genes). b, The average pairwise coalescence time in the HLA in GBR, inferred from SINGER, Relate and tsinfer\u2009+\u2009tsdate. c, The observed mutation density from data compared with that predicted from ARGs, inferred from SINGER, Relate and tsinfer\u2009+\u2009tsdate.<\/p>\n<p>In addition to the HLA locus, we extended the analysis genome-wide to find other loci with exceptionally ancient coalescence times (Supplementary Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">18<\/a>), some of which coincide with previous findings of long-term balancing selection, including TRIM5 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 77\" title=\"Cagliani, R. et al. Long-term balancing selection maintains trans-specific polymorphisms in the human TRIM5 gene. Hum. Genet. 
128, 577&#x2013;588 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR77\" id=\"ref-link-section-d296768096e1599\" rel=\"nofollow noopener\" target=\"_blank\">77<\/a>), ABO<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 78\" title=\"S&#xE9;gurel, L. et al. The ABO blood group is a trans-species polymorphism in primates. Proc. Natl Acad. Sci. USA 109, 18493&#x2013;18498 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR78\" id=\"ref-link-section-d296768096e1605\" rel=\"nofollow noopener\" target=\"_blank\">78<\/a>, IGFBP7 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 79\" title=\"Leffler, E. M. et al. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science 339, 1578&#x2013;1582 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR79\" id=\"ref-link-section-d296768096e1613\" rel=\"nofollow noopener\" target=\"_blank\">79<\/a>), PKD1L1 and DMBT1 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 80\" title=\"Bitarello, B. D., Brandt, D. Y., Meyer, D. &amp; Andr&#xE9;s, A. M. Inferring balancing selection from genome-scale data. Genome Biol. Evol. 
15, evad032 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02317-9#ref-CR80\" id=\"ref-link-section-d296768096e1623\" rel=\"nofollow noopener\" target=\"_blank\">80<\/a>).<\/p>\n","protected":false},"excerpt":{"rendered":"An overview of the SINGER algorithm SINGER takes in phased WGS data and samples ARGs iteratively by adding&hellip;\n","protected":false},"author":2,"featured_media":126857,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[5083,5085,3251,5082,5084,3250,916,5081,13781,90,4863,56,54,55],"class_list":{"0":"post-126856","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-animal-genetics-and-genomics","10":"tag-biomedicine","11":"tag-cancer-research","12":"tag-gene-function","13":"tag-general","14":"tag-genetics","15":"tag-human-genetics","16":"tag-population-genetics","17":"tag-science","18":"tag-software","19":"tag-uk","20":"tag-united-kingdom","21":"tag-unitedkingdom"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/126856","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/comments?post=126856"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/posts\/126856\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media\/126857"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/media?parent=126856"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbe
ep.com\/uk\/wp-json\/wp\/v2\/categories?post=126856"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/uk\/wp-json\/wp\/v2\/tags?post=126856"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}