{"id":229939,"date":"2025-10-21T12:38:07","date_gmt":"2025-10-21T12:38:07","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/229939\/"},"modified":"2025-10-21T12:38:07","modified_gmt":"2025-10-21T12:38:07","slug":"natural-selection-exerted-by-historical-coronavirus-epidemics-comparative-genetic-analysis-in-china-kadoorie-biobank-and-uk-biobank-bmc-genomics","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/229939\/","title":{"rendered":"Natural selection exerted by historical coronavirus epidemic(s): comparative genetic analysis in China Kadoorie Biobank and UK Biobank | BMC Genomics"},"content":{"rendered":"<p>Appendix<\/p>\n<p>Methods<\/p>\n<p>Statistics and reproducibility<\/p>\n<p>The details of each analysis are outlined in the methods section, and all of the code is publicly available on GitHub at <a href=\"https:\/\/github.com\/sahwa\/CKB_COVID_selection\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/github.com\/sahwa\/CKB_COVID_selection<\/a>.<\/p>\n<p>Study populations and genotyping data<\/p>\n<p>CKB: China Kadoorie Biobank is a population-based prospective cohort of\u2009&gt;\u2009512,000 participants, of whom 100,706 had genotyping data available, as previously described [22, 26]. Individuals were genotyped on custom-designed Axiom\u00ae arrays optimised for individuals of East Asian ancestry, on which 340,562 genotyped variants overlapped with the UK Biobank genotyping array. Analyses were based on 513,164 variants passing quality control on both array versions and in all genotyping batches. One individual from each pair with a KING kinship coefficient\u2009&gt;\u20090.05 (estimated using an LD-pruned set of 171,236 variants) was removed, giving a set of 76,719 unrelated individuals used in the present study.<\/p>\n<p>UKB: Genotyping data for 805,426 directly genotyped variants in UKB participants were available under project 50474. 
We selected self-identified \u2018White British\u2019 individuals based on Data-Field 22006 and used an LD-pruned set of 230,948 variants to define a set of unrelated individuals, using a KING kinship coefficient cutoff of 0.05. From the resulting 348,845 unrelated individuals, we randomly selected 76,719 to match the number of CKB samples.<\/p>\n<p>Virus-interacting proteins<\/p>\n<p>VIPs (n\u2009=\u20094,768 after exclusions) and their categorisations were as defined by Souilmi et al. 2021 [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"Souilmi Y, Lauterbur ME, Tobler R, Huber CD, Johar AS, Moradi SV, et al. An ancient viral epidemic involving host coronavirus interacting genes more than 20,000 years ago in East Asia. Curr Biol. 2021;31(16):3504-14 e9.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR12\" id=\"ref-link-section-d5871239e2774\" rel=\"nofollow noopener\" target=\"_blank\">12<\/a>], with genomic coordinates of structural genes (build 37) downloaded from Ensembl v102 [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, et al. Ensembl 2021. Nucleic Acids Res. 2021;49(D1):D884\u201391.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR49\" id=\"ref-link-section-d5871239e2777\" rel=\"nofollow noopener\" target=\"_blank\">49<\/a>]. We excluded VIPs whose genes were non-autosomal or lay within an extended MHC region (chr6:21,745,208\u201339,042,510), defined on the basis of the LRLD regions identified in CKB. 
Similarly, for all analyses we considered only VIPs lying within regions genotyped in both the CKB and UKB datasets: we split the genome into non-overlapping 500 kbp segments and retained only VIPs fully covered by a segment.<\/p>\n<p>Identification of putative regions under selection<\/p>\n<p>A) Long-range linkage disequilibrium<\/p>\n<p>The method for identifying LRLD regions in CKB has been described previously [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, et al. Ensembl 2021. Nucleic Acids Res. 2021;49(D1):D884\u201391.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR49\" id=\"ref-link-section-d5871239e2803\" rel=\"nofollow noopener\" target=\"_blank\">49<\/a>]. Adapting an approach originally developed to remove distortions in principal components analysis (PCA) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Galinsky KJ, Loh P-R, Mallick S, Patterson NJ, Price AL. Population structure of UK biobank and ancient Eurasians reveals adaptation at genes influencing blood pressure. Am J Hum Genet. 2016;99(5):1130\u20139.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR23\" id=\"ref-link-section-d5871239e2806\" rel=\"nofollow noopener\" target=\"_blank\">23<\/a>], we conducted a systematic iterative search for LRLD regions by applying a hidden Markov model (HMM) to PCA loadings. For each biobank, an initial variant set was derived by removing variants with MAF\u2009&lt;\u20090.01 or Hardy\u2013Weinberg P\u2009&lt;\u200910<sup>\u22124<\/sup>. 
We also performed local pairwise LD pruning using plink --indep-pairwise 50 5 0.2 [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559\u201375.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR50\" id=\"ref-link-section-d5871239e2811\" rel=\"nofollow noopener\" target=\"_blank\">50<\/a>]. We then performed PCA of the pruned genotypes using flashpca [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Abraham G, Inouye M. Fast principal component analysis of large-scale genome-wide data. PLoS One. 2014. &#010;                  https:\/\/doi.org\/10.1371\/journal.pone.0093766&#010;                  &#010;                .\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR51\" id=\"ref-link-section-d5871239e2814\" rel=\"nofollow noopener\" target=\"_blank\">51<\/a>]. Starting with the variant loadings for PC1, and for each chromosome in turn, we used the HMM to assign each variant to one of two states: within a selected region (SR) or not. The emission probability of the SR state, given a variant\u2019s absolute loading, was determined from the cumulative P-value of the chi-squared distribution with one degree of freedom. The transition probability between states was proportional to East Asian (EAS) recombination rates (downloaded from SNiPA [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Arnold M, Raffler J, Pfeufer A, Suhre K, Kastenm\u00fcller G. 
An interactive, genetic variant-centered annotation browser. Bioinformatics. 2015;31(8):1334\u20136.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR52\" id=\"ref-link-section-d5871239e2818\" rel=\"nofollow noopener\" target=\"_blank\">52<\/a>]), divided by a scaling factor of 10<sup>7<\/sup>. The loadings were decoded using the forward\u2013backward algorithm described by Rabiner [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Rabiner LR. A tutorial on hidden Markov-models and selected applications in speech recognition. P Ieee. 1989;77(2):257\u201386.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR53\" id=\"ref-link-section-d5871239e2821\" rel=\"nofollow noopener\" target=\"_blank\">53<\/a>], and variants with a posterior marginal probability\u2009&gt;\u20090.5 for the SR state were assigned to the final set of selected regions. Regions were defined by merging consecutive SNPs in the same state, with region boundaries placed at the midpoint between consecutive SNPs in different states.<\/p>\n<p>In the next iteration, the SNPs within SRs were removed and PCA was performed again. 
The newly identified SRs were then merged with those from previous iterations. Once detection of the SR set had converged, with no additional SRs discovered, the number of PCs analysed was incremented by one. In total, we analysed the loadings of the first 11 PCs for CKB and the first 5 PCs for UKB, these being the PCs informative for geographical population stratification.<\/p>\n<p>In addition to the CKB and UKB SR sets, we defined two further sets of selection regions: (i) the intersection of the CKB and UKB sets and (ii) regions found in CKB but not in UKB.<\/p>\n<p>B) PCA loadings permutation test<\/p>\n<p>To test whether the overlap between VIPs and SRs was greater than expected by chance, we used bedtools (v2.30.0) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841\u20132.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR54\" id=\"ref-link-section-d5871239e2849\" rel=\"nofollow noopener\" target=\"_blank\">54<\/a>] to generate decoy SR sets from which to derive empirical P-values. For a given SR set, the selection regions on each chromosome were randomly shuffled within that chromosome, without overlaps, 10,000 times, yielding 10,000 \u201cdecoy\u201d SR sets.<\/p>\n<p>After extending each VIP by 10 kbp upstream and downstream, we compared the overlap between a VIP set and an SR set with the overlaps in the decoy SR sets. This gave empirical P-values for three statistics: the number of VIP genes overlapping selection regions by at least 1\u00a0bp; the number of VIP genes with more than half of their length covered by selection regions; and the number of base pairs of overlap between VIP genes and selection regions. 
The empirical P-value was obtained from the rank of the observed overlap statistic among the 10,000 sorted decoy values.<\/p>\n<p>VIP set multiple testing correction<\/p>\n<p>Because some VIP sets share genes and are therefore correlated, we applied the procedure of Machado (2007) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Machado AMC. Multiple testing correction in medical image analysis. J Math Imaging Vis. 2007;29(2\u20133):107\u201317.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR55\" id=\"ref-link-section-d5871239e2867\" rel=\"nofollow noopener\" target=\"_blank\">55<\/a>] to determine the Bonferroni correction to apply to the P-value threshold. To derive \\(S\\), the approximate number of independent tests, first let \\(M\\) be an \\(n \\times p\\) matrix, where \\(n\\) is the number of VIP sets and \\(p\\) is the total number of VIPs across all sets. The elements of \\(M\\), denoted by \\({M}_{ij}\\), are defined as follows:<\/p>\n<p>$$M_{ij} = \\left\\{\\begin{array}{ll} 1, &amp; \\text{if VIP}_j \\in \\text{VIP set}_i \\\\ 0, &amp; \\text{otherwise} \\end{array}\\right. \\qquad i = 1, \\ldots, n;\\; j = 1, \\ldots, p$$<\/p>\n<p>Let \\({v}_{1}, {v}_{2}, \\dots, {v}_{n}\\) be the eigenvalues of \\(M\\). 
Rescale all eigenvalues so that they sum to \\(n\\):<\/p>\n<p>$$\\sum\\limits_{k=1}^{n} v_k = n$$<\/p>\n<p>For each eigenvalue \\({v}_{k}\\), modify such that<\/p>\n<p>The sum of the modified eigenvalues,<\/p>\n<p>$$S = \\sum\\limits_{k=1}^{n} v_k$$<\/p>\n<p>gives the approximate number of independent tests, and we accordingly use \\(0.05\/S\\)\u00a0as our significance threshold.<\/p>\n<p>LASSI<\/p>\n<p>Genotype data from each biobank were phased with SHAPEIT v4.1.3 using default settings [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Delaneau O, Zagury JF, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun. 2019. &#010;                  https:\/\/doi.org\/10.1038\/s41467-019-13225-y&#010;                  &#010;                .\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR56\" id=\"ref-link-section-d5871239e3053\" rel=\"nofollow noopener\" target=\"_blank\">56<\/a>]. The saltiLASSI algorithm (v1.1.1) [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet. 2022;18(4): e1010134.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR15\" id=\"ref-link-section-d5871239e3056\" rel=\"nofollow noopener\" target=\"_blank\">15<\/a>] was applied to the same CKB and UKB datasets of 76,719 participants each. We chose saltiLASSI for these analyses because it performed best in recent benchmarks [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"DeGiorgio M, Szpiech ZA. 
A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet. 2022;18(4): e1010134.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR15\" id=\"ref-link-section-d5871239e3059\" rel=\"nofollow noopener\" target=\"_blank\">15<\/a>], is suitable for genotype data, and can handle tens of thousands of haplotypes.<\/p>\n<p>After initial QC [22], no allele frequency or allele count filters were applied to the genotype data before running the selection scan. We used the settings --winsize 10 and --winstep 1, with all other parameters at their defaults. A small window size was chosen to increase power to detect relatively old or weak selective sweeps [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"DeGiorgio M, Szpiech ZA. A spatially aware likelihood test to detect sweeps from haplotype distributions. PLoS Genet. 2022;18(4): e1010134.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR15\" id=\"ref-link-section-d5871239e3065\" rel=\"nofollow noopener\" target=\"_blank\">15<\/a>]. 
The saltiLASSI composite likelihood ratio test statistic, L, was used as the measure of the strength of evidence for a selective sweep and as the basis for defining regions under selection.<\/p>\n<p>\u201cSelected regions\u201d (SRs) were defined as runs of contiguous SNPs with L values above the 0.99 quantile of all L values on that chromosome, lying at least 200 SNPs from any other such run.<\/p>\n<p>Bootstrapping saltiLASSI regions of selection<\/p>\n<p>To determine whether the overlap between the saltiLASSI selection regions and different classes of VIPs was greater than expected by chance, we used the bootRanges function from the nullRanges R package [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Mu W, Davis ES, Lee S, Dozmorov MG, Phanstiel DH, Love MI. bootRanges: flexible generation of null sets of genomic ranges for hypothesis testing. Bioinformatics. 2023. &#010;                  https:\/\/doi.org\/10.1093\/bioinformatics\/btad190&#010;                  &#010;                .\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR57\" id=\"ref-link-section-d5871239e3088\" rel=\"nofollow noopener\" target=\"_blank\">57<\/a>]. Following the package vignette, we used the EnsDb.Hsapiens.v86 genome annotation [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Rainer J. EnsDb.Hsapiens.v86: Ensembl based annotation package. 
2017.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR58\" id=\"ref-link-section-d5871239e3091\" rel=\"nofollow noopener\" target=\"_blank\">58<\/a>] and excluded the following regions:<\/p>\n<p>i) hg38.Kundaje.GRCh38_unified_Excludable<\/p>\n<p>ii) hg38.UCSC.centromere<\/p>\n<p>iii) hg38.UCSC.telomere<\/p>\n<p>iv) hg38.UCSC.short_arm<\/p>\n<p>v) the extended HLA region (chr6:21,745,208\u201339,042,510)<\/p>\n<p>vi) the mitochondrial genome (MT), chrX and chrY<\/p>\n<p>Isochores (regions that capture large-scale patterns of GC and gene content) in the human genome range from 300 kb to 1 Mb in length [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Costantini M, Clay O, Auletta F, Bernardi G. An isochore map of human chromosomes. Genome Res. 2006;16(4):536\u201341.\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR59\" id=\"ref-link-section-d5871239e3166\" rel=\"nofollow noopener\" target=\"_blank\">59<\/a>]. Hence, to capture the structure of the isochores, we also removed any selection regions longer than 500\u00a0kb.<\/p>\n<p>We segmented the remaining genome according to gene density. We performed 10,000 bootstrap iterations and calculated the overlap between each VIP set and the bootstrapped saltiLASSI selection regions. The empirical P-value was the number of iterations in which the bootstrapped selection regions had more overlaps with the VIP set than the observed selection regions, divided by the total number of bootstrap iterations. 
Enrichment intervals (97.5%) around the enrichment ratios were obtained by dividing the observed proportion of overlaps by the 97.5% quantile of the bootstrapped distribution of overlaps. We applied the same P-value threshold adjustment as in the LRLD analysis (see VIP set multiple testing correction).<\/p>\n<p>ROC curve<\/p>\n<p>To validate the LRLD regions of selection, we used the regions identified by saltiLASSI as the truth set. For each iteration of the LRLD selection scan, we calculated four values, with all lengths measured in base pairs (bp):<\/p>\n<p>1. True positives (TP): the number or total length (in bp) of regions overlapping between saltiLASSI and LRLD.<\/p>\n<p>2. False positives (FP): the number or total length of LRLD regions that do not overlap saltiLASSI regions.<\/p>\n<p>3. True negatives (TN): the total length of the genome not covered by either saltiLASSI or LRLD regions, divided by the mean length of the LRLD regions.<\/p>\n<p>4. False negatives (FN): the number or total length of saltiLASSI regions that do not overlap LRLD regions.<\/p>\n<p>We then calculated the true positive rate (TPR) as TP\/(TP\u2009+\u2009FN) and the false positive rate (FPR) as FP\/(FP\u2009+\u2009TN).<\/p>\n<p>Differences in mean recombination rate between VIP sets<\/p>\n<p>Regions of low recombination (recombination \u2018coldspots\u2019) can mimic signals of selection, so we tested for differences in recombination rate between VIP categories. 
To do so, we calculated the mean recombination rate across each VIP structural gene and aggregated the results by VIP category (e.g. RNA or COV), using Han Chinese-specific hg38 genetic maps downloaded from <a href=\"https:\/\/zenodo.org\/records\/11437540\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/zenodo.org\/records\/11437540<\/a> [<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Spence JP, Song YS. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci Adv. 2019. &#010;                  https:\/\/doi.org\/10.1126\/sciadv.aaw9206&#010;                  &#010;                .\" href=\"http:\/\/bmcgenomics.biomedcentral.com\/articles\/10.1186\/s12864-025-11876-4#ref-CR60\" id=\"ref-link-section-d5871239e3250\" rel=\"nofollow noopener\" target=\"_blank\">60<\/a>].<\/p>\n<p>To calculate the mean recombination rate per gene, we first intersected each gene\u2019s coordinates with the recombination map intervals, allowing for partial overlaps. For each overlap, we calculated the exact base-pair coverage and multiplied it by the interval\u2019s recombination rate. We then summed these coverage-weighted rates and divided the sum by the total overlapping length, yielding a length-weighted average recombination rate per gene. Because the recombination rate distribution was heavily right-skewed, we log-transformed the values to approximate normality and applied a permutation test to assess differences in mean log-transformed recombination rate across VIP categories. 
We retained only the non-overlapping VIP categories: non-DNA virus, non-coronavirus RNA virus, non-SARS coronavirus, SARS coronavirus (excluding VIPs in the set of 42 under selection), and the set of 42 VIPs under selection.<\/p>\n<p>To account for uneven group sizes, we randomly subsampled each VIP category to the size of the smallest category and calculated the observed F-statistic. To obtain a null distribution, we performed 10,000 permutations, randomising the VIP category labels and recalculating the F-statistic at each iteration, again subsampling each category to the size of the smallest group.<\/p>\n<p>We calculated the empirical P-value as the proportion of permutations in which the F-statistic was greater than or equal to the observed F-statistic.<\/p>\n<p>To ensure that the observed F-statistic was not an artefact of the random subsampling of VIP categories, we repeated the entire permutation procedure 100 times, generating 100 P-values, each based on a new random subsample of the original data.<\/p>\n<p>Enrichment of VIP sets relative to DNA VIPs<\/p>\n<p>Our analysis and that of Souilmi et al. (2021) suggested that DNA VIPs are not under any detectable selection. We therefore used the overlap between saltiLASSI selection regions and DNA VIPs as the null success rate against which to compare other VIP sets. P-values for the enrichment of non-DNA VIP sets relative to DNA VIPs were calculated using stats::prop.test in R (v4.3.2).<\/p>\n<p>Overlap between selection regions and GWAS hits<\/p>\n<p>We downloaded the lead hits for three COVID-19-related traits (\u201cCritical illness\u201d, \u201cHospitalized\u201d and \u201cReported infection\u201d, release 7) from <a href=\"https:\/\/www.covid19hg.org\/results\/r7\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/www.covid19hg.org\/results\/r7\/<\/a>, retaining only hits in autosomal regions. 
We also removed any hits falling inside the extended HLA region (~\u2009chr6:20\u201340 Mb). Because the overall number of hits was small, we combined the lead hits for all three traits to maximise power to detect any signals, giving a total of 51 loci.<\/p>\n<p>We counted the number of overlaps between each class of lead-hit loci (defined as the region within 200 kbp of the lead variant) and either (i) the regions of long-range linkage disequilibrium (LRLD) or (ii) the saltiLASSI regions of selection. We also tested windows of 50, 100, 200 and 500 kbp around each GWAS locus. To determine whether the overlap between GWAS loci and selection regions was greater than expected by chance, we used the same resampling procedures described above: bootRanges for the saltiLASSI regions and the decoy region sets for the LRLD regions.<\/p>\n","protected":false},"excerpt":{"rendered":"AppendixMethodsStatistics and reproducibility The details of each analysis are outlined in the methods section and all of 
the&hellip;\n","protected":false},"author":2,"featured_media":229940,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[7003,64,63,34794,1325,336,8054,33904,9865,17253,17255,138818,17256,17254,128,97575],"class_list":{"0":"post-229939","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-animal-genetics-and-genomics","9":"tag-au","10":"tag-australia","11":"tag-computational-biology","12":"tag-general","13":"tag-genetics","14":"tag-genomics","15":"tag-humans","16":"tag-life-sciences","17":"tag-microarrays","18":"tag-microbial-genetics-and-genomics","19":"tag-pathogens","20":"tag-plant-genetics-and-genomics","21":"tag-proteomics","22":"tag-science","23":"tag-selection"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/229939","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=229939"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/229939\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/229940"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=229939"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=229939"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=229939"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}