{"id":21581,"date":"2025-07-25T16:57:19","date_gmt":"2025-07-25T16:57:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/au\/21581\/"},"modified":"2025-07-25T16:57:19","modified_gmt":"2025-07-25T16:57:19","slug":"single-cell-polygenic-risk-scores-dissect-cellular-and-molecular-heterogeneity-of-complex-human-diseases","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/au\/21581\/","title":{"rendered":"Single-cell polygenic risk scores dissect cellular and molecular heterogeneity of complex human diseases"},"content":{"rendered":"<p>Single-cell multiome dataset<\/p>\n<p>Single-cell multiome (snRNA-seq\u2009+\u2009snATAC-seq) data of the human left ventricle and lung were processed and clustered on the basis of RNA modality using Scanpy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 137\" title=\"Wolf, F. A., Angerer, P. &amp; Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR137\" id=\"ref-link-section-d34505251e3700\" rel=\"nofollow noopener\" target=\"_blank\">137<\/a>. The cells with high-quality RNA information (total detected gene\u2009&gt;\u2009500, total unique molecular identifiers\u2009&lt;\u200920,000 and mitochondrial read percentage\u2009&lt;\u200910%) were selected for further analysis. Doublets were filtered using scrublet<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 138\" title=\"Wolock, S. L., Lopez, R. &amp; Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. 
Cell Syst 8, 281&#x2013;291 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR138\" id=\"ref-link-section-d34505251e3704\" rel=\"nofollow noopener\" target=\"_blank\">138<\/a> with parameters min_counts\u2009=\u20091, min_cells\u2009=\u200910, min_gene_variability_pctl\u2009=\u200990 and n_prin_comps\u2009=\u200930. The thresholds for doublet removal were decided per sample on the basis of the distribution of doublet scores in real versus simulated cells. The top 3,000 highly variable genes were selected by combining the results from each sample separately with seurat_v3 mode. The cell-by-gene count matrices were normalized and scaled. ALLCools with a Python implementation of Seurat integration was used for correction of batch effect between samples with 50 PCs and 30 canonical correlation dimensions<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 136\" title=\"Tian, W. et al. Single-cell DNA methylation and 3D genome architecture in the human brain. Science 382, eadf5357 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR136\" id=\"ref-link-section-d34505251e3708\" rel=\"nofollow noopener\" target=\"_blank\">136<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 139\" title=\"Liu, H. et al. Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain. Nature 624, 366&#x2013;377 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR139\" id=\"ref-link-section-d34505251e3711\" rel=\"nofollow noopener\" target=\"_blank\">139<\/a>. Leiden clustering was performed on a k-nearest neighbor (kNN; k\u2009=\u200925) graph. The cell clusters were annotated and merged to cell types by comparing the expression level of predefined marker genes across clusters. The marker genes in Litvi\u0148ukov\u00e1 et al. 
(2020) and Tucker et al. (2020)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 140\" title=\"Litvi&#x148;ukov&#xE1;, M. et al. Cells of the adult human heart. Nature 588, 466&#x2013;472 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR140\" id=\"ref-link-section-d34505251e3722\" rel=\"nofollow noopener\" target=\"_blank\">140<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 141\" title=\"Tucker, N. R. et al. Transcriptional and cellular diversity of the human heart. Circulation 142, 466&#x2013;482 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR141\" id=\"ref-link-section-d34505251e3725\" rel=\"nofollow noopener\" target=\"_blank\">141<\/a> were used to annotate the heart cell types.<\/p>\n<p>We also examined the ATAC modality of these cells following the methods described below to ensure that these cells also have high-quality open chromatin information. The cells that did not pass ATAC quality controls (QCs) or constituted an ambiguous cluster in ATAC cell embedding were removed, resulting in 10,233 and 10,330 cells retained for downstream analysis for HCM and severe COVID-19, respectively.<\/p>\n<p>scATAC-seq datasets<\/p>\n<p>The cell type labels for the human pancreas and cortex in the original datasets<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. 
Nature 594, 398&#x2013;402 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR33\" id=\"ref-link-section-d34505251e3740\" rel=\"nofollow noopener\" target=\"_blank\">33<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer&#x2019;s and Parkinson&#x2019;s diseases. Nat. Genet. 52, 1158&#x2013;1168 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR35\" id=\"ref-link-section-d34505251e3743\" rel=\"nofollow noopener\" target=\"_blank\">35<\/a> were used. To generate cell embeddings, scATAC-seq data were processed and clustered using snapATAC2 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 142\" title=\"Zhang, K., Zemke, N. R., Armand, E. J. &amp; Ren, B. A fast, scalable and versatile tool for analysis of single-cell omics data. Nat. Methods 21, 217&#x2013;227 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR142\" id=\"ref-link-section-d34505251e3747\" rel=\"nofollow noopener\" target=\"_blank\">142<\/a>) and ALLCools<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 136\" title=\"Tian, W. et al. Single-cell DNA methylation and 3D genome architecture in the human brain. Science 382, eadf5357 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR136\" id=\"ref-link-section-d34505251e3751\" rel=\"nofollow noopener\" target=\"_blank\">136<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 139\" title=\"Liu, H. et al. 
Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain. Nature 624, 366&#x2013;377 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR139\" id=\"ref-link-section-d34505251e3754\" rel=\"nofollow noopener\" target=\"_blank\">139<\/a>. The fragment files were processed to generate cell-by-bin matrices at 5-kb resolution using snapATAC2 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 142\" title=\"Zhang, K., Zemke, N. R., Armand, E. J. &amp; Ren, B. A fast, scalable and versatile tool for analysis of single-cell omics data. Nat. Methods 21, 217&#x2013;227 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR142\" id=\"ref-link-section-d34505251e3758\" rel=\"nofollow noopener\" target=\"_blank\">142<\/a>). The cells with 2,000\u201350,000 total reads and transcription start site (TSS) enrichment\u2009&gt;\u20095 or 7 according to the distribution in specific samples were retained. The cell embeddings were computed with latent semantic indexing (LSI) and batch effects were corrected using the canonical correlation analysis (CCA) LSI mode in ALLCools. Cell-by-peak matrices at 500-bp resolution were generated by calling peaks per cell cluster using snapATAC2. For cortex data, superior and middle temporal gyri and middle frontal gyrus samples were used for AD analysis, resulting in 11,738 cells. For pancreas data, we randomly sampled 10,000 of 64,948 cells covering all annotated cell types for computational acceleration. The single-cell data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 121\" title=\"Li, Y. E. et al. A comparative atlas of single-cell chromatin accessibility in the human brain. 
Science 382, eadf7044 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR121\" id=\"ref-link-section-d34505251e3762\" rel=\"nofollow noopener\" target=\"_blank\">121<\/a> we used in the replication experiments were processed and QCed similarly.<\/p>\n<p>Cell\u2013cell similarity network<\/p>\n<p>Following a previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Yu, F. et al. Variant to function mapping at single-cell resolution through network propagation. Nat. Biotechnol. 40, 1644&#x2013;1653 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR46\" id=\"ref-link-section-d34505251e3774\" rel=\"nofollow noopener\" target=\"_blank\">46<\/a>, we used the mutual kNN (M-kNN) to measure the similarity between two different cells. We first used LSI to extract low-dimensional embeddings for individual cells. For cortex and left-ventricle datasets encompassing multiple samples, batch effects were corrected using both CCA and Harmony<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 143\" title=\"Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289&#x2013;1296 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR143\" id=\"ref-link-section-d34505251e3778\" rel=\"nofollow noopener\" target=\"_blank\">143<\/a> and integrated latent embeddings were adopted. Next, we computed the Euclidean distance for pairs of cells using their embeddings and then constructed the kNN graph \\(\\hat{G}\\in {{\\mathfrak{R}}}^{M\\times M}\\) on the basis of this distance matrix, in which we defined \\({\\hat{G}}_{i,\\;j}=1(i,j=1,\\ldots ,M)\\) if cell j is within the top k closest cells of cell i and \\({\\hat{G}}_{i,\\;j}=0\\) otherwise. 
The M-kNN graph G was then defined as the graph whose edges connect nodes (that is, cells) that are mutual kNNs of each other, which was calculated by \\(G=\\hat{G}\\circ {\\hat{G}}^{T}\\), where \\(\\circ\\) denotes the element-wise multiplication.<\/p>\n<p>Target cohorts for T2D and AD<\/p>\n<p>T2D and AD target cohorts were constructed on the basis of the UKBB. All the disease cases were defined according to the ICD-10 (tenth revision of the International Statistical Classification of Diseases and Related Health Problems) code. In particular, all Caucasian individuals with a disease ICD-10 code in the inpatient record, death record or diagnosis summary record were defined as the disease participants. We used E11.9 and G30.9 for T2D and AD, respectively. This resulted in 1,096 T2D and 932 AD cases. We randomly sampled an equal number of healthy controls by matching sex, age and ancestry information for each case group. In addition, individuals with a phenotype similar or related to the disease (T2D: E10, E11, E12, E13, E14, E23.2, N08.3, N25.1, O24, P70.2, Z13.1, Z83.3 and R73.9; AD: F00, G30, F01, F02, F03 and F05) were excluded when constructing the control group. In this study, overweight individuals (body mass index (BMI)\u2009\u2265\u200925) were excluded when constructing the T2D cohort. BMI for each individual was defined as the mean of four BMI measurements in the UKBB Data Field 21001.<\/p>\n<p>Target cohort for HCM<\/p>\n<p>The recruitment of the HCM cohort was part of our California Institute for Regenerative Medicine (CIRM) cardiomyopathy project<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Monte, E. et al. Personalized transcriptome signatures in a cardiomyopathy stem cell biobank. 
Preprint at bioRxiv &#010;                https:\/\/doi.org\/10.1101\/2024.05.10.593618&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR27\" id=\"ref-link-section-d34505251e4075\" rel=\"nofollow noopener\" target=\"_blank\">27<\/a>. The targeted population constituted persons with various cardiac procedures and noncardiac participants with genetic conditions in clinic who were identified to us by their clinical providers. Noncardiac participants were recruited in person during onsite clinic days or over the phone with permission by the providers. Healthy volunteers were recruited from our cardiovascular prevention clinic (that is, persons with no diagnosis of heart disease).<\/p>\n<p>Library preparation and sequencing was performed by Macrogene (first ten samples) and Novogene on genomic DNA we extracted from iPS cells (Qiagen DNeasy kit). Paired-end 150-bp reads were acquired on the Illumina HiSeq X Ten for a minimum of 90\u2009Gb of data. Reads were processed using Sentieon\u2019s FASTQ-to-VCF pipeline (Sentieon version 201808.07)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 144\" title=\"Kendig, K. I. et al. Sentieon DNASeq variant calling workflow demonstrates strong computational performance and accuracy. Front. Genet. 10, 736 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR144\" id=\"ref-link-section-d34505251e4082\" rel=\"nofollow noopener\" target=\"_blank\">144<\/a>. This pipeline is a drop-in replacement for a Burrows\u2013Wheeler aligner (BWA)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 145\" title=\"Li, H. &amp; Durbin, R. Fast and accurate short read alignment with Burrows&#x2013;Wheeler transform. 
Bioinformatics 25, 1754&#x2013;1760 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR145\" id=\"ref-link-section-d34505251e4086\" rel=\"nofollow noopener\" target=\"_blank\">145<\/a> plus GATK best-practices<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 146\" title=\"DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491&#x2013;498 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR146\" id=\"ref-link-section-d34505251e4090\" rel=\"nofollow noopener\" target=\"_blank\">146<\/a> pipeline for germline single-nucleotide variations (SNVs) and indels but has been highly tuned for optimal computational efficiency. BWA alignment to hg38 was followed by deduplication, realignment, base quality score recalibration and variant calling to generate g.vcf files for each sample. Coverage was assessed (GATK version 3.7)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Monte, E. et al. Personalized transcriptome signatures in a cardiomyopathy stem cell biobank. Preprint at bioRxiv &#010;                https:\/\/doi.org\/10.1101\/2024.05.10.593618&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR27\" id=\"ref-link-section-d34505251e4094\" rel=\"nofollow noopener\" target=\"_blank\">27<\/a>. Individual sample g.vcf files were joined and variant quality score recalibration was performed.<\/p>\n<p>Target cohort for severe COVID-19<\/p>\n<p>The VA COVID-19 cohort was derived from the VA MVP. 
The VA MVP is an ongoing national voluntary research program that aims to better understand how genetic, lifestyle and environmental factors influence veteran health<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70, 214&#x2013;223 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR28\" id=\"ref-link-section-d34505251e4106\" rel=\"nofollow noopener\" target=\"_blank\">28<\/a>. Briefly, individuals aged 18 to over 100\u2009years old have been recruited from over 60 VA medical centers nationwide since 2011 with current enrollment at &gt;800,000. Informed consent is obtained from all participants to provide blood for genomic analysis and access to their full electronic health record data within the VA before and after enrollment. The study received ethical and study protocol approval from the VA central institutional review board (IRB) in accordance with the principles outlined in the Declaration of Helsinki. COVID-19 cases were identified using an algorithm developed by the VA COVID national surveillance tool based on reverse transcription (RT)\u2013qPCR laboratory test results conducted at VA clinics, supplemented with natural language processing on clinical documents for SARS-CoV-2 tests conducted outside of the VA<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 147\" title=\"Song, R. J. et al. Phenome-wide association of 1809 phenotypes and COVID-19 disease progression in the Veterans Health Administration Million Veteran Program. 
PLoS ONE 16, e0251651 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR147\" id=\"ref-link-section-d34505251e4110\" rel=\"nofollow noopener\" target=\"_blank\">147<\/a>. This resulted in the VA COVID-19 WGS cohort of 2,716 persons with COVID-19 spanning a wide range of ages and ancestries. We defined severe COVID-19 cases as persons who were hospitalized, received acute care, stayed in the intensive care unit or were deceased, and controls as those who did not meet these criteria. To minimize potential confounders, we restricted our analysis to nonelderly individuals (age\u2009&lt;\u200965).<\/p>\n<p>DNA isolated from peripheral blood samples was used for WGS. Libraries were prepared using KAPA hyper prep kits (PCR-free) according to the manufacturer\u2019s recommendations. Sequencing was performed using an Illumina NovaSeq 6000 System (Illumina) with paired-end 2\u00d7 150-bp read lengths and Illumina\u2019s proprietary reversible terminator-based method. The specimens were sequenced to a minimum depth of 25\u00d7 per specimen and an average coverage of 30\u00d7 per plate.<\/p>\n<p>Independent target cohorts<\/p>\n<p>The GoT2D cohort, comprising 2,874 individuals, was used as the independent target cohort for T2D. Samples were sequenced using three technologies: deep whole-exome sequencing, low-pass (4\u00d7) WGS and OMNI 2.5M genotyping. Genotypes (SNVs, indels and structural variants) were called separately for each technology and then integrated by genotype refinement into a single phased reference panel. More details can be found in a previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. 
Nature 536, 41&#x2013;47 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR39\" id=\"ref-link-section-d34505251e4125\" rel=\"nofollow noopener\" target=\"_blank\">39<\/a>.<\/p>\n<p>The HCM independent target cohort was constructed by extracting non-EUR HCM samples (ICD-10: I42.1\/I42.2) and the same number of randomly selected non-EUR controls matched on age and sex from the UKBB genotype dataset. This resulted in a total of 152 samples.<\/p>\n<p>The WGS data of the independent target cohort for AD were obtained from the ADNI database. A total of 808 whole genomes were downloaded from ADNI, for which we defined individuals with a diagnosis of \u2018dementia\u2019 as cases and \u2018cognitively normal\u2019 as controls.<\/p>\n<p>WGS data processing<\/p>\n<p>The WGS data for HCM and COVID-19 were processed using the functional equivalence GATK variant-calling pipeline<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 148\" title=\"Regier, A. A. et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nat. Commun. 9, 4038 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR148\" id=\"ref-link-section-d34505251e4143\" rel=\"nofollow noopener\" target=\"_blank\">148<\/a>, which was developed by the Broad Institute and plugged into our data and task management system Trellis. The human reference genome build was GRCh38. We used BWA-MEM (version 0.7.15) to align reads, Picard 2.15.0 to mark PCR duplicates and GATK 4.1.0.0 for base quality score recalibration and variant calling using the \u2018HaplotypeCaller\u2019 function. We also used FastQC (version 0.11.4), SAMtools \u2018flagstat\u2019 (version 0.1.19) and RTG Tools \u2018vcfstats\u2019 (version 3.7.1) to assess the quality of the FASTQ, BAM and gVCF files, respectively. 
In addition, we used \u2018verifybamID\u2019 in GATK 4.1.0.0 to estimate DNA contamination rates for individual genomes and removed samples with 5% or more contaminated reads.<\/p>\n<p>QCs of genotype data<\/p>\n<p>We performed stringent QCs for the genotype data following the PRS tutorial (<a href=\"https:\/\/choishingwan.github.io\/PRS-Tutorial\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/choishingwan.github.io\/PRS-Tutorial\/<\/a>). For the GWAS summary statistics data (also referred to as the discovery or base data), genetic variants with a low MAF or a low imputation information score (INFO) were removed. We used thresholds suggested in the corresponding original papers: MAF\u2009&lt;\u20090.0001, 0.001 and 0.0001 and INFO\u2009&lt;\u20090.4, 0.6 and 0.6 for T2D, HCM and AD, respectively. We also excluded duplicated and ambiguous variants to guarantee the accuracy of PRS calculation.<\/p>\n<p>For the individual-level genotype data (also referred to as the target data), we carried out both variant-level and individual-level QCs. For WGS data, we performed pre-QCs: we removed samples with kinship\u2009&gt;\u20090.03, sample call rate\u2009&lt;\u20090.97 or mean sample coverage\u2009\u2264\u200918\u00d7; genomic positions residing in low-complexity regions or ENCODE-blacklisted regions were removed; we filtered out genotypes in individual samples that were detected with too low or too high read coverage (read depth\u2009&lt;\u20095 or read depth\u2009&gt;\u20091,500); we required all calls to have genotype quality\u2009\u2265\u200920 and, for nonreference calls, a sufficient portion (&gt;0.9) of reads was required to cover the alternate alleles.<\/p>\n<p>For all target cohorts, we removed variants with INFO\u2009&lt;\u20090.8 (for UKBB-based cohorts), missing call rate\u2009&gt;\u20090.01, MAF\u2009&lt;\u20090.01 or Hardy\u2013Weinberg equilibrium P\u2009&lt;\u20091\u2009\u00d7\u200910<sup>\u22126<\/sup>. 
For variants with mismatching alleles between discovery and target data, we strand-flipped these alleles to their complementary ones. We further excluded individuals with genotyping rate\u2009&lt;\u20090.01 or with extreme heterozygosity rate (that is, beyond 3\u2009s.d. from the mean). Individuals with an up-to-second-degree relative (\u03c0\u2009&gt;\u20090.125) within the cohort were also removed to prevent bias in prediction evaluation. Lastly, there were 2,176 (n\u2009=\u20091,088 cases, n\u2009=\u20091,088 controls), 134 (n\u2009=\u200981 cases, n\u2009=\u200953 controls), 1,839 (n\u2009=\u2009919 cases, n\u2009=\u2009920 controls) and 581 (n\u2009=\u2009120 cases, n\u2009=\u2009461 controls) individuals passing the above QCs for T2D, HCM, AD and severe COVID-19 cohorts, respectively.<\/p>\n<p>All independent target cohorts were processed and QCed using the same pipeline. After sample-level QCs, the final cohorts consisted of 2,749 samples (1,398 cases and 1,351 controls) for GoT2D, 62 samples (23 cases and 39 controls) for non-EUR UKBB and 469 samples (251 cases and 218 controls) for ADNI.<\/p>\n<p>PC analysis for genotype data<\/p>\n<p>To characterize the population structure of target cohorts, PC analysis was performed after pruning (window size\u2009=\u2009200 variants, sliding step size\u2009=\u200950 variants, LD r<sup>2<\/sup> threshold\u2009=\u20090.25). The first ten PCs were retained as covariates in the downstream analysis.<\/p>\n<p>PLINK C+T PRS calculation<\/p>\n<p>The cell-level C+T PRS was computed using PLINK and is given by<\/p>\n<p>$${\\rm{PRS}}_{j}=\\frac{{\\sum }_{i\\in {\\rm{cCRE}}_{j}}{\\beta }_{i}\\times {G}_{i}}{P\\times M},$$<\/p>\n<p>where \\({\\rm{cCRE}}_{j}\\) denotes the variants located within cCREs of cell j, \\({\\beta }_{i}\\) is the effect size of variant i, \\({G}_{i}\\) represents the number of effect alleles of variant i, P is the ploidy of the sample (2 for human) and M is the number of nonmissing variants. 
In the clumping phase, all index variants were forced to be drawn from the variants located within scATAC-seq peaks of individual cells using the \u2018--clump-index-first\u2019 option. Variants within 250\u2009kb of the index variant and three LD thresholds (r<sup>2<\/sup>\u2009=\u20090.1, 0.3 and 0.5) were considered for clumping. After constructing the index variant set, we applied multiple P-value thresholds (P\u2009=\u20091\u2009\u00d7\u200910<sup>\u22125<\/sup>, 1\u2009\u00d7\u200910<sup>\u22124<\/sup>, 1\u2009\u00d7\u200910<sup>\u22123<\/sup>, 0.01, 0.05, 0.1 and 0.5) to compute PRSs, resulting in 21 PRSs (seven P-value thresholds \u00d7 three LD thresholds) calculated for each cell and each individual. We used the 1000 Genomes Project samples to estimate the LD (out-sample estimation) for the simulation, HCM and severe COVID-19 cohorts because of their limited sample sizes, while using the target data (in-sample estimation) for other cohorts.<\/p>\n<p>The standard C+T PRS was calculated using the same set of parameters as that used in computing cell-level PRS, except that all variants were considered without conditioning. The P-value and LD r<sup>2<\/sup> thresholds were regarded as hyperparameters to be optimized in model selection.<\/p>\n<p>Model details of scPRS<\/p>\n<p>The cell-level PRS matrix \\({X}_{n}\\in {{\\mathfrak{R}}}^{M\\times 21}(n\\in 1,\\ldots ,N)\\) represents single-cell-resolved genetic risk features for each individual and is input into the scPRS model to predict the disease risk. Here, N and M denote the numbers of individuals and cells, respectively.<\/p>\n<p>scPRS consists of three modules (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>): the feature-embedding module, the graph convolutional network module and the readout module. 
The feature-embedding module takes normalized cell-level PRS \\({X}_{n}\\) as the input and uses a one-layer perceptron to reweight and integrate 21 PRS features per cell:<\/p>\n<p>$${h}_{n}^{(0)}={X}_{n}\\bullet {\\rm{abs}}({W}_{0}),$$<\/p>\n<p>where W0 denotes learnable model parameters, abs represents the absolute function and \\({h}_{n}^{(0)}\\in {{\\mathfrak{R}}}^{M}\\) represents the integrated features of M cells for individual n. According to the definition of PRS, larger values in Xn indicate higher disease risk. To maintain this interpretability throughout the modeling, we adopt the absolute function abs to enforce nonnegativity for W0.<\/p>\n<p>We next seek to integrate PRS features across different cells to generate a final risk score. With the consideration of the dropout event and sparsity of scATAC-seq data and assuming that cells with similar low-dimensional embeddings should have comparable epigenomes and then similar genetic signals, we use a GNN<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 149\" title=\"Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. &amp; Dahl, G. E. Neural message passing for quantum chemistry. Preprint at &#010;                https:\/\/doi.org\/10.48550\/arXiv.1704.01212&#010;                &#010;               (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR149\" id=\"ref-link-section-d34505251e4758\" rel=\"nofollow noopener\" target=\"_blank\">149<\/a> to smooth and denoise single-cell-level PRS features. 
More specifically, on the basis of the pre-computed M-kNN graph G, the GNN module is defined as<\/p>\n<p>$${g}_{v}^{(t+1)}=\\frac{1}{{\\mathrm{deg}} (v)}\\sum _{u{{\\in }}{\\mathscr{N}}{(}v)}\\left({\\rm{abs}}\\left({w}_{1}^{(t)}\\right){h}_{u}^{(t)}+{\\rm{abs}}\\left({w}_{2}^{(t)}\\right){h}_{v}^{(t)}\\right),$$<\/p>\n<p>$${h}_{v}^{(t+1)}={\\rm{Leaky}}\\;{\\rm{ReLU}}\\left({g}_{v}^{(t+1)}\\right),$$<\/p>\n<p>where \\({h}_{v}^{(t)}\\) denotes the hidden feature of cell v at layer t, \\({w}_{1}^{(t)}\\) and \\({w}_{2}^{(t)}\\) are learnable parameters of layer t, deg(v) denotes the degree of node (cell) v and \\({\\mathscr{N}}{(}v)\\) represents the neighbors of cell v in the M-kNN graph G. The leaky ReLU activation function is defined as<\/p>\n<p>$${\\rm{Leaky}}\\;{\\rm{ReLU}}(x)=\\max (\\alpha \\times x,x),$$<\/p>\n<p>where \\(\\alpha =0.1\\) is used in this study. Note that the absolute value function is again adopted to enforce nonnegativity of the model weights.<\/p>\n<p>Lastly, we design a readout module that maps GNN-smoothed hidden features to the phenotype using a one-layer perceptron:<\/p>\n<p>$$y=\\sigma \\left(\\beta \\bullet {h}^{(T)}+b\\right),$$<\/p>\n<p>where \\(\\beta \\in {{\\mathfrak{R}}}^{M}\\) represents the learnable regression coefficients indicating cell importance to prediction, T is the total number of GNN layers, b is the bias term and \\(\\sigma\\) is the sigmoid function for binary classification and the identity function for regression.<\/p>\n<p>Optimization of scPRS<\/p>\n<p>To train scPRS for disease prediction, we adopt the binary cross-entropy (BCE) loss and additional regularization functions for enhancing predictive power and model interpretability. 
The loss function \\({\\mathcal{L}}\\) of scPRS is defined as<\/p>\n<p>$${\\mathcal{L}}=-\\frac{1}{N}\\sum _{n}\\left({y}_{n}\\log ({p}_{n})+(1-{y}_{n})\\log (1-{p}_{n})\\right)+{\\lambda }_{1}{{||}\\beta {||}}_{1}+{\\lambda }_{2}{{||}\\beta {||}}_{2}+{\\lambda }_{3}{\\beta }^{T}{G}_{L}\\beta,$$<\/p>\n<p>where \\({y}_{n}\\in \\{\\mathrm{0,1}\\}\\) is the true disease label for individual n, \\({p}_{n}\\in [\\mathrm{0,1}]\\) is the scPRS-predicted disease probability and \\({{||}\\bullet {||}}_{1}\\) and \\({{||}\\bullet {||}}_{2}\\) represent L1 and L2 norms, respectively. We also add a Laplacian regularization term based on the symmetric normalized Laplacian matrix \\({G}_{L}\\), which is defined as<\/p>\n<p>$${G}_{L}={D}^{-\\frac{1}{2}}(D-A){D}^{-\\frac{1}{2}},$$<\/p>\n<p>where D and A denote the degree and adjacency matrices of the cell\u2013cell similarity graph G, respectively. We use hyperparameters \\({\\lambda }_{1}\\), \\({\\lambda }_{2}\\) and \\({\\lambda }_{3}\\) to balance the different regularization terms.<\/p>\n<p>scPRS was trained by minimizing the loss \\({\\mathcal{L}}\\) using the Adam algorithm<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 150\" title=\"Kingma, D. P. &amp; Ba, J. Adam: a method for stochastic optimization. Preprint at &#010;                https:\/\/doi.org\/10.48550\/arXiv.1412.6980&#010;                &#010;               (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR150\" id=\"ref-link-section-d34505251e6205\" rel=\"nofollow noopener\" target=\"_blank\">150<\/a> with a learning rate of 1\u2009\u00d7\u200910<sup>\u22123<\/sup> and batch size of 32. We trained scPRS for 200 epochs. 
Multiple sets of hyperparameters were considered in model selection, including \\(T\\in \\{\\mathrm{0,1,2}\\}\\), \\({\\lambda }_{1}\\in \\{\\mathrm{0,1,10}\\}\\), \\({\\lambda }_{2}\\in \\{\\mathrm{1,10,50,100,250,500,750}\\}\\), \\({\\lambda }_{3}\\in \\{\\mathrm{0.01,0.1,0.5,1,2.5,5,10,50,100}\\}\\) and M-kNN neighbor number \\(k\\in \\{\\mathrm{25,50}\\}\\). We also selected between CCA-based and Harmony-based cell\u2013cell similarity networks for T2D and AD.<\/p>\n<p>In prediction evaluation, we randomly partitioned the dataset into training, validation and testing sets comprising 60%, 20% and 20% of samples, respectively. We trained different scPRS models with all possible combinations of hyperparameters and assessed their performance (measured by AUROC) on the validation dataset. We selected the model yielding the best performance on the validation set and reported its performance on the held-out test set. This process was repeated ten times with different random seeds to assess the robustness of the model. Predictive performance was evaluated using both the AUROC and the AUPRC.<\/p>\n<p>In cell prioritization, we conducted fivefold cross-validation, which was repeated five times. The best hyperparameter set was then selected on the basis of the average AUROC score. The final model was trained with this optimal hyperparameter set on the entire dataset. To examine the variability of cell weights learned from model training, we trained 100 models using different random seeds.<\/p>\n<p>For the regression task, the mean squared error was used as the loss function instead of BCE. The model performance was evaluated based on the Pearson correlation between true and predicted values.<\/p>\n<p>Calculation of nonpeak and peak PRS<\/p>\n<p>Similar to the cell-level PRS, the calculation of nonpeak PRS was based on PLINK C+T, using only variants outside of scATAC-seq peaks as the index variants. 
A total of 21 nonpeak PRSs were computed and integrated in scPRS+, corresponding to different combinations of C+T parameters: P\u2009\u2208\u2009{1\u2009\u00d7\u200910<sup>\u22125<\/sup>, 1\u2009\u00d7\u200910<sup>\u22124<\/sup>, 1\u2009\u00d7\u200910<sup>\u22123<\/sup>, 0.01, 0.05, 0.1, 0.5} and r<sup>2<\/sup>\u2009\u2208\u2009{0.1, 0.3, 0.5}. For scPRS+ (integrating cell-level PRSs and nonpeak PRSs) and scPRS+covar (integrating cell-level PRSs, nonpeak PRSs, age, sex and ten PCs), we concatenated additional features to latent cell features \\({h}^{(T)}\\) at the final GNN layer.<\/p>\n<p>In calculating the single-cell-type peak PRS, only variants located within cell-type peaks were used to select the index variants, where the same 21 combinations of C+T parameters were adopted. A multi-cell-type PRS was further built by combining all single-cell-type PRSs (n\u2009=\u200921\u2009\u00d7\u2009n<sub>cell type<\/sub>) using LR. LR was trained on the training data and the performance was reported on the testing data.<\/p>\n<p>Implementation details on LDpred2, Lassosum and PolyPred<\/p>\n<p>We implemented LDpred2 and Lassosum following the bigsnpr tutorial (<a href=\"https:\/\/privefl.github.io\/bigsnpr\/articles\/LDpred2.html\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/privefl.github.io\/bigsnpr\/articles\/LDpred2.html<\/a>). Three LDpred2 models were implemented: the infinitesimal model (LDpred2-inf), grid model (LDpred2-grid) and auto model (LDpred2-auto). All model hyperparameters were selected on the basis of recommendations provided in the tutorial. To ensure a fair comparison, we maintained the same dataset splits (that is, training, validation and test sets) as those used in scPRS. 
For PLINK C+T, LDpred2-grid and Lassosum, the best model hyperparameters were determined on the basis of predictive performance on the validation dataset.<\/p>\n<p>For a fair comparison, we used scATAC-seq peaks as the functional annotation for variants in PolyPred and adopted the same GWASs as those used in scPRS to compute prior causal probabilities<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355&#x2013;1363 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR38\" id=\"ref-link-section-d34505251e6518\" rel=\"nofollow noopener\" target=\"_blank\">38<\/a>. We implemented PolyPred following the manual provided by the authors (<a href=\"https:\/\/github.com\/omerwe\/polyfun\/wiki\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/github.com\/omerwe\/polyfun\/wiki<\/a>).<\/p>\n<p>Unlike C+T, more advanced PRS methods, including LDpred2, Lassosum and PolyPred, inherently optimize r<sup>2<\/sup> and P-value cutoffs to select an optimal set of variants for PRS computation. This flexibility in optimization is a key innovation of these approaches.<\/p>\n<p>Benchmark on independent target cohorts<\/p>\n<p>Because the original GWAS discovery cohorts for T2D and AD overlapped with GoT2D and ADNI, respectively, to prevent information leakage, we adopted the UKBB GWAS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 151\" title=\"Karczewski, K. J. et al. Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects. 
Preprint at medRxiv &#010;                https:\/\/doi.org\/10.1101\/2024.03.13.24303864&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR151\" id=\"ref-link-section-d34505251e6547\" rel=\"nofollow noopener\" target=\"_blank\">151<\/a> as new summary statistics for T2D and AD, which were independent from the new target cohorts. We then trained new scPRS models on the basis of original target cohorts. For C+T, LDpred2-grid and Lassosum, model hyperparameters were optimized on the basis of original target cohorts. For scPRS, hyperparameters were selected using fivefold cross-validation of the original target cohorts. All PRS approaches were tested on the basis of new independent target cohorts.<\/p>\n<p>Prioritization of disease-relevant cells and cell types using scPRS<\/p>\n<p>The mapping from input PRS features X to latent cell features \\({h}^{(T)}\\) monotonically increases as a result of the design principle of scPRS, where weights in the embedding and GNN modules are constrained to be nonnegative. This feature facilitates model interpretation: a larger value of \\({\\beta }_{m}\\) denotes a higher enrichment of genetic risk within that cell, thereby informing disease\u2013cell relevance. To account for the variability of learned cell weights, we trained 100 scPRS models and compared the distribution of \\({\\beta }_{m}\\) for individual cells with that of top-ranking weights (that is, the top 15% of all cell weights per repeat) using a one-sided t-test. This comparison was conducted for each cell in the dataset. We defined disease-relevant cells as those cells whose adjusted P values (using the Benjamini\u2013Yekutieli procedure) were less than 0.1. 
Roughly speaking, scPRS prioritizes cells whose weights are consistently larger than those of the majority of cells.<\/p>\n<p>To gain further biological insight, we examined the enrichment of scPRS-prioritized cells within each cell type using Fisher\u2019s exact test. The disease-relevant cell types were defined as those cell types whose adjusted P values (using the BH procedure) were less than 0.1.<\/p>\n<p>Simulation details<\/p>\n<p>Using the PBMC multiome data downloaded from 10x Genomics, we first conducted a differential accessibility analysis to identify monocyte-specific scATAC-seq peaks. In this study, we defined monocytes as the total set of CD14\/CD16 monocytes and dendritic cells considering their shared heritability<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 152\" title=\"Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683&#x2013;693 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR152\" id=\"ref-link-section-d34505251e6678\" rel=\"nofollow noopener\" target=\"_blank\">152<\/a>. We identified differentially accessible regions (DARs) within monocytes using the top 1,500 marker peaks per cell subtype. Next, leveraging a monocyte count GWAS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Vuckovic, D. et al. The polygenic and monogenic basis of blood traits and diseases. 
Cell 182, 1214&#x2013;1231 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR22\" id=\"ref-link-section-d34505251e6682\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>, we computed PLINK C+T PRS conditioned on the variants located within monocyte DARs for a WGS cohort<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Li, J. et al. Decoding the genomics of abdominal aortic aneurysm. Cell 174, 1361&#x2013;1372 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR23\" id=\"ref-link-section-d34505251e6686\" rel=\"nofollow noopener\" target=\"_blank\">23<\/a> (n\u2009=\u2009401). Raw C+T PRS outputs were further standardized to mean\u2009=\u20090 and variance\u2009=\u20091, yielding the \u2018ground truth\u2019 of monocyte count for this cohort.<\/p>\n<p>To introduce randomness, we added a noise term to the simulated monocyte count:<\/p>\n<p>$$\\widetilde{y}=y+\\varepsilon,$$<\/p>\n<p>where \\(\\varepsilon \\sim {\\mathscr{N}}\\left(0,{\\sigma }^{2}\\right)\\). In this study, we used \\(\\sigma \\in \\{\\mathrm{0,0.25,0.5,1,3,5,7}\\}\\). We trained scPRS on the basis of these simulation datasets with and without noise to evaluate its capacity to identify phenotype-associated cells.<\/p>\n<p>SCAVENGE<\/p>\n<p>We used SCAVENGE<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Yu, F. et al. Variant to function mapping at single-cell resolution through network propagation. Nat. Biotechnol. 40, 1644&#x2013;1653 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR46\" id=\"ref-link-section-d34505251e6826\" rel=\"nofollow noopener\" target=\"_blank\">46<\/a> as a benchmark for prioritizing disease-relevant cells. 
Following the SCAVENGE tutorial (<a href=\"https:\/\/sankaranlab.github.io\/SCAVENGE\/articles\/SCAVENGE\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/sankaranlab.github.io\/SCAVENGE\/articles\/SCAVENGE<\/a>), we calculated trait relevance scores (TRSs) for individual cells, indicative of their association with the disease. Cells were prioritized by SCAVENGE if their TRSs were above 95% of all TRSs. As in the scPRS analysis, we evaluated the enrichment of selected cells within each cell type using the Fisher\u2019s exact test.<\/p>\n<p>Stratified LDSC<\/p>\n<p>Partitioned heritability analysis was carried out using sLDSC as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228&#x2013;1235 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR45\" id=\"ref-link-section-d34505251e6845\" rel=\"nofollow noopener\" target=\"_blank\">45<\/a>. Heritability was quantified within the total set of snATAC-seq peaks identified for each of the left-ventricle cell types. Genetic enrichment for a particular cell type was defined by calculating the captured heritability per unit of sequence within the total set of identified snATAC-seq peaks for that cell type, compared to the genome overall. P values were calculated as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 
47, 1228&#x2013;1235 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR45\" id=\"ref-link-section-d34505251e6852\" rel=\"nofollow noopener\" target=\"_blank\">45<\/a>; nominal significance (P\u2009&lt;\u20090.05) was taken to be indicative of true enrichment.<\/p>\n<p>We conducted sLDSC using the same GWAS and scATAC-seq datasets as those used in scPRS for HCM and severe COVID-19, for which no existing sLDSC results were available. For AD, the original sLDSC<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer&#x2019;s and Parkinson&#x2019;s diseases. Nat. Genet. 52, 1158&#x2013;1168 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR35\" id=\"ref-link-section-d34505251e6862\" rel=\"nofollow noopener\" target=\"_blank\">35<\/a> was performed on the same GWAS and scATAC-seq dataset. For T2D, the original sLDSC<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Chiou, J. et al. Single-cell chromatin accessibility identifies pancreatic islet cell type- and state-specific regulatory programs of diabetes risk. Nat. Genet. 53, 455&#x2013;466 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR44\" id=\"ref-link-section-d34505251e6866\" rel=\"nofollow noopener\" target=\"_blank\">44<\/a> was carried out on the same scATAC-seq dataset but used a larger GWAS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 153\" title=\"Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. 
Nat Genet 50, 1505&#x2013;1513 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR153\" id=\"ref-link-section-d34505251e6870\" rel=\"nofollow noopener\" target=\"_blank\">153<\/a>. We chose to report the results of sLDSC applied to the discovery GWAS to optimize its power, given the larger sample size of the discovery GWAS compared to the target cohort.<\/p>\n<p>Identification of disease-relevant cCREs<\/p>\n<p>As the first step of the layered multiomic analysis (Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#Fig5\" rel=\"nofollow noopener\" target=\"_blank\">5a<\/a>), we identified differentially accessible cCREs within each scPRS-prioritized cell type using Signac<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 154\" title=\"Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. &amp; Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333&#x2013;1341 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR154\" id=\"ref-link-section-d34505251e6885\" rel=\"nofollow noopener\" target=\"_blank\">154<\/a>. Specifically, we used the FindMarkers function to compare peaks within scPRS-prioritized cells (per cell type) against all unselected cells in the dataset as background, with parameters test.use\u2009=\u2009\u2018LR\u2019, latent.vars\u2009=\u2009\u2018peak_region_fragments\u2019, min.pct\u2009=\u20090.02 and logfc.threshold\u2009=\u20090.1. Significant peaks (adjusted P\u2009&lt;\u20090.1 based on BH correction) with a positive log2 FC were defined as differentially accessible cCREs. 
Next, leveraging the discovery GWAS summary statistics, we conducted MAGMA<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 74\" title=\"de Leeuw, C. A., Mooij, J. M. &amp; Heskes, T. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR74\" id=\"ref-link-section-d34505251e6894\" rel=\"nofollow noopener\" target=\"_blank\">74<\/a> analysis for these differentially accessible cCREs per cell type, with gene-model\u2009=\u2009\u2018multi\u2019. MAGMA is a widely used tool for gene-level and region-level genetic association analysis based on GWAS summary data. It is designed to test genetic associations of predefined genes or regions with diseases or traits by aggregating variant-level GWAS statistics while accounting for LD. We defined disease-relevant cCREs (T2D-cCREs and AD-cCREs) as those cCREs with adjusted MAGMA P\u2009&lt;\u20090.1 based on BH correction. We expanded our analysis to involve all nominally significant cCREs (MAGMA P\u2009&lt;\u20090.05) for HCM, as no cCRE passed the multiple-testing correction.<\/p>\n<p>Mapping cCRE\u2013gene links<\/p>\n<p>We mapped cCREs to their target genes on the basis of two complementary strategies. First, we adopted the closest-gene strategy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 155\" title=\"Fulco, C. P. et al. Activity-by-contact model of enhancer&#x2013;promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664&#x2013;1669 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR155\" id=\"ref-link-section-d34505251e6913\" rel=\"nofollow noopener\" target=\"_blank\">155<\/a> and assigned each cCRE to its closest gene. 
In addition, we added more distant genes on the basis of a coaccessibility analysis using Cicero<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 75\" title=\"Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858&#x2013;871 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR75\" id=\"ref-link-section-d34505251e6917\" rel=\"nofollow noopener\" target=\"_blank\">75<\/a> and linked each cCRE to those genes whose TSS peak displayed coaccessibility with the cCRE above 80% of all interactions. For each scPRS-prioritized cell type, the expressed genes mapped to disease-relevant cCREs within that cell type defined the repertoire of disease candidate genes.<\/p>\n<p>Enrichment of disease-associated variants within scPRS-cell-specific peaks<\/p>\n<p>Per disease-relevant cell type, we performed clumping within differentially accessible peaks in scPRS-prioritized cells to remove redundant variants. Multiple LD r<sup>2<\/sup> thresholds (r<sup>2<\/sup>\u2009=\u20090.1, 0.3 and 0.5) were tested. Leveraging the clumped variant set, we examined the enrichment of disease-associated variants (GWAS P\u2009&lt;\u20095\u2009\u00d7\u200910<sup>\u22128<\/sup>) within scPRS-cell-specific peaks by comparing it to the genome-wide distribution.<\/p>\n<p>TF-binding motif analysis<\/p>\n<p>The TF-binding motif analysis was performed using GimmeMotifs<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 156\" title=\"van Heeringen, S. J. &amp; Veenstra, G. J. C. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 27, 270&#x2013;271 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR156\" id=\"ref-link-section-d34505251e6951\" rel=\"nofollow noopener\" target=\"_blank\">156<\/a>. 
The differential motifs between disease-relevant cCREs and all peaks within the corresponding cell type were identified using the \u2018gimme motif\u2019 command with options f\u2009=\u20090.5 and s\u2009=\u20090. AUROC was adopted to quantify the motif enrichment.<\/p>\n<p>Network analysis<\/p>\n<p>We downloaded the human PPIs from STRING (version 12.0)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 98\" title=\"Szklarczyk, D. et al. The STRING database in 2023: protein&#x2013;protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638&#x2013;D646 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR98\" id=\"ref-link-section-d34505251e6963\" rel=\"nofollow noopener\" target=\"_blank\">98<\/a>, comprising 19,622 proteins and 6,857,702 interactions. High-confidence PPIs (combined score\u2009&gt;\u2009700) were extracted for downstream analysis, including 16,185 proteins and 236,000 interactions. To mitigate bias from hub proteins<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 157\" title=\"Krishnan, A. et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci. 19, 1454&#x2013;1462 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR157\" id=\"ref-link-section-d34505251e6967\" rel=\"nofollow noopener\" target=\"_blank\">157<\/a>, we applied the random walk with restart algorithm with a restart probability of 0.5. This produced a smoothed network after retaining the top 5% predicted edges (n\u2009=\u20096,243,766). 
Next, we used the Louvain method<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 158\" title=\"Blondel, V. D., Guillaume, J.-L., Lambiotte, R. &amp; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR158\" id=\"ref-link-section-d34505251e6974\" rel=\"nofollow noopener\" target=\"_blank\">158<\/a> to decompose the network into different modules. Following algorithm convergence, we obtained 1,261 modules with an average size of 13 nodes.<\/p>\n<p>The enrichment of genes of interest within each module was tested using the hypergeometric test. Modules with adjusted P\u2009&lt;\u20090.1 based on BH correction were considered significant.<\/p>\n<p>Sequence deep learning model design and training<\/p>\n<p>The sequence-based deep learning model was trained to predict ATAC-seq peaks across various cell types on the basis of the DNA sequence. Specifically, the sequence model takes a 2,000-bp DNA sequence as the input and outputs the peak status of the centered 200\u2009bp for different cell types. The peak label for a specific cell type is 1 if over 50% of the centered 200\u2009bp is overlapped by an ATAC-seq peak within that cell type and 0 otherwise. Model structure follows the Beluga architecture<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 78\" title=\"Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 
50, 1171&#x2013;1179 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR78\" id=\"ref-link-section-d34505251e6992\" rel=\"nofollow noopener\" target=\"_blank\">78<\/a>, except its outputs correspond to different cell types within the tissue of interest.<\/p>\n<p>ATAC-seq peaks within chromosomes 6 and 7 and chromosomes 8 and 9 were held out as validation and test data, respectively. Peaks in other chromosomes were used as training data. Genomic regions annotated by the ENCODE blacklist<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 159\" title=\"Amemiya, H. M., Kundaje, A. &amp; Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR159\" id=\"ref-link-section-d34505251e6999\" rel=\"nofollow noopener\" target=\"_blank\">159<\/a> were excluded from analysis. We adopted the BCE loss as the objective function. The sequence model was trained using the stochastic gradient descent algorithm with a weight decay coefficient of 1\u2009\u00d7\u200910<sup>\u22126<\/sup>, momentum of 0.9, learning rate of 0.08 and batch size of 64. The model was implemented using Selene<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 79\" title=\"Chen, K. M., Cofer, E. M., Zhou, J. &amp; Troyanskaya, O. G. Selene: a PyTorch-based deep learning library for sequence data. Nat. Methods 16, 315&#x2013;318 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR79\" id=\"ref-link-section-d34505251e7005\" rel=\"nofollow noopener\" target=\"_blank\">79<\/a>, a PyTorch-based library for sequence deep learning modeling. 
In this study, we trained separate sequence models using different scATAC-seq datasets.<\/p>\n<p>Prediction of variant effects using sequence deep learning model<\/p>\n<p>We used the sequence model to predict the impact of genetic variants on cCREs across diverse cell types. For a given cell type c and variant v (from reference allele to alternative allele), the model predicts the cCRE status y<sub>ref,c<\/sub> and y<sub>alt,c<\/sub> for sequences centered on the reference and alternative alleles, respectively. We define the functional effect of variant v in cell type c as y<sub>v,c<\/sub>\u2009=\u2009y<sub>alt,c<\/sub>\u2009\u2212\u2009y<sub>ref,c<\/sub>, representing how the variant alters the cCRE in this cell type. To achieve a global evaluation of functional scores, we introduce the Z<sub>v,c<\/sub> score, which normalizes y<sub>v,c<\/sub> as Z<sub>v,c<\/sub>\u2009=\u2009(y<sub>v,c<\/sub>\u2009\u2212\u2009\u03bc)\/\u03c3, where \u03bc and \u03c3 denote the mean and s.d. of all variant scores, respectively. The Q<sub>v,c<\/sub> score is further defined as the quantile of |Z<sub>v,c<\/sub>| among all variants. A higher Q score indicates a larger functional effect within a specific cell type.<\/p>\n<p>Benchmarking sequence model prediction<\/p>\n<p>To benchmark the sequence model prediction on variant effects against QTL analysis (eQTL or caQTL), we compared the absolute Z scores computed by the sequence model between QTLs and non-QTLs using a two-sided t-test. The t statistic was used to measure the enrichment of functional variants defined by the sequence model within QTLs.<\/p>\n<p>As a second benchmark, we used SNP2TFBS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 89\" title=\"Kumar, S., Ambrosini, G. &amp; Bucher, P. SNP2TFBS&#x2014;a database of regulatory SNPs affecting predicted transcription factor binding site affinity. Nucleic Acids Res. 
45, D139&#x2013;D144 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR89\" id=\"ref-link-section-d34505251e7159\" rel=\"nofollow noopener\" target=\"_blank\">89<\/a> to predict the effects of variants on altering TFBS affinity. The binding affinities for different TFs were averaged for each studied variant to estimate its overall effect. Given a particular quantile cutoff, variants were split into two groups according to their Q scores. We then compared the averaged SNP2TFBS scores between these two groups of variants using a two-sided t-test. We report the t statistic, which indicates the enrichment of TFBS-disrupting variants within sequence-model-defined functional variants.<\/p>\n<p>Variant effect within disease-relevant cCREs<\/p>\n<p>We compared the abundance of functional disease-associated variants (GWAS P\u2009&lt;\u20090.05) within disease-relevant cCREs against the background using a Fisher\u2019s exact test. Similarly, the functional variants were defined as those with Q scores above a given cutoff (multiple cutoffs applied). The odds ratio (OR) was adopted to measure the enrichment of functional variants within disease cCREs.<\/p>\n<p>Fine-mapping disease risk variants<\/p>\n<p>We used three approaches to fine-map disease risk variants: the sequence deep learning model, QTL and TFBS. A 0.8 quantile cutoff was adopted to define functional variants on the basis of the sequence model in fine-mapping. In addition to SNP2TFBS, motifbreakR<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 91\" title=\"Coetzee, S. G., Coetzee, G. A. &amp; Hazelett, D. J. motifbreakR: an R\/Bioconductor package for predicting variant effects at transcription factor binding sites. 
Bioinformatics 31, 3847&#x2013;3849 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR91\" id=\"ref-link-section-d34505251e7195\" rel=\"nofollow noopener\" target=\"_blank\">91<\/a> was used to predict variant disruption of TF binding. A positive averaged SNP2TFBS score or a strong-effect motifbreakR score was used to define a disrupting variant. We excluded missense and loss-of-function variants and variants with GWAS P\u2009\u2265\u20090.05 from fine-mapping.<\/p>\n<p>iPS cell reprogramming<\/p>\n<p>iPS cells were reprogrammed from PBMCs using Sendai virus (CytoTune iPS 2.0 Sendai Reprogramming Kit) as previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 160\" title=\"Gruber, J. J. et al. Chromatin remodeling in response to BRCA2-Crisis. Cell Rep. 28, 2182&#x2013;2193 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR160\" id=\"ref-link-section-d34505251e7210\" rel=\"nofollow noopener\" target=\"_blank\">160<\/a>. Three clones were generated per subject, karyotyped (KaryoStat, Thermo Fisher Scientific), determined to be free of Mycoplasma and evaluated by immunohistochemistry for expression of pluripotency markers TRA-1-60 (LifeTech, MA1023) and SSEA4 (LifeTech, MA1021). Cells were maintained under feeder-free conditions in mTeSR (StemCell Technologies, 5850) or Essential 8 medium (Fisher, A1517001) and stored in liquid nitrogen.<\/p>\n<p>CDM differentiation and drug treatment<\/p>\n<p>As previously described<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 161\" title=\"Burridge, P. W. et al. Chemically defined generation of human cardiomyocytes. Nat. 
Methods 11, 855&#x2013;860 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR161\" id=\"ref-link-section-d34505251e7225\" rel=\"nofollow noopener\" target=\"_blank\">161<\/a>, iPS cells were plated on Matrigel and cultured in StemMACS iPS-Brew XF (MACS Miltenyi Biotec, 130-104-368) until the final passage in Essential 8 medium (Fisher, A1517001). CDM differentiation was induced at 60\u201380% confluency, with culture in RPMI medium (Gibco\/LifeTech, 11875-119) plus B27 supplement lacking insulin (Gibco\/LifeTech, A1895601). Then, 6\u2009\u00b5M CHIR-99021 (Fisher, NC0976209) was added on day 0 and 6\u2009\u00b5M IWR1 (Fisher, NC1319406) was added on day 3. Beginning on day 7, the medium was changed every other day using RPMI medium supplemented with B27 containing insulin (Gibco\/LifeTech 17504-044). Upon commencement of beating (around day 15), cells underwent purification by a 3-day glucose starvation (RPMI medium without glucose (Gibco\/LifeTech, 11879-020) supplemented with insulin-containing B27), a 1-day recovery in glucose-containing medium and subsequent replating (dissociated in TrypLE, Fisher, 50-591-353). Cells were then maintained in RPMI medium supplemented with insulin-containing B27 until approximately day 30. After differentiation, drug treatment occurred at 0 and 24\u2009h and samples were assayed at 48\u2009h. Cells were treated with 250\u2009nM MYK-461 (Cayman Chemical, 19216-5mg), 400\u2009nM or 1\u2009\u03bcM omecamtiv mecarbil (Selleckchem, Fisher, NC1069600) or DMSO.<\/p>\n<p>RNA-seq library preparation, sequencing, QC and expression matrix generation<\/p>\n<p>RNA was extracted from iPS cells or CDMs (RNeasy, Qiagen). 
Illumina RNA-seq libraries (TruSeq Stranded Total RNA LP Gold) were prepared on the Bravo (Agilent), pooled and sequenced (NovaSeq 6000, paired-end, 100\u2009bp)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Monte, E. et al. Personalized transcriptome signatures in a cardiomyopathy stem cell biobank. Preprint at bioRxiv &#010;                https:\/\/doi.org\/10.1101\/2024.05.10.593618&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR27\" id=\"ref-link-section-d34505251e7237\" rel=\"nofollow noopener\" target=\"_blank\">27<\/a>. Where possible, drug treatment conditions for the same differentiation were kept together in batches, while replicate differentiations for the same iPS cell lines were split apart and HCM and control samples were distributed across batches. Reads were aligned to hg38 (STAR). PC analysis on CDM and iPS samples separately returned no outlier samples (defined as Z score of PC1\u2009&gt;\u20093). Library QC was assessed using fastp, fastQC, STAR and Picard metrics. Samples were flagged for poor QC by the following metrics: G+C content after filtering outside of 20\u201380% (fastp), duplication rate greater than 40% (fastp), uniquely mapped read pairs (fragments)\u2009&lt;\u200920 million (STAR), mean reads (average of forward and reverse)\u2009&lt;\u200920 million (fastQC), ribosomal RNA bases\u2009&gt;\u200920% (Picard), coding plus UTR (untranslated region)\u2009&lt;\u200950% (Picard) and uniquely mapping fragments\u2009&lt;\u200960% (STAR). Samples with more than one flag were removed. CDM and iPS cell samples were subsequently processed separately. Reads were computed as counts per million (edgeR), corrected for library preparation batch (combat-seq) and normalized by the trimmed mean of M values (TMM; edgeR) to generate the final expression matrix. 
For samples with biological replicates, TMM counts were averaged. PC analysis was performed and PC1 was assessed for Spearman correlation with the following metadata: percent G+C content (fastp), mean reads (average of forward and reverse) in millions (fastQC), percent ribosomal RNA bases (Picard), uniquely mapped fragments in millions (STAR), duplication rate (fastp), percent coding or UTR (Picard), library preparation batch and sequencing pool. The maximum absolute Spearman correlation between PC1 and the library metadata was 0.08 for CDM samples, indicating good QC, with technical artifacts having minimal influence on the dataset. iPS cell samples had higher correlation for three metrics (0.26 with G+C content, 0.22 with duplication rate and 0.11 with percent coding or UTR), with the remaining metrics each having an absolute correlation below 0.04.<\/p>\n<p>Differential expression analysis<\/p>\n<p>Raw data were input into DESeq2 (ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 162\" title=\"Love, M. I., Huber, W. &amp; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR162\" id=\"ref-link-section-d34505251e7255\" rel=\"nofollow noopener\" target=\"_blank\">162<\/a>) as required to compare gene expression between HCM cases and controls across different conditions. Gene counts were averaged across replicates. 
Sample sex and ancestry were included as covariates in the analysis.<\/p>\n<p>Allelic imbalance analysis in rs7922621 prime-edited microglia<\/p>\n<p>The <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/snp\/?term=rs7922621\" rel=\"nofollow noopener\" target=\"_blank\">rs7922621<\/a> prime-edited WTC11 clones were obtained from our previous study<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 108\" title=\"Yang, X. et al. Functional characterization of Alzheimer&#x2019;s disease genetic variants in microglia. Nat. Genet. 55, 1735&#x2013;1744 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR108\" id=\"ref-link-section-d34505251e7275\" rel=\"nofollow noopener\" target=\"_blank\">108<\/a> and microglia were differentiated accordingly. Total RNA was isolated from wild-type and prime-edited microglia using the RNeasy plus mini kit (Qiagen, 74034). Briefly, 400\u2009ng of total RNA was reverse-transcribed using the iScript complementary DNA (cDNA) synthesis kit (Bio-Rad, 1708891). The cDNA region containing a phased heterozygous SNP in ANXA11 (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/snp\/?term=rs2573353\" rel=\"nofollow noopener\" target=\"_blank\">rs2573353<\/a> in WTC11)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 163\" title=\"Song, M. et al. Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet. 51, 1252&#x2013;1262 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR163\" id=\"ref-link-section-d34505251e7289\" rel=\"nofollow noopener\" target=\"_blank\">163<\/a> was amplified using the following primers: WTC-ANX-F, AGGTCCAATAATCCCTGCTGA; WTC-ANX-R, CCATGGTGCTCGGCTAATTT. 
The PCR products were purified by agarose gel extraction, followed by the addition of Illumina adaptors and deep sequencing. Reads were aligned to the sequence of either allele and counted if the 100-bp regions surrounding <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/snp\/?term=rs2573353\" rel=\"nofollow noopener\" target=\"_blank\">rs2573353<\/a> were exactly matched.<\/p>\n<p>Differentiation of TMEM119\u2013tdTomato reporter cell line iMGs<\/p>\n<p>iPS cells stably expressing a TMEM119\u2013tdTomato reporter transgene were first differentiated into fibroblast-like cells using a previously established method<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 110\" title=\"Shi, Y. et al. Identification and therapeutic rescue of autophagosome and glutamate receptor defects in C9ORF72 and sporadic ALS neurons. JCI Insight 5, e127736 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR110\" id=\"ref-link-section-d34505251e7309\" rel=\"nofollow noopener\" target=\"_blank\">110<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 111\" title=\"Shi, Y. et al. Haploinsufficiency leads to neurodegeneration in C9ORF72 ALS\/FTD human induced motor neurons. Nat. Med. 24, 313&#x2013;325 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#ref-CR111\" id=\"ref-link-section-d34505251e7312\" rel=\"nofollow noopener\" target=\"_blank\">111<\/a>. TMEM119\u2013tdTomato fibroblasts were seeded onto 96-well plates (Corning) coated with 0.1% gelatin and Matrigel in fibroblast medium (DMEM with 10% FBS and 1% penicillin\u2013streptomycin). After 48\u2009h, the cells were transduced with 200\u2009\u03bcl per well of two different concentrated retroviruses to overexpress human PU.1 and CEBPA, with 5\u2009\u03bcg\u2009ml\u22121 polybrene in fibroblast medium. 
Then, 24\u2009h after transduction, the medium was switched to DMEM with 5% FBS, 10\u2009ng\u2009ml\u22121 human macrophage colony-stimulating factor (M-CSF) and 10\u2009ng\u2009ml\u22121 interleukin 34 (IL-34) and refreshed every 3\u2009days thereafter. iMGs expressing the TMEM119\u2013tdTomato reporter were used for experiments 14\u2009days after viral transduction.<\/p>\n<p>siRNA transfection<\/p>\n<p>siRNAs (Thermo Fisher) at a concentration of 30\u2009nM were transfected into iMGs on day 14 using Lipofectamine RNAiMAX transfection reagent (Thermo Fisher Scientific, 13778075) in complete iMG medium (DMEM\u2009+\u20095% FBS, 10\u2009ng\u2009ml\u22121 M-CSF and 10\u2009ng\u2009ml\u22121 IL-34). After 24\u2009h, the medium was refreshed with complete iMG medium; after an additional 24\u2009h (48\u2009h after transfection), cell cultures were collected for RT\u2013qPCR or pHrodo analysis.<\/p>\n<p>pHrodo phagocytosis assay<\/p>\n<p>iMGs cultured in 96-well plates (Corning) coated with Matrigel and gelatin were incubated with 10\u2009\u03bcg of pHrodo green Escherichia coli bioparticles (Incucyte) for 15\u2009min at 37\u2009\u00b0C. Wells were then washed with PBS and longitudinally imaged with a Molecular Devices ImageXpress system at 30-min intervals for the initial 2\u2009h and at 1-h intervals thereafter up to 24\u2009h after the start. The 2-h time point was selected for downstream analysis. 
ImageJ software was used for quantification of individual replicates across conditions on the basis of the colocalization of TMEM119\u2013tdTomato and pHrodo green.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41587-025-02725-6#MOESM2\" rel=\"nofollow noopener\" target=\"_blank\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Single-cell multiome dataset Single-cell multiome (snRNA-seq\u2009+\u2009snATAC-seq) data of the human left ventricle and lung were processed and clustered&hellip;\n","protected":false},"author":2,"featured_media":21582,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[6634,64,63,15373,21994,5562,4335,21993,16446,1325,20630,336,8054,9865,2565,128],"class_list":{"0":"post-21581","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-au","10":"tag-australia","11":"tag-bioinformatics","12":"tag-biomedical-engineering-biotechnology","13":"tag-biomedicine","14":"tag-biotechnology","15":"tag-diseases","16":"tag-gene-regulation","17":"tag-general","18":"tag-genetic-association-study","19":"tag-genetics","20":"tag-genomics","21":"tag-life-sciences","22":"tag-machine-learning","23":"tag-science"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/21581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\
/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/comments?post=21581"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/posts\/21581\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media\/21582"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/media?parent=21581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/categories?post=21581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/au\/wp-json\/wp\/v2\/tags?post=21581"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}