{"id":113416,"date":"2025-08-27T09:26:08","date_gmt":"2025-08-27T09:26:08","guid":{"rendered":"https:\/\/www.newsbeep.com\/us\/113416\/"},"modified":"2025-08-27T09:26:08","modified_gmt":"2025-08-27T09:26:08","slug":"cross-biobank-generalizability-and-accuracy-of-electronic-health-record-based-predictors-compared-to-polygenic-scores","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us\/113416\/","title":{"rendered":"Cross-biobank generalizability and accuracy of electronic health record-based predictors compared to polygenic scores"},"content":{"rendered":"<p>Ethics declarations<\/p>\n<p>Patients and control participants in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected before the Finnish Biobank Act came into effect (in September 2013) and the start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish Biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) approved the FinnGen study under ethics statement HUS\/990\/2017. 
The study is also approved by the Finnish Institute for Health and Welfare (permits THL\/2031\/6.02.00\/2017, THL\/1101\/5.05.00\/2017, THL\/341\/6.02.00\/2018, THL\/2222\/6.02.00\/2018, THL\/283\/6.02.00\/2019, THL\/1721\/5.05.00\/2019 and THL\/1524\/5.05.00\/2020); the Digital and Population Data Services Agency (permits VRK43431\/2017-3, VRK\/6909\/2018-3 and VRK\/4415\/2019-3); the Social Insurance Institution of Finland (permits KELA 58\/522\/2017, KELA 131\/522\/2018, KELA 70\/522\/2019, KELA 98\/522\/2019, KELA 134\/522\/2019, KELA 138\/522\/2019, KELA 2\/522\/2020 and KELA 16\/522\/2020); Findata (permits THL\/2364\/14.02\/2020, THL\/4055\/14.06.00\/2020, THL\/3433\/14.06.00\/2020, THL\/4432\/14.06\/2020, THL\/5189\/14.06\/2020, THL\/5894\/14.06.00\/2020, THL\/6619\/14.06.00\/2020, THL\/209\/14.06.00\/2021, THL\/688\/14.06.00\/2021, THL\/1284\/14.06.00\/2021, THL\/1965\/14.06.00\/2021, THL\/5546\/14.02.00\/2020, THL\/2658\/14.06.00\/2021 and THL\/4235\/14.06.00\/2021); Statistics Finland (permits TK-53-1041-17, TK\/143\/07.03.00\/2020 (earlier TK-53-90-20), TK\/1735\/07.03.00\/2021 and TK\/3112\/07.03.00\/2021) and the Finnish Registry for Kidney Diseases (permission based on the meeting minutes dated 4 July 2019). 
The biobank access decisions for FinnGen samples and data used in FinnGen Data Freeze 10 include approvals from the following biobanks: THL Biobank (BB2017_55, BB2017_111, BB2018_19, BB_2018_34, BB_2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1 and BB2021_65); Finnish Red Cross Blood Service Biobank (7 December 2017); Helsinki Biobank (HUS\/359\/2017, HUS\/248\/2020, HUS\/150\/2022 \u00a7\u00a712\u201318 and \u00a723); Auria Biobank (AB17-5154 and amendment 1 (17 August 2020), amendments BB_2021-0140, BB_2021-0156 (26 August 2021, 2 February 2022), BB_2021-0169, BB_2021-0179, BB_2021-0161, AB20-5926 and amendment 1 (23 April 2020) with its modification (22 September 2021)); Biobank Borealis of Northern Finland (2017_1013, 2021_5010, 2021_5018, 2021_5015, 2021_5023, 2021_5017 and 2022_6001); Biobank of Eastern Finland (1186\/2018 and amendments \u00a7\u00a722\/2020, 53\/2021, 13\/2022, 14\/2022 and 15\/2022); Finnish Clinical Biobank Tampere (MH0004 and amendments (21 February 2020 and 6 October 2020), \u00a7\u00a78\/2021, 9\/2022, 10\/2022, 12\/2022, 20\/2022, 21\/2022, 22\/2022 and 23\/2022); Central Finland Biobank (1-2017); Terveystalo Biobank (STB 2018001 and amendment dated 25 August 2020); Finnish Hematological Registry and Clinical Biobank (decision dated 18 June 2021) and Arctic Biobank (P0844: ARC_2021_1001).<\/p>\n<p>Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11\/NW\/0382). The UK Biobank data used in this study were obtained under approved application 78537.<\/p>\n<p>The activities of the EstBB are regulated by the Human Genes Research Act, which was adopted in 2000 specifically for the operations of the EstBB. 
Individual-level data analysis in the EstBB was carried out under ethical approval 1.1-12\/624 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs), using data according to release application S22, document 6-7\/GI\/16259 from the EstBB.<\/p>\n<p>Study setup<\/p>\n<p>As outlined in Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#Fig1\" rel=\"nofollow noopener\" target=\"_blank\">1b<\/a>, each study consisted of a 10-year observation period (6 years for EstBB due to shorter follow-up) and an 8-year prediction period, separated by a 2-year washout period. Each disease\u2019s case and control definitions were based on diagnoses acquired in the 8-year prediction period (from 1 January 2011 to after 1 January 2019). The International Classification of Diseases (ICD) codes used to define the cases for each disease were based on previous harmonization between FinnGen and the EstBB phenotypes by the INTERVENE consortium<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Jermy, B. et al. A unified framework for estimating country-specific cumulative incidence for 18 diseases stratified by polygenic risk. Nat. Commun. 15, 5007 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR34\" id=\"ref-link-section-d112318285e1504\" rel=\"nofollow noopener\" target=\"_blank\">34<\/a> (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">14<\/a>). We considered all individuals who were not cases as controls. We only considered adults aged 32\u201370 on 1 January 2011 and removed all individuals diagnosed with the disease before this time. 
The lower limit for age of inclusion was chosen due to the inclusion of education level in some of the models and was determined based on the median age of obtaining a doctoral degree in the FinnGen dataset. Using this lower limit, most individuals included have finished their highest level of education. Furthermore, we removed all individuals with a diagnosis outside the prediction period (from 1 January 2011 to after 1 January 2019) and those lost to follow-up before the start of the prediction period. The ICD-codes used to define the cases for each disease and the number of cases and controls in each study are listed in Supplementary Tables <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">2<\/a> and <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">14<\/a>.<\/p>\n<p>We included 845,929 individuals (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>) from three biobank-based studies\u2014FinnGen<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. 
Nature 613, 508&#x2013;518 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR29\" id=\"ref-link-section-d112318285e1523\" rel=\"nofollow noopener\" target=\"_blank\">29<\/a>, UKB<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203&#x2013;209 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR28\" id=\"ref-link-section-d112318285e1527\" rel=\"nofollow noopener\" target=\"_blank\">28<\/a> and EstBB<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Leitsalu, L. et al. Cohort profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol. 44, 1137&#x2013;1147 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR30\" id=\"ref-link-section-d112318285e1531\" rel=\"nofollow noopener\" target=\"_blank\">30<\/a> linked with national registers or EHRs. In FinnGen, we used Data Freeze 10, which includes 412,090 individuals, of whom 266,179 were aged 32\u201370 years on 1 January 2011. The longitudinal ICD-code diagnoses used to define the phecodes and the case and control status for each disease were based on in- and outpatient hospital register information. The UKB study included 464,076 individuals aged 40\u201370 years, with the ICD-code diagnoses based on inpatient information. The EstBB study included 199,868 individuals, of whom 115,674 were aged 32\u201370 years. For the EstBB, primary care data and self-reported diagnoses were also available. More details on the phenotype harmonization can be found in ref. <a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Jermy, B. 
et al. A unified framework for estimating country-specific cumulative incidence for 18 diseases stratified by polygenic risk. Nat. Commun. 15, 5007 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR34\" id=\"ref-link-section-d112318285e1535\" rel=\"nofollow noopener\" target=\"_blank\">34<\/a> and the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">Supplementary Methods<\/a>.<\/p>\n<p>Predictors<\/p>\n<p>PGS<\/p>\n<p>The PGS were previously computed by the INTERVENE consortium<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Jermy, B. et al. A unified framework for estimating country-specific cumulative incidence for 18 diseases stratified by polygenic risk. Nat. Commun. 15, 5007 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR34\" id=\"ref-link-section-d112318285e1555\" rel=\"nofollow noopener\" target=\"_blank\">34<\/a> and were based on recent publicly available GWAS summary statistics, with minimal overlap with our study cohorts (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">15<\/a>) using MegaPRS<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Zhang, Q., Priv&#xE9;, F., Vilhj&#xE1;lmsson, B. &amp; Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat. Commun. 
12, 4192 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR51\" id=\"ref-link-section-d112318285e1562\" rel=\"nofollow noopener\" target=\"_blank\">51<\/a> with the BLD-LDAK heritability model. For the Cox-PH models, we removed individuals from the studies that were part of the GWAS on which the PGS were based. Due to the large overlap with the UKB individuals, we only had PGS for gout, epilepsy, breast and prostate cancer available in the UKB.<\/p>\n<p>PheRS<\/p>\n<p>For the EHR-based models, we trained elastic net models<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Lebovitch, D. S., Johnson, J. S., Due&#xF1;as, H. R. &amp; Huckins, L. M. Phenotype risk scores: moving beyond &#x2018;cases&#x2019; and &#x2018;controls&#x2019; to classify psychiatric disease in hospital-based biobanks. Preprint at medRxiv &#010;                https:\/\/doi.org\/10.1101\/2021.01.25.21249615&#010;                &#010;               (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR32\" id=\"ref-link-section-d112318285e1574\" rel=\"nofollow noopener\" target=\"_blank\">32<\/a> on ICD-9 and ICD-10 diagnoses mapped to phecodes. The phecode mapping was based on v1.2b1 of the phecode map<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Wu, P. et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med. Inform. 7, e14325 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR33\" id=\"ref-link-section-d112318285e1578\" rel=\"nofollow noopener\" target=\"_blank\">33<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Denny, J. C. et al. 
PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 26, 1205&#x2013;1210 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR35\" id=\"ref-link-section-d112318285e1581\" rel=\"nofollow noopener\" target=\"_blank\">35<\/a> from <a href=\"https:\/\/phewascatalog.org\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/phewascatalog.org\/<\/a>, with some manual additions. Since we only considered diagnoses during the observation period starting in 1999, all diagnoses were ICD-10-based in our data. To obtain the most comprehensive mapping, we removed all special characters from the ICD codes. If a match could not be found in the phecode map, we shortened the code by one digit until it could be mapped or was removed. The complete mapping used can be found in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>. We gathered all phecodes in their three-digit parent node in the phecode ontology; for example, type 1 diabetes (250.1), T2D (250.2), and T2D with ketoacidosis (250.21) were all mapped to the same phecode diabetes mellitus (250). For each disease, we separately excluded predictors that were part of the exclusion range of the phecodes (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>), for example, for T2D, we did not use secondary diabetes (phecode 249), diabetes mellitus (250) and conditions complicating pregnancy (649) as predictors. 
The phecode conditions complicating pregnancy was excluded because it was the parent node of the phecode Diabetes or abnormal glucose tolerance complicating pregnancy (649.1), which is in the exclusion range of the phecode for T2D (250.2). We only considered phecodes with a prevalence of at least 1% of the study population (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">11<\/a>).<\/p>\n<p>We implemented the PheRS using the LogisticRegression function from scikit-learn (version 1.3.2)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825&#x2013;2830 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR52\" id=\"ref-link-section-d112318285e1605\" rel=\"nofollow noopener\" target=\"_blank\">52<\/a>. We included age (at the start of the prediction period 1 January 2011) and sex as predictors in the PheRS models because they are important predictors, and otherwise the models would reconstruct predictors for age and sex using combinations of the phecode diagnoses, which would make interpretation of the phecode coefficient values challenging. Nonetheless, the effect of age and sex was then regressed out when evaluating the performances of the PheRS (see below). Models were penalized with the elastic net penalty. Predictors were coded as 1\/0, where 1\u2009=\u2009\u2018predictor observed during the observation window\u2019 and 0\u2009=\u2009\u2018predictor not observed during the observation window\u2019, for each disease separately. For training, 50% of the data was used, and this was further divided into training (85%) and hold-out test (15%) sets. 
Sizes of the training datasets are shown for each disease and study in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">22<\/a>. The L1 to L2 ratio hyperparameter of the elastic net models was optimized using grid search and fivefold cross-validation over the range 0.05\u20130.95 (step size\u2009=\u20090.05), simultaneously with the inverse of the regularization strength (C) over the following possible values: 1\u2009\u00d7\u200910<sup>\u22125<\/sup>, 5\u2009\u00d7\u200910<sup>\u22125<\/sup>, 1\u2009\u00d7\u200910<sup>\u22124<\/sup>, 5\u2009\u00d7\u200910<sup>\u22124<\/sup>, 1\u2009\u00d7\u200910<sup>\u22123<\/sup>, 5\u2009\u00d7\u200910<sup>\u22123<\/sup>, 1\u2009\u00d7\u200910<sup>\u22122<\/sup>, 5\u2009\u00d7\u200910<sup>\u22122<\/sup>, 1\u2009\u00d7\u200910<sup>\u22121<\/sup>, 5\u2009\u00d7\u200910<sup>\u22121<\/sup>, 1. Balanced class weights were used, based on class frequencies in the training data. The leave-one-out (LOO) analysis assessing the impact of the removal of individual phecodes on PheRS performance was performed using a ridge penalty instead of elastic net. This was done to cut running time substantially, as using ridge removes the L1 to L2 ratio hyperparameter and its optimization. Otherwise, the ridge models were fitted similarly to the elastic net models. Before running the LOO analysis, we tested that switching to ridge did not generally reduce the PheRS performance in FinnGen (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#Fig12\" rel=\"nofollow noopener\" target=\"_blank\">7a<\/a>).<\/p>\n<p>Model fitting was done using stochastic average gradient descent. The best L1 to L2 ratio was selected based on the average precision score using fivefold cross-validation on the training split. 
Missing values of predictors were imputed to the mean of the corresponding predictor in the study-specific training data, and all predictors were standardized to zero mean and unit variance on the study-specific training data before model fitting.<\/p>\n<p>The PheRS models trained within the UKB or the EstBB data on 50% of individuals were used to make predictions in FinnGen and UKB test sets, as is, without any retraining within the studies. Standardization and imputation were performed based on the biobank-specific training data, meaning that, for example, when assessing the performance of the UKB-trained model in FinnGen, the FinnGen test set data were imputed and standardized based on the feature-specific means and s.d. from the UKB.<\/p>\n<p>Cox-PH models<\/p>\n<p>Ultimately, each individual was assigned 13 different PGS and PheRS scores describing their risk of getting a disease diagnosis in the prediction period based on genetic or EHR-based information. To make the PheRS and PGS comparable, we regressed out the effect of age, sex and the first ten genetic PCs from all continuous scores using the residuals from a logistic regression with the score as outcome. When only considering PheRS performance, we regressed out only age and sex. Subsequently, we scaled all predictors to have a mean of zero and a s.d. of 1. We then used these scores in separate Cox-PH analyses, with survival time defined as the period from 2011 until diagnosis, censoring (end of follow-up), or the end of the prediction period.<\/p>\n<p>Additionally, we considered the CCI (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">Supplementary Methods<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Charlson, M. 
E., Pompei, P., Ales, K. L. &amp; MacKenzie, C. R. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J. Chronic Dis. 40, 373&#x2013;383 (1987).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR37\" id=\"ref-link-section-d112318285e1658\" rel=\"nofollow noopener\" target=\"_blank\">37<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Deyo, R. A., Cherkin, D. C. &amp; Ciol, M. A. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J. Clin. Epidemiol. 45, 613&#x2013;619 (1992).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR38\" id=\"ref-link-section-d112318285e1661\" rel=\"nofollow noopener\" target=\"_blank\">38<\/a>\u2014developed to account for the individual\u2019s overall comorbidity burden\u2014and the individual\u2019s highest achieved education level in 2011 as an indicator of their socioeconomic status. For the CCI, we compared the top 10% of individuals with the highest CCI to the rest. The high-risk group included individuals with a CCI\u2009\u2265\u20092 and a few younger ones with a CCI of 1. For the highest education level, we mapped each study\u2019s education coding to the 2011 International Standard Classification of Education (ISCED-11; Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">17<\/a>) codes. 
We compared the risk of individuals with basic education (ISCED-11: 1\u20134) to those who achieved higher education levels (ISCED-11: 5\u20137).<\/p>\n<p>Statistics<\/p>\n<p>We used the survival<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Therneau, T. M. Survival analysis. R package version 3.8-3 &#010;                cran.r-project.org\/web\/packages\/survival\/index.html&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR53\" id=\"ref-link-section-d112318285e1677\" rel=\"nofollow noopener\" target=\"_blank\">53<\/a> package (version 3.2-7) in R for creating the Cox-PH models and the Hmisc<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Harrell, F. E. Jr &amp; Dupont, C. Hmisc: Harrell Miscellaneous. R package version 5.2-0 &#010;                hbiostat.org\/r\/hmisc\/&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR54\" id=\"ref-link-section-d112318285e1681\" rel=\"nofollow noopener\" target=\"_blank\">54<\/a> package (version 5.1.0) to calculate the c indices and 95% CIs. For a Cox-PH model with binary outcomes, the predicted survival times can be shown to be equal to the survival probability, so the c index is equivalent to the area under the receiver operating characteristic curve (AUC)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Harrell, F. E., Lee, K. L. &amp; Mark, D. B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 
15, 361&#x2013;387 (1996).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR55\" id=\"ref-link-section-d112318285e1685\" rel=\"nofollow noopener\" target=\"_blank\">55<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Pencina, M. J. &amp; D&#x2019;Agostino, R. B. Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat. Med. 23, 2109&#x2013;2123 (2004).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR56\" id=\"ref-link-section-d112318285e1688\" rel=\"nofollow noopener\" target=\"_blank\">56<\/a>. The meta-analysis of the HRs and c indices was performed using the metafor<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Viechtbauer, W. metafor: meta-analysis package for R. R version 4.8-0 &#010;                https:\/\/cran.r-project.org\/web\/packages\/metafor\/index.html&#010;                &#010;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR57\" id=\"ref-link-section-d112318285e1692\" rel=\"nofollow noopener\" target=\"_blank\">57<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1&#x2013;48 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#ref-CR58\" id=\"ref-link-section-d112318285e1695\" rel=\"nofollow noopener\" target=\"_blank\">58<\/a> package (version 4.6-0) in R with a random effects model. 
We used two-tailed P values, calculated using the pnorm function in the stats package (version 3.6.2) in R, based on the z scores of the \u03b2 differences to compare the differences in HR magnitudes and one-tailed P values for the statistical testing of increases in the c indices. Additionally, we used Bonferroni correction to account for multiple hypothesis testing in each study (n\u2009=\u200913). Correlations were calculated using the cor.test function from the stats package in R. For regressing out the covariates\u2014age, sex and PCs\u2014from the PheRS and PGS, we used scaled residuals from glm models with the stats package in R.<\/p>\n<p>Comparison of phecode coefficients between different PheRS models<\/p>\n<p>The elastic net hyperparameters were separately optimized for each PheRS model. This means that the absolute magnitudes of the coefficients for phecodes are not comparable between different PheRS. However, the relative importances of phecodes can still be compared, for example, whether the same phecodes are among the most important predictors in two different PheRS. To make visualization of the phecode importances in different PheRS clearer, we standardized the coefficients of each PheRS separately to a mean of 0 and a s.d. of 1 for the display items. Further, in each study, we ranked the phecodes in descending order by the PheRS coefficient values and assigned them ascending ranks. Thus, a lower rank indicates a higher PheRS coefficient in the model. 
Both the unscaled PheRS coefficients and ranks are provided in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM3\" rel=\"nofollow noopener\" target=\"_blank\">12<\/a>.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41588-025-02298-9#MOESM2\" rel=\"nofollow noopener\" target=\"_blank\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"Ethics declarations Patients and control participants in FinnGen provided informed consent for biobank research, based on the Finnish&hellip;\n","protected":false},"author":2,"featured_media":113417,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[50],"tags":[2342,13114,258,8869,13113,257,200,3869,3870,73952,79],"class_list":{"0":"post-113416","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-genetics","8":"tag-agriculture","9":"tag-animal-genetics-and-genomics","10":"tag-biomedicine","11":"tag-cancer-research","12":"tag-gene-function","13":"tag-general","14":"tag-genetics","15":"tag-genetics-research","16":"tag-human-genetics","17":"tag-preventive-medicine","18":"tag-science"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/113416","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp
-json\/wp\/v2\/comments?post=113416"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/posts\/113416\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media\/113417"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/media?parent=113416"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/categories?post=113416"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us\/wp-json\/wp\/v2\/tags?post=113416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}