Ethics statement
This study complies with all relevant ethical regulations and was conducted in accordance with the research permits and approvals granted by the respective institutions.
Patients and control participants in FinnGen provided informed written consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected prior the Finnish Biobank Act came into effect (in September 2013) and start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is HUS/990/2017.
The FinnGen study is approved by Finnish Institute for Health and Welfare (permits THL/2031/6.02.00/2017, THL/1101/5.05.00/2017, THL/341/6.02.00/2018, THL/2222/6.02.00/2018, THL/283/6.02.00/2019, THL/1721/5.05.00/2019 and THL/1524/5.05.00/2020), Digital and population data service agency (permits VRK43431/2017-3, VRK/6909/2018-3 and VRK/4415/2019-3), the Social Insurance Institution (KELA; permits KELA 58/522/2017, KELA 131/522/2018, KELA 70/522/2019, KELA 98/522/2019, KELA 134/522/2019, KELA 138/522/2019, KELA 2/522/2020 and KELA 16/522/2020), Findata permits (THL/2364/14.02/2020, THL/4055/14.06.00/2020, THL/3433/14.06.00/2020, THL/4432/14.06/2020, THL/5189/14.06/2020, THL/5894/14.06.00/2020, THL/6619/14.06.00/2020, THL/209/14.06.00/2021, THL/688/14.06.00/2021, THL/1284/14.06.00/2021, THL/1965/14.06.00/2021, THL/5546/14.02.00/2020, THL/2658/14.06.00/2021 and THL/4235/14.06.00/2021), Statistics Finland (permits TK-53-1041-17 and TK/143/07.03.00/2020 (earlier TK-53-90-20), TK/1735/07.03.00/2021 and TK/3112/07.03.00/2021) and Finnish Registry for Kidney Diseases permission/extract from the meeting minutes on 4 July 2019.
The Biobank Access Decisions for FinnGen samples and data utilized in FinnGen Data Freeze 10 include the following: THL Biobank (BB2017_55, BB2017_111, BB2018_19, BB2018_34, BB2018_67, BB2018_71, BB2019_7, BB2019_8, BB2019_26, BB2020_1 and BB2021_65), Finnish Red Cross Blood Service Biobank (7 December 2017), Helsinki Biobank (HUS/359/2017, HUS/248/2020 and HUS/150/2022: §§12, 13, 14, 15, 16, 17, 18 and 23), Auria Biobank (AB17-5154 and amendment 1 (17 August 2020); BB_2021-0140, BB_2021-0156 (26 August 2021 and 2 February 2022), BB_2021-0169, BB_2021-0179 and BB_2021-0161; AB20-5926 and amendment 1 (23 April 2020) and its modification (22 September 2021)), Biobank Borealis of Northern Finland (2017_1013, 2021_5010, 2021_5018, 2021_5015, 2021_5023, 2021_5017 and 2022_6001), Biobank of Eastern Finland (1186/2018 and amendments §§22/2020, 53/2021, 13/2022, 14/2022 and 15/2022), Finnish Clinical Biobank Tampere (MH0004 and amendments (21 February 2020 and 6 October 2020); §§8/2021, 9/2022, 10/2022, 12/2022, 20/2022, 21/2022, 22/2022 and 23/2022), Central Finland Biobank (1-2017), Terveystalo Biobank (STB 2018001 and amendment (25 August 2020)), Finnish Hematological Registry and Clinical Biobank (decision dated 18 June 2021) and Arctic Biobank (P0844: ARC_2021_1001).
Study population
In the current study, we included samples from 425,483 individuals from Finland, sourced from FinnGen Data Freeze 10 (https://www.finngen.fi/en)21. This biobank study includes samples from hospital biobanks, alongside prospective epidemiological and disease-based cohorts. Using the unique national personal identification numbers, the data were interconnected with national registries including hospital discharge records (accessible from 1968), death records (from 1969), cancer registries (from 1953) and drug purchase records (from 1995). Registry information was accessible up to 31 December 2021.
Trial selection
As the currently largest trial emulation effort, the RCT-DUPLICATE project6,26 has been emulating numerous RCTs in US-American insurance claims datasets, the goal of which was to assess the utility of the obtained RWE for regulatory decision-making.
We sought to identify four RCTs that have been previously replicated by RCT-DUPLICATE and were feasible to be successfully emulated in our RWD dataset. By the time of initiation of our project, findings of the first ten trial emulations were published by the RCT-DUPLICATE project26. The evaluation criteria deciding upon the feasibility of a RCT replication included critical aspects of the trial emulation protocol, such as the primary outcomes, eligibility criteria, treatment strategies, allowing for only minor deviations if features were not available in our data source (Supplementary Tables 2–3). The RCT was seen as closely emulated when the emulation of the comparator and outcome were at least moderate, and at least one of them was good, as described in the meta-analysis of RCT-DUPLICATE data48.
Trial emulation design and analysis
Based on RCT-DUPLICATE’s trial emulation efforts, we developed the protocols for the emulations of four trials (Empareg, Tecos, Aristotle and Rocket). Closely following the original trial protocols, we emulated an observational data protocol for each trial, including the eligibility criteria, treatment strategies, assignment procedures, follow-up periods, primary outcomes, causal contrasts and an analysis plan4.
Different sets of eligibility criteria required fulfillment within distinct timeframes before therapy initiation. Flowcharts of cohort formations can be found in Supplementary Tables 5–8 and Supplementary Figs. 8–10.
The treatment strategies included new users of either the drug of interest or the comparator drug, starting from the date the newer drug received marketing authorization in Finland. For the two placebo-controlled trials, Empareg and Tecos, we selected an active comparator as a proxy for placebo regarding cardiovascular effects, similar to RCT-DUPLICATE. This is due to the fact that confounding bias may become especially serious when active user groups are compared to nonuser groups, as nonuser comparator groups considerably differ from actively treated patients in ways that are poorly captured in observational datasets49,50.
As a proxy for placebo, DPP4 inhibitors were chosen for Empareg, and second-generation sulfonylureas were chosen for Tecos, given they are likewise antidiabetic treatments commonly prescribed interchangeably with the treatments of interest and are known not to have any causal effect on cardiovascular outcomes based on current evidence23,51,52,53.
As the assignment procedures in observational studies are never at random, an adjustment for confounding variables is required to satisfy the exchangeability assumption. We selected sets of >25 confounding variables, measured within 6 months before drug initiation, reflecting demographics, comorbidities, comedications and cardiovascular procedures. We adopted 1:1 PS nearest-neighbor matching with a caliper of 0.1 or 0.01 on the PS scale, depending on the initial overlap54,55. PS matching statistics and details on covariate balance for all trial emulations can be found in Supplementary Tables 9–12.
Follow-up started at the first purchase of either of the defined therapeutics and ended with the occurrence of a primary outcome event, death, discontinuation or switch to a comparator or end of registry information, whichever occurs first. The time point of a discontinuation of therapy was calculated based on the number of packages purchased by the patient multiplied by the package size.
The primary outcome for Empareg and Tecos was 3P-MACE, and for Aristotle and Rocket, a composite endpoint of stroke and systemic embolism, adapted from the definition used in the corresponding trials.
In our analysis, we used an ‘on-treatment’ approach, attempting to replicate an intention-to-treat estimate derived from the RCT with particularly high treatment compliance. HR and 95% CI were estimated in PS-matched cohorts using the Cox proportional hazard models. We defined ‘estimate agreement’ as the emulation estimate being within the 95% CI for the RCT estimate.
PGS generation
We computed genome-wide PGS for 20 traits (Supplementary Table 13) using the PGS-continuous shrinkage (CS) priors method, used in the PRS-CS tool (version 14 May 2024; https://github.com/getian107/PRScs)56. The input weights were derived from available summary statistics sourced from external genome-wide association studies (GWAS) data pertaining to the 20 traits. Variants were restricted to those present in the HapMap 3 reference panel57. To ensure comparability, PGS were standardized (mean = 0; s.d. = 1) in the whole FinnGen population. Detailed information regarding the summary statistics can be found in the Supplementary Information.
PGS analysis of cohorts
We investigated genetic differences between the treated and control groups at three different stages of the emulation process and how they change with increased confounder adjustment.
For each PGS, we calculated the SMD between the treated and control groups using logistic regression and determined its significance based on a Bonferroni-corrected P value threshold (2.5 × 10−3).
In the first stage, we looked at a plain observational setting that best reflected the original RCT question. Therefore, as Empareg and Tecos are both placebo-controlled trials, we defined the plain observational setting as initiators of the treatment versus noninitiators. Because Aristotle and Rocket are both active-comparator trials, the plain observational setting was defined as initiators of the treatment versus initiators of the active comparator. In the second stage, we looked at the cohorts after applying the eligibility criteria. In the third, we considered the PS-matched cohorts.
Simulations
To show that correcting on an imperfect proxy of the confounder can result in bias in effect size estimates, we carried out simulation experiments under the causal model shown in Fig. 3a. We first generated PGS as a random variable following the standard normal distribution n(0,1), and the rest of the variables were subsequently created as
$$C={\mathrm{rPGS}}+\sqrt{(1-{r}^{2})}{\varepsilon }_{C},$$
$$X={b}_{{CX}}C+\sqrt{(1-{b}_{{CX}}^{2})}{\varepsilon }_{X},$$
and \(Y=X+{b}_{{CY}}C\), where \({\varepsilon }_{C},{\varepsilon }_{X} \sim n(0,1)\).
The variables were simulated as such so that the variance of PGS, C and X were all 1, and the expected effect of X on Y is 1. Under this model, we could change the correlation between C and PGS simply by varying r, and the effect of the confounding factor C on X and Y by varying bCX and bCY. Under each condition, we measured the observed effect of X on Y, conditioned on PGS, which was an imperfect proxy of C, through linear regression \({\rm{lm}}(Y \sim X+{\mathrm{PGS}})\). We denoted estimated bias as the observed regression coefficient −1, which is the expected underlying effect of X on Y.
We also wanted to demonstrate that even under a fixed correlation coefficient between C and PGS, the extent of bias in the observed X on Y effect can still vary due to additional components contributing to only PGS and X, Y, but not C. We further carried out simulations under a different causal model shown in Supplementary Fig. 6a, where PGS and confounder C are correlated due to a common underlying causal factor \({G}^{* }\). Meanwhile, an extra component G′ contributes only to PGS but not to C. Under this model, we first generated the shared causal factor \({G}^{* }\) and the PGS unique causal factor G′ independently following the standard normal distribution n(0,1) and other variables as below:
$${\mathrm{PGS}}={b}_{{G}^{*}\,{\rm{PGS}}}\,{G}^{* }+\sqrt{(1-{b}_{{G}^{*}\,{\rm{PGS}}}^{2})}G^{\prime}$$
\(C={b}_{{G}^{*}C}\,{G}^{* }+\sqrt{(1-{b}_{{G}^{* }C}^{2})}{\varepsilon }_{C}\), where \({b}_{{G}^{* }C}=\frac{r}{{b}_{{G}^{* }\,{\mathrm{PGS}}}}\) and r is the correlation coefficient between C and PGS. We fixed the contribution of \({G}^{* }\) on PGS as \(\frac{{b}_{{G}^{* }{\mathrm{PGS}}}^{2}}{{\mathrm{Var}}({\mathrm{PGS}})}\) = 0.8 and r2 = 0.3 in this experiment.
Subsequently, we simulated X and Y as
$$X={b}_{G^{\prime} X}\,G^{\prime} +{b}_{{CX}}\,C+\sqrt{(1-{b}_{G^{\prime} X}^{2}-{b}_{{CX}}^{2})}{\varepsilon }_{C}\,{\rm{and}}\,{Y}=X+{b}_{G^{\prime} Y}\,G^{\prime} +{b}_{{CY}}\,C$$
The variables were simulated as such so that the variance of G′, G*, PGS, C and X were all 1, and the expected effect of X on Y is 1. In this experiment, for simplicity, we fixed the contribution of C on X and Y so that \(\frac{{\mathrm{Var}}({b}_{{CX}}C)}{{\mathrm{Var}}(X)}=\frac{{\mathrm{Var}}({b}_{{CY}}C)}{{\mathrm{Var}}(Y)}=\,0.3\), and assumed that G′ has no effect on Y. Furthermore, as a proof of concept, we assumed that G′ has a negative effect on X (\({b}_{G{\prime} X}^{2} < 0\)) because in this case, we expect to see an increment in estimate bias when G′ contributes more to the variance of X. We looked at estimate bias from a same linear regression \({\rm{lm}}(Y \sim X+{\mathrm{PGS}})\) in respect of changes in \(\frac{(1-{b}_{{G}^{* }{\mathrm{PGS}}}^{2})}{{\mathrm{Var}}({\mathrm{PGS}})}\) and \(\frac{{b}_{G{\prime} X}^{2}}{{\mathrm{Var}}(X)}\).
Genome-wide association studies
We used REGENIE (v2.2.4)58 to perform a GWAS of empagliflozin initiation in the whole population, including 426,775 samples (cases, 14,996; controls, 411,779), as well as after applying the eligibility criteria of the Empareg emulation, including 11,349 samples (cases, 4,630; controls, 6,719). Details on genotyping and imputation in FinnGen can be found in ref. 21.
MR analysis
By using genetic variants as IVs, we used two-sample MR to investigate the confounding status of numerous variables in a trial emulation setting18. In our MR analysis, we only focused on the Empareg trial emulation. We examined the effect of the 20 traits used in the PGS analysis (sources of external summary statistics can be found in Supplementary Table 13) on the trial outcome, represented by summary statistics for CHD, as well as on receiving the empagliflozin treatment in the whole population and after applying the eligibility criteria, both represented by summary statistics from our GWASs. We performed the MR analysis using the inverse variance-weighted (IVW) method.
To obtain the independent IVs for each trait, we filtered for significant exposure-associated single-nucleotide polymorphisms (SNPs) (P < 5 × 10−8), performed linkage disequilibrium clumping (r2 < 0.001; clumping window = 10,000 kb) and excluded potential outcome-associated SNPs (defined as P < 5 × 10−8 with the outcome).
We identified the following three key steps in using MR to explore confounding: (1) MR of the potential confounder on treatment. Conducting an MR analysis to assess the causal effect of the proposed confounding trait on the treatment variable. If the MR analysis shows a significant association, it suggests the potential confounder is indeed related to the treatment. (2) MR of potential confounder on outcome. Performing a separate MR analysis to evaluate the causal effect of the proposed confounding trait on the trial outcome variable. If the MR analysis demonstrates a significant association, it indicates the potential confounder is also related to the outcome. (3) Interpretation. If both MR analyses (steps 1 and 2) show significant associations, it implies the proposed trait is very likely to be a true confounder that needs to be accounted for and addressed through statistical adjustment in the trial emulation to obtain widely unbiased average treatment effects. Expert knowledge is still required to assess the plausibility of the MR analyses.
Statistical analysis of the outcome PGS within trial emulations
Analogously to the MR analysis, we selected the CHD PGS as outcome PGS for MACE and stroke PGS for the composite endpoint stroke/systemic embolism. We evaluated the effect of the outcome PGS on the primary outcome within each PS-matched cohort using Cox regression and adjusting for the treatment.
$$h({t|T},{\mathrm{PGS}})={h}_{0}(t)\times \exp ({\beta }_{1}\times T+{\beta }_{2}\times {\mathrm{PGS}}),$$
where \(h(t)\) is the hazard at time t, \({h}_{0}(t)\,\) is the baseline hazard at time t, \(T\) is the treatment group, \({\mathrm{PGS}}\) is the outcome PGS, and \({\beta }_{1}\) and \({\beta }_{2}\) are coefficients associated with the treatment group variable \(T\) and \({\mathrm{PGS}}\), respectively.
Additionally, we predicted the outcome of PGS effects on the primary outcome in the full population using Cox regression. Survival times started at birth with follow-up until the occurrence of the primary outcome, death or end of registry information, whichever occurred first.
Furthermore, we determined the event rate of the primary outcome for each trial and investigated the event rates within individuals with the top 25% PGS. Based on that, we calculated the required sample sizes given the new event rates to reach the same statistical power. This was to assess the effect of PGS enrichment on sample sizes in clinical trials.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.