Parent study

This prospective cohort study was based on the Endoscopic Screening for Esophageal Cancer in China (ESECC, NCT01688908) randomized controlled trial. Details of the ESECC trial have been reported previously [9, 14]. Briefly, the study was initiated in 2012 in Hua County, Henan Province, China. Using blocked randomization, 668 villages with 33,847 participants aged 45 to 69 years were randomly selected and allocated 1:1 to the screening arm (n = 17,104) or the control arm (n = 16,743) [9, 14]. The inclusion criteria of the ESECC trial were as follows: (1) permanent residents aged 45–69 years in the target village; (2) no history of endoscopic examination within 5 years before enrollment; (3) no history of cancer, mental disorder, or infection with hepatitis B virus, hepatitis C virus, or human immunodeficiency virus; and (4) agreement to complete all phases of the examination.

Standard upper gastrointestinal endoscopic examination with LCE in the esophagus was performed by experienced endoscopists between 2012 and 2016 for participants in the screening arm [9, 14]. Biopsies were taken from all clearly visible LULs in the esophagus.

Pathologic diagnosis was made by two experienced pathologists without knowledge of endoscopic results, and discrepancies were adjudicated through consultation with a senior expert [14].

Study design

For this study, the screened cohort comprised participants in the screening arm of the ESECC trial who underwent baseline LCE and either had no visible LULs or had LULs pathologically diagnosed as non-dysplastic (including normal, acanthosis, esophagitis, and basal cell hyperplasia). The screened cohort was further divided into the ND-LULs and normal-stained groups.

Participants were assigned to the ND-LULs group if their esophageal LULs were biopsied at baseline and their highest pathologic diagnosis was non-dysplastic. The normal-stained group comprised participants without any visible esophageal LULs at the baseline endoscopic examination.
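The group definitions above can be expressed as a small decision rule. The following Python sketch is illustrative only: the `BaselineExam` record, its field names, and the `assign_group` function are hypothetical, and it assumes that a participant with visible LULs but no biopsy result falls outside the screened cohort. Because any dysplastic finding would outrank the non-dysplastic categories, requiring every biopsy to be non-dysplastic is equivalent to requiring a non-dysplastic highest diagnosis.

```python
from dataclasses import dataclass, field
from typing import Optional

# Non-dysplastic pathologic categories named in the text.
NON_DYSPLASTIC = {"normal", "acanthosis", "esophagitis", "basal cell hyperplasia"}


@dataclass
class BaselineExam:
    # Hypothetical record of one participant's baseline LCE.
    underwent_lce: bool
    visible_luls: bool
    biopsy_diagnoses: list[str] = field(default_factory=list)


def assign_group(exam: BaselineExam) -> Optional[str]:
    """Return 'ND-LULs', 'normal-stained', or None (not in the screened cohort)."""
    if not exam.underwent_lce:
        return None
    if not exam.visible_luls:
        return "normal-stained"
    # LULs were seen: the highest diagnosis is non-dysplastic only if
    # every biopsy is non-dysplastic (assumption: no biopsy -> excluded).
    if exam.biopsy_diagnoses and all(d in NON_DYSPLASTIC for d in exam.biopsy_diagnoses):
        return "ND-LULs"
    return None
```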

Two control cohorts were used to represent the baseline cancer hazard against which the screened cohort was compared (Fig. 1). One cohort comprised individuals in the unscreened control arm of the ESECC trial (the RCT control), and the other was an individually matched control cohort drawn from the New Rural Co-operative Medical Care Scheme (NCMS) in Hua County, a medical insurance program covering 99% of local rural residents [15, 16]. Among approximately 1.18 million individuals registered in the NCMS in Hua County in 2012, we first excluded those who had undergone endoscopic examination in prior research screening cohorts for EC (n = 29,486) [14, 17]. For each individual in the screened cohort, candidate controls were then identified from the NCMS roster and matched by birth year, sex, and village. Each candidate control was assigned the same enrollment date as the matched screened individual. After excluding candidates diagnosed with any cancer or who died before that enrollment date, we randomly sampled five controls without replacement to form the final population control group.
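A minimal Python sketch of the matched sampling step follows, assuming the roster has already been restricted as described (prior screening participants removed) and is held in memory. The `Person` record, `sample_population_controls`, and the fixed seed are hypothetical. The text does not say whether one roster individual may serve as a control for more than one screened participant; this sketch samples without replacement within each matched set only and allows such reuse.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical roster record.
    pid: str
    birth_year: int
    sex: str
    village: str
    cancer_date: Optional[date] = None  # first cancer diagnosis, if any
    death_date: Optional[date] = None


def sample_population_controls(screened, roster, k=5, seed=2012):
    """Draw k matched controls per screened person, without replacement.

    `screened` is an iterable of (Person, enrollment date) pairs.
    Returns {screened pid: [(control pid, assigned enrollment date), ...]}.
    """
    rng = random.Random(seed)
    # Index the roster by the matching key (birth year, sex, village).
    strata = defaultdict(list)
    for p in roster:
        strata[(p.birth_year, p.sex, p.village)].append(p)

    matched = {}
    for person, enroll_date in screened:
        key = (person.birth_year, person.sex, person.village)
        # Controls inherit the screened person's enrollment date; drop anyone
        # with a cancer diagnosis or death before that date.
        candidates = [
            c for c in strata.get(key, [])
            if (c.cancer_date is None or c.cancer_date >= enroll_date)
            and (c.death_date is None or c.death_date >= enroll_date)
        ]
        chosen = rng.sample(candidates, min(k, len(candidates)))
        matched[person.pid] = [(c.pid, enroll_date) for c in chosen]
    return matched
```

When a stratum holds fewer than five eligible candidates, the sketch simply returns all of them; how the study handled such strata is not stated.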

Fig. 1 Flowchart of participant selection in this study

Covariate assessment

Potential risk factors for ESCC [6, 18–20], including age, sex, body mass index (BMI), family history of ESCC, smoking, eating speed, and ingestion of leftover food, were collected at baseline for all participants in the ND-LULs, normal-stained, and RCT control groups using an interviewer-administered electronic questionnaire [20].

For the population control group, sociodemographic information, including birth year, age, sex, and village, was extracted from the NCMS roster.

Follow-up and outcome ascertainment

The primary outcome was incident esophageal SDA during follow-up, comprising severe dysplasia, carcinoma in situ (CIS), and ESCC.

For groups derived from the ESECC trial (the ND-LULs, normal-stained, and RCT control groups), incident SDAs were identified through endoscopic surveillance or routine follow-up, which combined annual door-to-door active interviews with passive linkage to the NCMS and the Death Surveillance System in Hua County [6, 14]. This follow-up strategy has shown over 95% sensitivity in identifying incident cancer cases [16, 21, 22]. Follow-up ended on May 31, 2023. Most cases detected at re-examination would likely have progressed to clinical-stage ESCC during follow-up had they not been detected, because our long follow-up period would generally cover their natural course of progression [23].

For the population control group, incident SDA events were identified from NCMS reimbursement records using the International Classification of Diseases, 10th Revision (ICD-10) code C15 (esophageal cancer) and text-based diagnoses of severe dysplasia, CIS, and ESCC. Claims-based diagnoses from the NCMS in Hua County have previously been validated as highly consistent with hospital records [22]. Follow-up for the population control group ended on December 11, 2021.

The secondary outcome was defined as incident cases of non-upper gastrointestinal cancer (non-UGIC), including all cancer sites except for EC, cardia cancer, gastric cancer, and duodenal cancer.
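If cancer sites are recorded as ICD-10 codes, the non-UGIC definition can be applied as a code filter. This Python sketch assumes the WHO ICD-10 site codes C15 (esophagus), C16 (stomach, with C16.0 covering the cardia), and C17.0 (duodenum) for the excluded upper gastrointestinal sites, and treats only malignant neoplasm codes C00–C97 as cancer. The function name is hypothetical, and the study's actual coding rules are not described here.

```python
import re

# Assumed ICD-10 prefixes for the excluded upper GI sites (dots removed):
# C15 esophagus, C16 stomach (C16.0 = cardia), C17.0 duodenum.
UGIC_PREFIXES = ("C15", "C16", "C170")


def is_non_ugic(icd10: str) -> bool:
    """True if an ICD-10 code is a malignant neoplasm outside the upper GI tract."""
    code = icd10.strip().upper().replace(".", "")
    m = re.fullmatch(r"C(\d{2})\w*", code)
    # Only malignant neoplasm codes C00-C97 count as cancer in this sketch.
    if m is None or not (0 <= int(m.group(1)) <= 97):
        return False
    return not code.startswith(UGIC_PREFIXES)
```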

Statistical analysis

Incidence rates (per 100,000 person-years) were calculated as the number of outcome events divided by the corresponding person-years at risk. Follow-up was censored at the date of the outcome event, death, or the end of follow-up, whichever occurred first. Because follow-up duration differed between the screened groups and the unscreened control groups, follow-up was truncated at a maximum of 10 years.
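The censoring rule and 10-year truncation can be sketched in Python as follows. The `FollowUp` record and function names are hypothetical; `end_of_study` stands for the group-specific end of follow-up (May 31, 2023, for the ESECC-derived groups and December 11, 2021, for the population controls), and a year is assumed to be 365.25 days. Events occurring after the 10-year cap are not counted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumed year length
MAX_YEARS = 10.0        # maximum follow-up stated in the text


@dataclass
class FollowUp:
    # Hypothetical per-person follow-up record.
    enroll: date
    end_of_study: date            # group-specific end of follow-up
    event: Optional[date] = None  # outcome diagnosis date, if any
    death: Optional[date] = None


def person_time(f: FollowUp) -> tuple[float, bool]:
    """Return (years at risk, whether the event is counted)."""
    # Exit at the event, death, or end of follow-up, whichever comes first.
    exit_date = min([f.end_of_study] + [d for d in (f.event, f.death) if d is not None])
    years = (exit_date - f.enroll).days / DAYS_PER_YEAR
    counted = f.event is not None and f.event == exit_date and years <= MAX_YEARS
    return min(years, MAX_YEARS), counted


def incidence_per_100k(cohort) -> float:
    """Events per 100,000 person-years for a list of FollowUp records."""
    events, pyears = 0, 0.0
    for f in cohort:
        t, counted = person_time(f)
        pyears += t
        events += counted
    return events / pyears * 100_000
```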

Poisson regression was used to estimate incidence rate ratios (IRRs) for SDA in the ND-LULs and normal-stained groups versus the controls. When the population control group was the reference, IRRs were adjusted for potential between-group heterogeneity in baseline cancer risk, using the incidence of non-UGIC, which is not affected by endoscopic screening, as a proxy for baseline cancer risk. When the RCT control group was the reference, IRRs were adjusted for ESCC risk factors, including age, sex, BMI, family history of ESCC, smoking, eating speed, and ingestion of leftover food. Missing questionnaire data were imputed using a random forest algorithm.
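For a single binary group indicator with log person-years as the offset, the Poisson maximum-likelihood IRR equals the ratio of crude rates, and its Wald standard error on the log scale is sqrt(1/a + 1/b), where a and b are the two event counts. The Python sketch below shows this crude case and one possible way to use non-UGIC as a proxy: dividing the SDA IRR by the non-UGIC IRR (a ratio-of-IRRs approach). That adjustment, the function names, and the treatment of the two outcomes' counts as independent Poisson variables are illustrative assumptions, not necessarily the authors' model; covariate-adjusted models and random forest imputation are not shown.

```python
import math
from typing import NamedTuple

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


class RateRatio(NamedTuple):
    irr: float
    lower: float
    upper: float


def _ci(log_rr: float, se: float) -> RateRatio:
    return RateRatio(math.exp(log_rr), math.exp(log_rr - Z95 * se), math.exp(log_rr + Z95 * se))


def poisson_irr(events_exp, py_exp, events_ref, py_ref) -> RateRatio:
    """Crude IRR: equals the Poisson MLE with one binary exposure and a log(PY) offset."""
    log_rr = math.log((events_exp / py_exp) / (events_ref / py_ref))
    se = math.sqrt(1 / events_exp + 1 / events_ref)
    return _ci(log_rr, se)


def ratio_of_irrs(target, proxy) -> RateRatio:
    """Hypothetical proxy adjustment: target IRR divided by proxy (non-UGIC) IRR.

    Each argument is (events_exp, py_exp, events_ref, py_ref).
    """
    a, py_a, b, py_b = target
    c, py_c, d, py_d = proxy
    log_rr = math.log((a / py_a) / (b / py_b)) - math.log((c / py_c) / (d / py_d))
    # Assumes the four event counts are independent Poisson variables.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return _ci(log_rr, se)
```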

The cumulative incidence function (CIF) and the Fine-Gray test were employed to evaluate the cumulative incidence of SDAs in the esophagus among the ND-LULs group, the normal-stained group, and the control groups, treating death as a competing risk.
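The CIF with death as a competing risk can be estimated nonparametrically with the Aalen–Johansen estimator, which weights each SDA event by the overall event-free survival just before it instead of treating deaths as censored. The Python sketch below, with a hypothetical function name and status coding, computes the CIF at a single horizon; the between-group test is not shown.

```python
from collections import Counter


def cumulative_incidence(times, status, horizon):
    """Aalen-Johansen CIF for the event of interest at `horizon`.

    status: 0 = censored, 1 = event of interest (SDA), 2 = competing event (death).
    """
    counts = Counter(zip(times, status))
    n_at_risk = len(times)
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        d1, d2, cens = counts[(t, 1)], counts[(t, 2)], counts[(t, 0)]
        if n_at_risk > 0:
            # Increment by S(t-) times the cause-specific hazard at t.
            cif += surv * d1 / n_at_risk
            surv *= 1 - (d1 + d2) / n_at_risk
        # Those censored at t stay at risk for events at t, then leave.
        n_at_risk -= d1 + d2 + cens
    return cif
```

Without censoring, the estimate reduces to the proportion of participants who experienced the event of interest by the horizon.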

Subgroup analyses were conducted by age and sex. Further stratified analyses were performed within the ND-LULs group across baseline characteristics, including sex, age, BMI, family history of ESCC, and smoking. Two additional sensitivity analyses were performed. First, we fitted a multivariable competing risk model adjusted for baseline ESCC risk factors within the ND-LULs, normal-stained, and RCT control groups. Second, we used Poisson regression to estimate the annual cumulative IRR for SDA over 10 years in the ND-LULs and normal-stained groups relative to each of the two control groups, and plotted the resulting trends.
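The annual cumulative IRR can be illustrated by truncating follow-up at each year in turn and recomputing the rate ratio. This Python sketch computes crude (unadjusted) ratios from per-person `(years_at_risk, had_event)` pairs, assuming the outcome, when present, occurs at the end of that person's time at risk. The input format and function name are hypothetical, and the study's Poisson models may include adjustments not reproduced here. A year with no reference events yields NaN.

```python
def cumulative_irr_by_year(exposed, reference, max_year=10):
    """Crude cumulative IRR after each full year of follow-up.

    Each cohort is a list of (years_at_risk, had_event) pairs.
    Returns [(year, irr), ...] for year = 1..max_year.
    """
    def totals(cohort, cutoff):
        # Count events occurring by the cutoff and person-time truncated at it.
        events = sum(1 for t, e in cohort if e and t <= cutoff)
        pyears = sum(min(t, cutoff) for t, _ in cohort)
        return events, pyears

    trend = []
    for year in range(1, max_year + 1):
        a, py_a = totals(exposed, year)
        b, py_b = totals(reference, year)
        irr = (a / py_a) / (b / py_b) if b else float("nan")
        trend.append((year, irr))
    return trend
```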

All tests were two-sided, with a significance level of 0.05. All statistical analyses were performed using R version 4.4.1 (R Foundation for Statistical Computing, Vienna, Austria).