Introduction
Inflammatory bowel disease (IBD) is a chronic inflammatory condition primarily affecting the gastrointestinal tract. The most commonly affected areas are the ileum, rectum, and colon. Clinically, IBD is marked by symptoms such as abdominal pain and diarrhea, with hematochezia occurring in severe cases. IBD is classified into two major subtypes: Crohn’s disease (CD) and ulcerative colitis (UC). UC is characterized by continuous inflammation of the mucosal and submucosal layers that is confined to the colon. The inflammation typically begins in the rectum and gradually extends proximally to involve the entire colon. CD, in contrast, can affect any part of the gastrointestinal tract in a discontinuous pattern and involves all layers of the intestinal wall. The most commonly affected sites are the terminal ileum, colon, and perianal region. Currently, IBD pathogenesis is widely believed to result from a multifactorial interplay involving genetic susceptibility, environmental exposures, gut microbiota dysbiosis, and immune dysregulation.1–3 Cholangiocarcinoma (CCA) refers to a group of highly heterogeneous cancers originating from the biliary tract. Based on anatomical location, CCA is divided into intrahepatic (iCCA), perihilar (pCCA), and distal (dCCA) subtypes.4,5 Notably, the early stages of CCA are typically asymptomatic, and about 70% of cases are diagnosed at an advanced stage, leading to a five-year survival rate below 20%.6–8
The incidence of CCA among individuals with IBD ranges from 0.5% to 1.0%,9 with a nearly fourfold higher risk in comparison to the general population.10,11 Moreover, individuals with IBD tend to develop intrahepatic rather than extrahepatic CCA.12 Recent epidemiological research reveals a strong relationship between IBD and hepatopancreatobiliary carcinoma.13,14 A 2022 Nordic multicenter cohort study of 141,960 IBD patients provided compelling evidence of this relationship. The study reported that IBD patients with concurrent primary sclerosing cholangitis (PSC) had a 140-fold greater risk of CCA. Even those without PSC exhibited a 2.5-fold increased risk compared to the general population.15 These findings underscore the clinical relevance of elucidating the independent mechanisms linking IBD and CCA. Current evidence suggests that a sequential “inflammation-dysplasia-carcinoma” process may serve as a shared pathological basis of the two diseases. However, the specific molecular mechanisms are still unclear.
Importantly, the incidence of IBD has been rising steadily in China in recent years.16 Moreover, individuals with IBD exhibit an elevated risk of death in comparison to the general population, with a standardized mortality ratio of 2.0 reported for CD.17 CCA presents distinct epidemiological features in the Asia-Pacific region. In certain regions of countries such as China and South Korea, the incidence of CCA exceeds 6 per 100,000 population.18 Although international guidelines recommend annual hepatobiliary cancer surveillance in individuals with PSC-IBD,5 robust evidence-based strategies for monitoring individuals with non-PSC IBD are scarce. The association between IBD and CCA is rooted in the classic sequence of “inflammation-dysplasia-carcinoma”, which is driven by chronic inflammation.19 In this process, dysregulation of the immune microenvironment plays a central role. Persistent inflammation and repair in IBD lead to an immune imbalance whose features show remarkable similarity to the immunosuppressive state in the CCA tumor microenvironment. This includes the enrichment of M2 macrophages and T cell exhaustion.20–23 These shared immune infiltration patterns strongly suggest that abnormal immune regulation is a critical pathway connecting the two diseases. Elucidating this shared molecular and immunological basis will provide a key theoretical foundation for developing early screening strategies for CCA in IBD patients. Given this context, the present research integrated multi-omics analysis methods to comprehensively examine the shared molecular features of IBD and CCA. By incorporating weighted gene co-expression network analysis (WGCNA), differential gene expression analysis, and immune microenvironment profiling, this study aims to offer molecular insights into early detection of CCA in patients with IBD.
Materials and Methods
Preprocessing of Bulk Transcriptomic Data
For the IBD group, the training dataset included 455 normal control samples and 1151 IBD samples from GSE193677. Validation datasets included 18 IBD and 6 healthy samples from GSE16879, 140 IBD and 26 healthy samples from GSE112366, and 67 IBD and 11 healthy samples from GSE75215. For the CCA group, the training dataset comprised 30 CCA and 27 normal samples from GSE107943, as well as 35 CCA and 9 normal samples from the TCGA-CHOL database. The validation dataset included 16 CCA and 7 normal samples from GSE32879 (Table 1).
Table 1 Dataset Information
All transcriptomic data were normalized using the log2(X+1) transformation. Batch effects were removed through the Surrogate Variable Analysis (SVA) algorithm, and datasets were subsequently integrated to construct the following combined datasets: (i) IBD_merge_data, comprising samples from GSE16879, GSE112366, and GSE75215; (ii) CCA_merge_data, comprising samples from GSE107943 and TCGA-CHOL.
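The log2(X+1) transformation mentioned above can be illustrated with a minimal sketch. The expression values and the function name `log2_plus_one` are hypothetical and serve only to show the arithmetic; batch-effect removal with SVA is not reproduced here.

```python
import math


def log2_plus_one(matrix):
    """Apply log2(x + 1) to every value of a genes-by-samples matrix (list of rows)."""
    return [[math.log2(value + 1.0) for value in row] for row in matrix]


# Hypothetical raw expression values (rows = genes, columns = samples).
raw = [
    [0.0, 1.0, 3.0],
    [7.0, 15.0, 1023.0],
]
normalized = log2_plus_one(raw)
print(normalized)
```

Adding 1 before taking the logarithm keeps zero counts defined (they map to 0) while compressing the dynamic range of highly expressed genes.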
Identification of Differentially Expressed Genes (DEGs)
Differential expression analysis was performed with the limma package. P-values were adjusted for multiple testing using the Benjamini–Hochberg procedure, and genes with an adjusted p-value < 0.05 and |log fold change| > 0.585 were considered DEGs.
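The filtering step can be sketched as follows. This is an illustration only: it does not reproduce the moderated statistics computed by limma, the gene names and input values are hypothetical, and the log fold change is assumed to be on a log2 scale (under that assumption, 0.585 corresponds to roughly a 1.5-fold change).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, p_values, fc_cutoff=0.585, alpha=0.05):
    """Split genes into up- and downregulated DEGs using the stated thresholds."""
    adjusted = benjamini_hochberg(p_values)
    up, down = [], []
    for gene, lfc, q in zip(genes, log_fc, adjusted):
        if q < alpha and abs(lfc) > fc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical inputs (not taken from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [1.2, -0.9, 0.3, 0.7]
p_values = [0.001, 0.01, 0.02, 0.2]

up, down = select_degs(genes, log_fc, p_values)
print("up:", up, "down:", down)
```

In this toy input, GENE_C passes the significance filter but not the fold-change filter, and GENE_D fails the significance filter, so neither is retained.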
Weighted Gene Co-Expression Network Analysis (WGCNA)
The WGCNA package in R was utilized to establish a weighted gene co-expression network based on the merged dataset.24 The “goodSamplesGenes” function was employed to check data integrity. The optimal soft-thresholding power was determined using the “pickSoftThreshold” function. The relationship between gene modules and clinical traits was evaluated using phenotypic data.
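The role of the soft-thresholding power can be illustrated with a short sketch. It assumes an unsigned network, in which the adjacency between two genes is the absolute correlation raised to the chosen power; the text does not state which network type was used, and the correlation values below are hypothetical. The sketch does not replace the WGCNA package.

```python
def soft_threshold_adjacency(correlation, power):
    """Unsigned adjacency: a_ij = |cor_ij| ** power (an assumed network type)."""
    return [[abs(r) ** power for r in row] for row in correlation]


def connectivity(adjacency):
    """Sum of adjacencies of each gene to all other genes (diagonal excluded)."""
    return [sum(row) - row[i] for i, row in enumerate(adjacency)]


# Hypothetical gene-gene correlation matrix for three genes.
cor = [
    [1.0, 0.8, -0.5],
    [0.8, 1.0, 0.2],
    [-0.5, 0.2, 1.0],
]

adj = soft_threshold_adjacency(cor, power=7)
print(adj[0][1], adj[0][2], adj[1][2])
print(connectivity(adj))
```

Raising correlations to a higher power suppresses weak correlations far more strongly than strong ones, which is why a larger power yields a sparser, more scale-free-like network.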
Machine Learning (ML) Algorithms for Identifying Potential Diagnostic Biomarkers
Four ML algorithms were applied to discover candidate diagnostic biomarkers in IBD and CCA: random forest (RF),25 logistic regression (LR),26 least absolute shrinkage and selection operator (LASSO) regression,27 and support vector machine-recursive feature elimination (SVM-RFE).28 RF was implemented via the “randomForest” package, LR and LASSO via the “glmnet” package, and SVM-RFE through the “msvmRFE” package. Genes were ranked based on the decrease in Gini index using the RF algorithm, and the top 10 genes with a significance score greater than 3 were selected for downstream analysis. In the LR model, gene expression was analyzed as a continuous variable, while sample type served as a binary response variable. Five-fold cross-validation was employed to estimate the misclassification error of candidate models and to confirm the optimal lambda (λ). The λ parameter in the LASSO model was used to select the minimal number of predictive variables. The SVM-RFE algorithm was employed to generate a gene ranking list, and the best-performing subset was selected using a linear kernel. The intersecting results from RF, LR, SVM-RFE, and LASSO were defined as the hub genes for IBD and CCA, respectively. The overlapping hub genes between the two diseases were considered shared diagnostic biomarkers for IBD and CCA patients. The diagnostic performance of these genes for IBD and CCA was appraised through receiver operating characteristic (ROC) curve analysis. Their diagnostic accuracy was confirmed by external validation datasets.
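The intersection logic described above can be sketched as follows. The gene labels (`g1` to `g6`) are placeholders rather than study results, and the function names are hypothetical; the sketch only shows how per-algorithm selections are combined within a disease and then across diseases.

```python
def intersect_selected(selections):
    """Return genes chosen by every algorithm, sorted alphabetically."""
    gene_sets = [set(genes) for genes in selections.values()]
    return sorted(set.intersection(*gene_sets))


def shared_hub_genes(disease_a, disease_b):
    """Return hub genes common to both diseases."""
    hubs_a = set(intersect_selected(disease_a))
    hubs_b = set(intersect_selected(disease_b))
    return sorted(hubs_a & hubs_b)


# Placeholder selections per algorithm (not the genes reported in this study).
ibd = {
    "RF": ["g1", "g2", "g3", "g4"],
    "LR": ["g1", "g2", "g3"],
    "LASSO": ["g1", "g2", "g3", "g5"],
    "SVM-RFE": ["g2", "g3", "g1", "g6"],
}
cca = {
    "RF": ["g2", "g3", "g4"],
    "LR": ["g2", "g3", "g5"],
    "LASSO": ["g2", "g3"],
    "SVM-RFE": ["g3", "g2", "g6"],
}

print(intersect_selected(ibd))
print(intersect_selected(cca))
print(shared_hub_genes(ibd, cca))
```

Requiring agreement among all four algorithms is a conservative choice: a gene dropped by any single method is excluded from the hub gene list.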
Single-Gene Gene Set Enrichment Analysis (GSEA)
The functional relevance of key genes was examined by single-gene GSEA. By identifying significantly enriched pathways in the datasets, this analysis elucidated the biological processes potentially regulated by the genes involved. Enrichments with p-values below 0.05 were deemed statistically significant, indicating gene involvement in critical pathophysiological mechanisms underlying IBD and CCA.
Prognostic Evaluation of Hub Genes in CCA Patients
Individuals with CCA were stratified into high- and low-expression groups according to the optimal cut-off values of hub gene expression. The prognostic significance of the identified genes was appraised through Kaplan–Meier (K-M) survival analysis.
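A minimal sketch of the stratification and the Kaplan–Meier estimator is given below. The cutoff is passed in as an input because the procedure used to determine the optimal cut-off is not described in the text; the survival times, event indicators, and expression values are hypothetical, and no significance test is included.

```python
def split_by_cutoff(expression, cutoff):
    """Return indices of high- and low-expression samples for a given cutoff."""
    high = [i for i, value in enumerate(expression) if value > cutoff]
    low = [i for i, value in enumerate(expression) if value <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed death.

    events: 1 = death observed, 0 = censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all samples that share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for five patients.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
expression = [9.1, 3.2, 8.4, 7.9, 2.5]

high, low = split_by_cutoff(expression, cutoff=5.0)
print("high:", high, "low:", low)
print(kaplan_meier(times, events))
```

In practice the estimator would be applied separately to the high- and low-expression groups and the two curves compared.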
Immune Infiltration and Correlation Analysis
CIBERSORT, a tool for estimating the relative proportions of immune cell subsets based on bulk RNA transcriptomic data, and the Immuno-Oncology Biological Research tool were employed to investigate immune cell infiltration in IBD and CCA. Subsequently, Spearman correlation analysis was carried out to examine the relationships between immune cells and the expression levels of hub genes.
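The Spearman correlation used here is the Pearson correlation of the ranks, which a short sketch can make concrete. The expression values and immune cell fractions below are hypothetical, and the helper names are illustrative; the sketch assumes neither input is constant and does not compute p-values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning the mean rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical hub gene expression and immune cell fraction across five samples.
gene_expression = [2.1, 3.5, 1.0, 4.2, 2.8]
cell_fraction = [0.10, 0.30, 0.05, 0.25, 0.20]

print(round(spearman(gene_expression, cell_fraction), 3))
```

Because it operates on ranks, the coefficient captures monotonic relationships and is less sensitive to outliers than the Pearson coefficient computed on raw values.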
Drug Sensitivity Analysis
Data on anticancer drug responses and genomic sensitivity markers were obtained from the Genomics of Drug Sensitivity in Cancer database,29 one of the largest publicly available resources for information on drug sensitivity and molecular biomarkers of drug response in cancer cells. The pRRophetic algorithm was employed to calculate the half-maximal inhibitory concentration (IC50) values based on CCA data to predict patients’ sensitivity to common anticancer drugs and small molecule compounds. Pearson correlation coefficients were calculated to examine the relationships between drug sensitivity and hub gene expression. The results were visualized via grouped comparison plots and dot-line graphs.
Validation in Clinical Samples
Ten patients were recruited from the Department of Gastroenterology at Wuxi Second People’s Hospital, including five patients with colonic polyps and five with IBD. Inclusion criteria included: (i) participants aged ≥18 years who offered written informed consent; (ii) diagnosis confirmed based on clinical symptoms, laboratory tests, endoscopy, imaging examinations, and histopathological findings in accordance with the World Health Organization (WHO) diagnostic criteria for IBD. Exclusion criteria encompassed: (i) severe complications, including toxic megacolon or intestinal perforation; (ii) suspected malignancy; (iii) presence of autoimmune diseases, including systemic lupus erythematosus, psoriasis, rheumatoid arthritis, or Graves’ disease. The normal control group included individuals aged ≥18 years who had provided written informed consent and underwent colonoscopy for polyp screening. Normal colonic tissue samples (2–3 pieces per individual) were collected. These individuals had no familial relationship with the case group and exhibited no abnormalities in complete blood count or biochemical tests.
In addition, three CCA tumor tissues and adjacent non-tumor tissues were extracted from patients at the Department of Hepatobiliary Surgery, Wuxi Second People’s Hospital. The inclusion criteria were individuals with a confirmed diagnosis of CCA based on imaging and pathological findings. Individuals with chronic hepatobiliary diseases or malignancies other than CCA were excluded. This study was carried out following the principles of the Declaration of Helsinki and obtained approval from the Medical Ethics Committee of Wuxi Second People’s Hospital. All recruited individuals were informed of the research and provided written informed consent.
Total RNA was extracted from CD, colonic polyp, CCA tissues, as well as adjacent non-tumor tissues using TRIzol reagent (Invitrogen, Carlsbad, CA, United States). The HiScript II RT SuperMix (Nanjing Jiancheng, Nanjing, China) was used for reverse transcription following the instructions provided by the manufacturer. Quantitative real-time PCR (qPCR) was carried out via ChamQ Universal SYBR qPCR Master Mix (Vazyme, Nanjing, China). Each experiment was conducted in triplicate. The 2^−ΔΔCt method was utilized to calculate relative gene expression, with GAPDH as the internal reference. Table 2 lists the primer sequences.
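The 2^−ΔΔCt calculation can be written out as a brief sketch. The Ct values below are hypothetical triplicates, and the function names are illustrative; GAPDH is used as the reference gene, as stated above.

```python
def mean(values):
    """Arithmetic mean of a list of Ct values."""
    return sum(values) / len(values)


def relative_expression(target_sample, ref_sample, target_control, ref_control):
    """Fold change by the 2^-ddCt method from replicate Ct values."""
    delta_sample = mean(target_sample) - mean(ref_sample)      # dCt, disease
    delta_control = mean(target_control) - mean(ref_control)   # dCt, control
    delta_delta = delta_sample - delta_control                 # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical triplicate Ct values (not measured data).
target_disease = [24.0, 24.2, 23.8]
gapdh_disease = [18.0, 18.1, 17.9]
target_control = [26.0, 26.1, 25.9]
gapdh_control = [18.0, 18.0, 18.0]

fold = relative_expression(target_disease, gapdh_disease,
                           target_control, gapdh_control)
print(round(fold, 2))  # about 4-fold higher than the control
```

A ΔΔCt of −2 means the target crosses the detection threshold two cycles earlier in the disease sample after normalization, which corresponds to an approximately fourfold higher relative expression.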
Table 2 Gene Primer Sequences
Results
Identification of DEGs in IBD and CCA
Batch effects in the combined datasets, namely CCA_merge_data (integrating GSE107943 and TCGA) and IBD_merge_data (integrating GSE16879, GSE112366, and GSE75215), were removed using the “sva” package. Normalization was conducted via the “limma” package (Figures 1A–H).
Figure 1 Data preprocessing for CCA and IBD. (A–D) PCA plots displaying expression patterns in the CCA datasets GSE107943 and TCGA before (A, C) and after (B, D) batch effect correction. (E–H) PCA plots illustrating expression patterns in the IBD datasets GSE16879, GSE112366, and GSE75215 before (E, G) and after (F, H) batch effect correction.
A total of 209 DEGs were obtained from the IBD dataset, encompassing 162 upregulated and 47 downregulated genes (Figure 2A). Heatmaps of the top 20 significantly upregulated and downregulated genes in this dataset were also generated (Figure 2B). Meanwhile, 10,968 DEGs were collected from the CCA dataset, including 6,077 upregulated and 4,891 downregulated genes (Figure 2C). Heatmaps of the top 20 significantly upregulated and downregulated genes in this dataset were also generated (Figure 2D). Additionally, 50 overlapping DEGs were identified between the IBD and CCA datasets, encompassing 34 upregulated and 16 downregulated genes (Figures 2E and F).
Figure 2 Volcano plots, heatmaps, and co-expressed DEGs. (A) Volcano plot of DEGs in the IBD dataset. (B) Heatmap of DEGs in the IBD dataset. (C) Volcano plot of DEGs in the CCA dataset. (D) Heatmap of DEGs in the CCA dataset. (E) Venn diagram of co-upregulated DEGs shared between the IBD and CCA datasets. (F) Venn diagram of co-downregulated DEGs shared between the IBD and CCA datasets. Red indicates upregulated genes, blue suggests downregulated genes, and gray indicates non-DEGs.
WGCNA in IBD and CCA
WGCNA was carried out on both the IBD and CCA datasets to examine the relationships between clinical traits and gene expression. No significant outlier samples were identified in either dataset. According to the WGCNA algorithm, the optimal soft-thresholding powers were 21 for the IBD dataset and 7 for the CCA dataset (Figure 3A and B). Based on module similarity, four modules were identified in the IBD dataset and five in the CCA dataset (Figure 3C and D). Module-trait correlation analysis was subsequently conducted. In the IBD dataset, the grey module demonstrated the most significant positive relationship with IBD (r = 0.11, p < 0.05), while in the CCA dataset, the blue module demonstrated the most significant positive relationship with CCA (r = 0.89, p < 0.05) (Figure 3E and F). Ultimately, 13 overlapping genes were obtained by intersecting DEGs with key WGCNA modules. These genes may contribute to the pathogenesis of IBD and CCA (Figure 3G).
Figure 3 WGCNA identifying key gene modules in IBD and CCA, and the intersection between DEGs and these key modules in both diseases. (A–B) Selection of the optimal soft-thresholding power in the IBD (A) and CCA (B) datasets. (C–D) Gene cluster trees and correlation analyses of module eigengenes in the IBD (C) and CCA (D) datasets. (E–F) Heatmaps illustrating correlations between module eigengenes and clinical traits in IBD (E) and CCA (F). Red indicates positive correlations, and blue indicates negative correlations. (G) Venn diagram of overlapping genes between DEGs and the grey module from the IBD dataset, and between DEGs and the blue module from the CCA dataset.
Identification and Validation of Shared Hub Genes via ML
Feature selection was performed based on ML algorithms to further discover the most diagnostically relevant genes. Among the 13 candidate genes, LASSO regression selected 8 genes (Figure 4A and B), SVM-RFE identified 10 genes (Figure 4C and D), and LR selected genes with p < 0.05 (Figure 4E). RF ranked the top 30 genes by importance score (Figure 4F and G). Ultimately, eight common diagnostic biomarkers were identified: CCL11, CCL20, DUOX2, DUOXA2, LCN2, NOS2, PDZK1IP1, and TRIM40 (Figure 4H).
Figure 4 Identification of common hub genes in IBD and CCA. (A–B) Results of LASSO regression analysis. (C–D) Results of SVM-RFE. (E) Results of LR analysis. (F–G) Results of RF analysis. (H) Venn diagram illustrating overlapping genes identified by the four ML algorithms.
ROC curves were then used to appraise the diagnostic value of the hub genes in the training datasets (Figure 5A and B). In the IBD dataset, the areas under the curve (AUCs) were as follows: CCL11 (0.68), CCL20 (0.61), DUOX2 (0.74), DUOXA2 (0.75), LCN2 (0.71), NOS2 (0.72), PDZK1IP1 (0.64), and TRIM40 (0.66) (Figure 5A). In the CCA dataset, the corresponding AUCs were: CCL11 (0.70), CCL20 (0.85), DUOX2 (0.78), DUOXA2 (0.74), LCN2 (0.79), NOS2 (0.64), PDZK1IP1 (0.93), and TRIM40 (0.74) (Figure 5B). Only LCN2, DUOX2, and DUOXA2 showed AUCs greater than 0.7 in both datasets. Boxplots demonstrated that these three genes were notably upregulated in the disease groups within the IBD and CCA training cohorts (Figure 5C–D), supporting their potential as diagnostic biomarkers for IBD and CCA.
Figure 5 ROC curves and expression of eight common diagnostic biomarkers in the training datasets. (A) ROC curves of the eight common diagnostic biomarkers in the IBD-GSE193677 cohort. (B) ROC curves of the eight common diagnostic biomarkers in the CCA_merge_data cohort. (C) Expression of the eight common diagnostic biomarkers in the IBD-GSE193677 cohort. (D) Expression of the eight common diagnostic biomarkers in the CCA_merge_data cohort.
Note: *p < 0.05; **p < 0.01; ****p < 0.0001.
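For readers less familiar with the AUC values reported for single genes, the quantity can be computed as the probability that a randomly chosen disease sample has higher expression than a randomly chosen control, with ties counted as one half. The sketch below illustrates this definition on hypothetical values; the function name is illustrative, and the sketch is not the ROC software used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5).

    labels: 1 = disease sample, 0 = control sample.
    """
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression of one gene in three disease and three control samples.
expression = [5.1, 6.3, 4.0, 7.2, 4.8, 5.1]
labels = [1, 1, 0, 1, 0, 0]

print(round(roc_auc(expression, labels), 3))
```

An AUC of 0.5 corresponds to no discrimination between groups, and an AUC of 1.0 to perfect separation by the expression of that gene alone.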
In the IBD validation dataset (IBD_merge_data), the AUCs were 0.88 for DUOX2, 0.87 for DUOXA2, and 0.90 for LCN2 (Figure 6A). In the CCA validation cohort (GSE32879), the corresponding AUCs were 0.88 for DUOX2, 0.63 for DUOXA2, and 0.88 for LCN2 (Figure 6B). Thus, DUOX2 and LCN2 achieved AUCs greater than 0.8 in both validation cohorts, whereas DUOXA2 fell below 0.7 in the CCA validation cohort. Boxplots demonstrated that DUOX2 and LCN2 exhibited expression patterns in the disease groups consistent with those observed in the training sets, whereas DUOXA2 did not show a significant difference in the CCA validation cohort (Figures 6C and D).
Figure 6 ROC curves and expression of eight common diagnostic biomarkers in the validation datasets. (A) ROC curves of the eight common diagnostic biomarkers in the IBD_merge_data cohort. (B) ROC curves of the eight common diagnostic biomarkers in the CCA-GSE32879 cohort. (C) Expression of the eight common diagnostic biomarkers in the IBD_merge_data cohort. (D) Expression of the eight common diagnostic biomarkers in the CCA-GSE32879 cohort.
Note: **p < 0.01; ****p < 0.0001.
Single-Gene GSEA
After identifying DUOX2 and LCN2 as potential diagnostic biomarkers, we performed single-gene Gene Set Enrichment Analysis (GSEA) to explore their potential biological functions. As shown in Figure 7, the enriched pathways were ranked in descending order based on their correlation with each target gene. Gene Ontology (GO) analysis revealed that DUOX2 was significantly enriched in terms associated with immune effector processes, the adaptive immune response, and B cell/lymphocyte-mediated immunity. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis demonstrated that DUOX2 was primarily enriched in the TNF signaling pathway, IL-17 signaling pathway, and cytokine–cytokine receptor interaction pathway, among others (Figures 7A–D).
Figure 7 Functional and pathway enrichment analyses of DUOX2, a shared driver gene in IBD and CCA. (A) GO analysis of DUOX2. (C) KEGG analysis of DUOX2. (B, D) GSEA of DUOX2.
GO analysis revealed that LCN2 was significantly enriched in adaptive immune responses, particularly those associated with B cells and immunoglobulins. These include immune effector processes, adaptive immunity, and B-cell/lymphocyte-mediated immunity. KEGG pathway analysis identified LCN2 enrichment in the TNF and IL-17 signaling pathways, cytokine–cytokine receptor interaction, autoimmune and metabolic diseases, and infection and host defense pathways (Figures 8A–D).
Figure 8 Functional and pathway enrichment analyses of LCN2, a shared driver gene in IBD and CCA. (A) GO analysis of LCN2. (C) KEGG analysis of LCN2. (B, D) GSEA of LCN2.
Prognostic Value of Hub Genes in CCA Patients
The relationship between the DUOX2 and LCN2 expression and overall survival (OS) in CCA patients was assessed through K-M survival analysis. Based on the optimal cutoff values of the hub genes, individuals in the CCA cohort were stratified into high- and low-expression groups. The results demonstrated that individuals with low expression levels of DUOX2 and LCN2 exhibited significantly better OS in comparison to those with high expression levels (p < 0.05, Figure 9A and B).
Figure 9 Survival analysis of CCA patients from the CCA_merge_data cohort based on high and low expression levels of DUOX2 and LCN2 mRNA. (A) K-M curve of OS for 74 CCA patients stratified by the optimal cutoff risk score of DUOX2 expression. (B) K-M curve of OS for 74 CCA patients stratified by the optimal cutoff risk score of LCN2 expression.
Identification of Candidate Drugs Based on Hub Genes
Candidate drugs were screened using the Genomics of Drug Sensitivity in Cancer database29 and the “pRRophetic” R package. Drugs with an absolute correlation coefficient greater than 0.4 and a p-value less than 0.05 were selected as potential therapies for combined treatment of IBD and CCA (Table 3). LCN2 may participate in resistance mechanisms to multiple chemotherapeutic agents, and its inhibition could enhance drug sensitivity. DUOX2 was potentially related to oxidative stress pathways, with high-expression patients possibly exhibiting increased sensitivity to drugs such as PF-4708671 (Figure 10A–J).
Table 3 Drug Sensitivity Analysis
Immune Cell Infiltration and Its Relationship with Shared Hub Genes
Immune cell abundances in each sample were systematically analyzed using CIBERSORT to examine the distribution characteristics and potential roles of immune cells in the complex pathogenesis of IBD and CCA. Figures 11A and B clearly display the relative abundance of 22 immune cell types in each sample from IBD and CCA patients. Detailed comparative analysis demonstrated that M2 macrophages and resting memory CD4⁺ T cells accounted for a substantial proportion of immune cells in both IBD and CCA patients. These findings suggested a potential shared mechanism of immune cell infiltration in both diseases.
Figure 11 Immune infiltration analysis in IBD and CCA. (A) Relative abundance of 22 immune cell types in each IBD sample. (B) Relative abundance of 22 immune cell types in each CCA sample. (C) Heatmap displaying correlations between hub genes and immune cells in IBD samples. Red indicates positive correlation and green suggests negative correlation. (D) Heatmap illustrating correlations between hub genes and immune cells in CCA samples. Red suggests positive correlation, and green suggests negative correlation. *p < 0.05; **p < 0.01; ***p < 0.001.
Based on this, further correlation analysis was carried out. The results demonstrated that in both disease datasets, LCN2 was positively linked to regulatory T cells, M0 macrophages, and follicular helper T cells. Similarly, DUOX2 exhibited a positive relationship with M0 macrophages in both datasets (Figures 11C and D).
Validation of Hub Genes in Clinical Samples
Intestinal mucosal tissues from IBD patients and normal controls, as well as CCA tissues and adjacent non-tumor tissues, were extracted to validate the expression of hub genes in clinical samples. qRT-PCR was performed to examine the expression of DUOX2 and LCN2 in these samples. The results aligned with the previous data analysis. In comparison to the control group, DUOX2 expression was notably upregulated. LCN2 expression was substantially increased in IBD patients and exhibited a noticeable increasing trend in CCA patients (Figure 12A–D).
Figure 12 qRT-PCR validation of hub genes. (A) Validation of DUOX2 expression in the IBD versus the control group. (B) Validation of DUOX2 expression in the CCA versus the control group. (C) Validation of LCN2 expression in the IBD versus the control group. (D) Validation of LCN2 expression in the CCA versus the control group. ns, not significant (P ≥ 0.05); *P < 0.05; **P < 0.01.
Discussion
IBD affects millions of individuals worldwide, imposing a substantial disease burden and long-term health risks.30 Its complications, including extraintestinal manifestations and malignancies, can severely compromise patients’ quality of life and long-term survival. Notably, accumulating clinical evidence indicates that individuals with IBD have a markedly greater risk of CCA in comparison to the general population.15 CCA refers to a highly aggressive malignancy of the biliary tract, often diagnosed at advanced stages and prone to metastasis. Although some patients may benefit from surgical or multimodal treatment, the overall prognosis remains dismal, posing a serious threat to survival.6 Although epidemiological evidence has established a relationship between IBD and CCA, specific markers for early screening remain lacking. Given the insidious onset of CCA, many individuals are diagnosed at an advanced stage, often requiring liver transplantation. However, the prognosis remains extremely poor, with notably decreased three-month survival rates. Therefore, identifying early biomarkers is urgently needed. Such markers could preliminarily indicate the risk of CCA in IBD patients, thereby guiding clinicians to perform regular biliary monitoring of high-risk individuals through imaging techniques, including endoscopic ultrasound and magnetic resonance imaging. Early detection of lesions through this approach could substantially lower the risk of disease progression to advanced stages and improve patient survival.31
In this context, the present research aimed to examine the molecular mechanisms underlying the comorbidity of IBD and CCA, with a focus on identifying early diagnostic biomarkers for IBD patients at risk of developing CCA. By systematically analyzing gene expression profiles and molecular interaction networks through bioinformatics, a cross-disease gene regulatory model was established for the first time. DEG analysis was combined with WGCNA to identify 13 core regulatory genes. Subsequently, ML was applied to integrate multi-omics data, and qRT-PCR experiments confirmed DUOX2 and LCN2 as key hub genes shared by both diseases.
DUOX2 belongs to the nicotinamide adenine dinucleotide phosphate oxidase family. The DUOX2-encoded enzyme plays an essential role in the iodination step of thyroid hormone synthesis by generating hydrogen peroxide (H2O2). DUOX2 also contributes to host defense in the respiratory epithelium and gastrointestinal tract by mediating the generation of reactive oxygen species (ROS) and participating in various physiological functions.32 DUOX2 mRNA is predominantly expressed in the colon and is also present at lower levels in the testes, liver, kidney, prostate, pancreas, and lung.33 The regulatory role of DUOX2 in disease has been extensively reported. It was initially studied for its involvement in maintaining mucosal barrier integrity and regulating innate immune responses in the gut. Subsequent research has revealed that DUOX2 is markedly upregulated in intestinal tissues of individuals with IBD,34 and that genetic alterations in DUOX2 are linked to an elevated risk of Crohn’s disease and very early-onset IBD.35,36 Consequently, DUOX2 has become a focus of research into the molecular mechanisms underlying IBD pathogenesis. More recently, the involvement of DUOX2 in cancer progression has been explored. Current evidence indicates that DUOX2 is frequently overexpressed in gastrointestinal malignancies, including colorectal cancer, where it may promote tumor development through ROS-mediated signaling pathways.37,38 These findings highlight DUOX2 as a promising prognostic indicator and potential therapeutic target (for example, through selective DUOX2 inhibitors).39 Nonetheless, the exact molecular mechanisms by which DUOX2 contributes to specific diseases, especially CCA, remain unclear and warrant further functional investigation.
LCN2, a member of the lipocalin protein superfamily, regulates iron metabolism, inflammation, and immune homeostasis by binding siderophores.40 LCN2 expression is inflammation-dependent: under physiological conditions, its expression remains low, but it is markedly upregulated in response to inflammatory stimuli, such as in IBD. This dynamic regulation, along with its biochemical stability, suggests that LCN2 might act as a reliable biomarker for inflammatory disorders.41–43 Elevated expression of LCN2 has also been reported in various cancers,44 with distinct tissue-specific functions. In pancreatic cancer, LCN2 promotes tumor invasion and angiogenesis by stabilizing matrix metalloproteinase-9 (MMP-9) and enhancing extracellular matrix degradation. In endometrial cancer, its overexpression may induce epithelial-mesenchymal transition, thereby accelerating metastasis. In colorectal cancer, although LCN2 can inhibit MMP-9 activity and decrease the risk of liver metastasis, its role in iron-mediated tumorigenesis remains controversial.45–48 Notably, the expression pattern, mechanistic function, and clinical relevance of LCN2 in CCA have not been well defined, highlighting the need for further research to investigate its utility as a diagnostic marker and a therapeutic target.
We assessed the functional relevance of the hub genes in IBD and CCA using single-gene GSEA. GO enrichment analysis revealed that DUOX2 was involved in immune effector processes, adaptive immunity, and B-cell/lymphocyte-mediated immune responses. KEGG analysis implicated dysregulation of the IL-17 inflammatory cascade, the TNF signaling axis, and cytokine-cytokine receptor interaction. A highly similar enrichment pattern was found for LCN2. GO analysis highlighted its association with adaptive and humoral immunity, particularly the B-cell immunoglobulin axis. KEGG analysis reinforced its involvement in the TNF/IL-17-driven inflammatory cascade, cytokine-cytokine receptor interaction, and infectious disease-related pathways. Notably, the enrichment profiles of both genes converged on key immune processes, including inflammatory mediator regulation, pathogen recognition and defense, and autoimmune responses. Together, these findings suggest that LCN2 and DUOX2 may modulate a shared “immune-inflammation axis”, which drives the comorbidity between IBD and CCA by dysregulating the immune microenvironment, triggering excessive proinflammatory signaling, and facilitating the inflammation-to-cancer transition.
Survival analysis incorporated DUOX2 and LCN2 as prognostic gene markers. Based on the optimal cutoff value for each gene, CCA patients were stratified into low- and high-expression groups. K-M curves suggested that individuals in the high-expression group had notably worse OS, with a higher death rate compared to the low-expression group. Subsequent drug sensitivity analysis of the hub genes DUOX2 and LCN2 suggested that LCN2 may participate in multiple chemotherapy resistance mechanisms, and that its inhibition could enhance drug sensitivity. DUOX2 was potentially associated with oxidative stress pathways, and patients with high DUOX2 expression appeared more responsive to targeted agents such as PF-4708671. Immune infiltration analysis demonstrated significant enrichment of multiple immune cells in both IBD and CCA samples. M2 macrophages and resting memory CD4⁺ T cells accounted for a substantial proportion, suggesting a microenvironment marked by immune suppression and tissue remodeling. This finding aligns well with established disease mechanisms. M2 macrophages can promote pathological fibrosis in IBD and drive immune escape and tumor progression in CCA. Meanwhile, the accumulation of resting memory CD4⁺ T cells may reflect a T cell pre-exhaustion state induced by prolonged antigen exposure.20,21,23,49 M2 macrophages can suppress inflammation by generating anti-inflammatory cytokines such as TGF-β and IL-10.22 However, their excessive activation may contribute to fibrosis in chronic inflammation50,51 or promote angiogenesis and immune evasion in the tumor microenvironment (such as during the malignant progression of CCA).20 The resting memory CD4⁺ T cells in this pre-exhaustion state are functionally impaired and are therefore less effective in initiating anti-inflammatory or antitumor responses.
They may act in concert with M2 macrophages to form an immunosuppressive network, thereby impairing the host’s ability to eliminate pathogens or tumor cells and contributing to disease chronicity or malignant progression. Both LCN2 and DUOX2 exhibited positive correlations with M0 macrophages. Prior studies have reported that LCN2 promotes inflammatory regulation by stimulating the NF-κB pathway during macrophage activation and facilitates tumor progression via macrophage polarization-mediated iron delivery.52 Furthermore, LCN2 can modulate iron metabolism and amplify proinflammatory signaling by binding siderophores, while DUOX2 activates inflammation-related pathways through ROS production.53 Together, they may drive the polarization of M0 macrophages toward proinflammatory phenotypes (such as M1) or maintain their latent inflammatory potential in an unactivated state. Their interplay is particularly prominent in IBD and the tumor microenvironment, supporting their potential roles as early inflammatory markers and therapeutic targets. Further studies are required to elucidate their molecular interactions and regulatory networks in disease progression.
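The expression-based stratification behind the survival analysis discussed above can be illustrated with a short sketch. The text does not state how the optimal cutoff was chosen, so this example assumes a maximally selected log-rank statistic: every candidate cutoff is tried, and the one giving the largest two-group log-rank chi-square is kept. All function names, the `min_group` parameter, and the toy data are hypothetical.

```python
import math


def logrank_chi2(times, events, high):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death, 0 = censored;
    high: True if the patient is in the high-expression group.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if high[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        d_high = sum(1 for i in deaths if high[i])
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:  # hypergeometric variance of deaths in the high group
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def optimal_cutoff(expr, times, events, min_group=3):
    """Return (cutoff, chi2) maximizing the log-rank statistic.

    Patients with expression > cutoff form the high group; cutoffs leaving
    fewer than min_group patients in either group are skipped.
    """
    best = (None, 0.0)
    for c in sorted(set(expr)):
        high = [x > c for x in expr]
        n_high = sum(high)
        if n_high < min_group or len(expr) - n_high < min_group:
            continue
        chi2 = logrank_chi2(times, events, high)
        if chi2 > best[1]:
            best = (c, chi2)
    return best


def chi2_pvalue_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy cohort: higher expression paired with shorter survival.
toy_expr = [1, 2, 3, 4, 5, 6, 7, 8]
toy_times = [30, 28, 26, 24, 6, 5, 4, 3]
toy_events = [0, 0, 1, 0, 1, 1, 1, 1]
toy_cutoff, toy_chi2 = optimal_cutoff(toy_expr, toy_times, toy_events)
```

Because the cutoff is selected to maximize the statistic, the unadjusted p-value from `chi2_pvalue_1df` is optimistic, which is one reason such results are best treated as exploratory in small cohorts.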
Several limitations of the current research warrant attention. First, although our findings were validated by qPCR, the validation was performed on a limited number of patient samples. Further confirmation in larger, independent cohorts is therefore necessary. Future studies should involve multicenter, prospective cohorts with well-characterized clinical and immunological parameters to more robustly validate the identified biomarkers DUOX2 and LCN2 in IBD and CCA. Second, this study relied primarily on publicly available transcriptomic datasets, including GEO and TCGA. While these repositories provide valuable large-scale data, they may not fully capture the heterogeneity of IBD and CCA. These datasets often lack detailed clinical metadata, such as disease severity, treatment history, and comorbidities, which are essential for a comprehensive understanding of the relationship between IBD and CCA. Additionally, variability in sample collection, processing, and sequencing platforms across datasets may introduce batch effects and potential biases in gene expression analysis. Furthermore, due to the limited CCA sample size in public databases such as TCGA, the survival analyses in this study may be underpowered. These results should therefore be interpreted with caution as exploratory, preliminary insights into the prognostic potential of the hub genes, and their definitive clinical relevance should be verified in future studies with larger sample sizes. Longitudinal follow-up of patients may help establish causal relationships between chronic inflammation in IBD and tumor progression in CCA, thereby strengthening the biological relevance of our results. Finally, this study identified associations between DUOX2 and LCN2 and immune regulation in both IBD and CCA; however, the underlying mechanisms remain incompletely elucidated. Future investigations should include in vivo models and functional assays to confirm causality and identify potential therapeutic targets.
Conclusion
To our knowledge, this study is the first to reveal gene-gene interactions and potential molecular mechanisms shared by IBD and CCA through transcriptomic analysis. ML-based analysis identified LCN2 and DUOX2 as shared signature genes in IBD and CCA, highlighting their potential as therapeutic targets for both diseases. The findings suggest that immune dysregulation, inflammatory responses, and infection-related pathways may represent common pathological mechanisms underlying the two diseases.
Data Statement
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Ethics Approval and Consent to Participate
All participants provided written informed consent. The review board of the Second People’s Hospital of Wuxi approved the use of intestinal biopsy samples, tumor tissues, and adjacent non-tumor tissues. The study protocol was approved by the Institutional Review Board (Ethical Review 2024-Y-23 and WXEY-2025-116).
Funding
This work was supported by the Wuxi Double Hundred Young and Middle-aged Health Care Talents (BJ2020024); the Wuxi Medical Innovation Team Gastroenterology (CXTD2021020); and the Youth Project of Wuxi Municipal Health Commission (Q202431).
Disclosure
Shiqing Yuan and Zhen Hu are co-first authors for this study. The authors declare that they have no competing interests for this work.
References
1. Aniwan S, Park SH, Loftus EV. Epidemiology, natural history, and risk stratification of Crohn’s disease. Gastroenterol Clin North Am. 2017;46(3):463–480. doi:10.1016/j.gtc.2017.05.003
2. Ramos GP, Papadakis KA. Mechanisms of disease: inflammatory bowel diseases. Mayo Clin Proc. 2019;94(1):155–165. doi:10.1016/j.mayocp.2018.09.013
3. Baumgart DC. The diagnosis and treatment of Crohn’s disease and ulcerative colitis. Dtsch Arztebl Int. 2009;106(8):123–133. doi:10.3238/arztebl.2009.0123
4. Banales JM, Marin JJG, Lamarca A, et al. Cholangiocarcinoma 2020: the next horizon in mechanisms and management. Nat Rev Gastroenterol Hepatol. 2020;17(9):557–588. doi:10.1038/s41575-020-0310-z
5. Khan SA, Rushbrook SM, Kendall TJ, et al. Guidelines development group for the British Society of Gastroenterology guidelines for the diagnosis and management of cholangiocarcinoma. Gut. 2025;74(3):504–505. doi:10.1136/gutjnl-2024-333359
6. Kamsa-Ard S, Luvira V, Suwanrungruang K, et al. Cholangiocarcinoma trends, incidence, and relative survival in Khon Kaen, Thailand from 1989 through 2013: a population-based cancer registry study. J Epidemiol. 2019;29(5):197–204. doi:10.2188/jea.JE20180007
7. Lindnér P, Rizell M, Hafström L. The impact of changed strategies for patients with cholangiocarcinoma in this millenium. HPB Surg. 2015;2015:736049. doi:10.1155/2015/736049
8. Alabraba E, Joshi H, Bird N, et al. Increased multimodality treatment options has improved survival for hepatocellular carcinoma but poor survival for biliary tract cancers remains unchanged. Eur J Surg Oncol. 2019;45(9):1660–1667. doi:10.1016/j.ejso.2019.04.002
9. Burak K, Angulo P, Pasha TM, Egan K, Petz J, Lindor KD. Incidence and risk factors for cholangiocarcinoma in primary sclerosing cholangitis. Am J Gastroenterol. 2004;99(3):523–526. doi:10.1111/j.1572-0241.2004.04067.x
10. Erichsen R, Jepsen P, Vilstrup H, Ekbom A, Sørensen HT. Incidence and prognosis of cholangiocarcinoma in Danish patients with and without inflammatory bowel disease: a national cohort study, 1978-2003. Eur J Epidemiol. 2009;24(9):513–520. doi:10.1007/s10654-009-9365-4
11. Pedersen N, Duricova D, Elkjaer M, Gamborg M, Munkholm P, Jess T. Risk of extra-intestinal cancer in inflammatory bowel disease: meta-analysis of population-based cohort studies. Am J Gastroenterol. 2010;105(7):1480–1487. doi:10.1038/ajg.2009.760
12. Huai JP, Ding J, Ye XH, Chen YP. Inflammatory bowel disease and risk of cholangiocarcinoma: evidence from a meta-analysis of population-based studies. Asian Pac J Cancer Prev. 2014;15(8):3477–3482. doi:10.7314/apjcp.2014.15.8.3477
13. Kappelman MD, Farkas DK, Long MD, et al. Risk of cancer in patients with inflammatory bowel diseases: a nationwide population-based cohort study with 30 years of follow-up evaluation. Clin Gastroenterol Hepatol. 2014;12(2):265–73.e1. doi:10.1016/j.cgh.2013.03.034
14. Everhov ÅH, Erichsen R, Sachs MC, et al. Inflammatory bowel disease and pancreatic cancer: a Scandinavian register-based cohort study 1969-2017. Aliment Pharmacol Ther. 2020;52(1):143–154. doi:10.1111/apt.15785
15. Yu J, Refsum E, Helsingen LM, et al. Risk of hepato-pancreato-biliary cancer is increased by primary sclerosing cholangitis in patients with inflammatory bowel disease: a population-based cohort study. United Eur Gastroenterol J. 2022;10(2):212–224. doi:10.1002/ueg2.12204
16. He Q, Li JD. Epidemiological progress of inflammatory bowel disease. J Pra Med. 2019;35(18):2962–2966. doi:10.3969/j.issn.1006-5725.2019.18.029
17. Wang YF, Ouyang Q, Hu RW, Wen ZH. Advances in study on epidemiology of inflammatory bowel disease. Chin J Gastroenterol. 2013;18(1):48–51. doi:10.3969/j.issn.1008-7125.2013.01.012
18. Qurashi M, Vithayathil M, Khan SA. Epidemiology of cholangiocarcinoma. Eur J Surg Oncol. 2025;51(2):107064. doi:10.1016/j.ejso.2023.107064
19. Greten FR, Grivennikov SI. Inflammation and cancer: triggers, mechanisms, and consequences. Immunity. 2019;51(1):27–41. doi:10.1016/j.immuni.2019.06.025
20. Zhou M, Wang C, Lu S, et al. Tumor-associated macrophages in cholangiocarcinoma: complex interplay and potential therapeutic target. EBioMedicine. 2021;67:103375. doi:10.1016/j.ebiom.2021.103375
21. Zundler S, Becker E, Schulze LL, Neurath MF. Immune cell trafficking and retention in inflammatory bowel disease: mechanistic insights and therapeutic advances. Gut. 2019;68(9):1688–1700. doi:10.1136/gutjnl-2018-317977
22. Shapouri-Moghaddam A, Mohammadian S, Vazini H, et al. Macrophage plasticity, polarization, and function in health and disease. J Cell Physiol. 2018;233(9):6425–6440. doi:10.1002/jcp.26429
23. Funes SC, Rios M, Escobar-Vera J, Kalergis AM. Implications of macrophage polarization in autoimmunity. Immunology. 2018;154(2):186–195. doi:10.1111/imm.12910
24. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi:10.1186/1471-2105-9-559
25. Lange TM, Gültas M, Schmitt AO, Heinrich F. optRF: optimising random forest stability by determining the optimal number of trees. BMC Bioinf. 2025;26(1):95. doi:10.1186/s12859-025-06097-1
26. Rimal Y, Sharma N, Paudel S, Alsadoon A, Koirala MP, Gill S. Comparative analysis of heart disease prediction using logistic regression, SVM, KNN, and random forest with cross-validation for improved accuracy. Sci Rep. 2025;15(1):13444. doi:10.1038/s41598-025-93675-1
27. Qian J, Liu Q, Wang J, Zhuang X, Fang J. Identifying novel biomarkers for biliary tract cancer based on volatile organic compounds analysis and machine learning. Front Oncol. 2025;15:1572460. doi:10.3389/fonc.2025.1572460
28. Dong W, Jiang H, Li Y, et al. Interpretable machine learning analysis of immunoinflammatory biomarkers for predicting CHD among NAFLD patients. Cardiovasc Diabetol. 2025;24(1):263. doi:10.1186/s12933-025-02818-1
29. Yoo M, Shin J, Kim J, et al. DSigDB: drug signatures database for gene set analysis. Bioinformatics. 2015;31(18):3069–3071. doi:10.1093/bioinformatics/btv313
30. Ng SC, Shi HY, Hamidi N, et al. Worldwide incidence and prevalence of inflammatory bowel disease in the 21st century: a systematic review of population-based studies. Lancet. 2017;390(10114):2769–2778. doi:10.1016/s0140-6736(17)32448-0
31. Gordon H, Biancone L, Fiorino G, et al. ECCO guidelines on inflammatory bowel disease and malignancies. J Crohns Colitis. 2023;17(6):827–854. doi:10.1093/ecco-jcc/jjac187
32. van der Vliet A, Danyal K, Heppner DE. Dual oxidase: a novel therapeutic target in allergic disease. Br J Pharmacol. 2018;175(9):1401–1418. doi:10.1111/bph.14158
33. Edens WA, Sharling L, Cheng G, et al. Tyrosine cross-linking of extracellular matrix is catalyzed by Duox, a multidomain oxidase/peroxidase with homology to the phagocyte oxidase subunit gp91 phox. J Cell Biol. 2001;154(4):879–891. doi:10.1083/jcb.200103132
34. Aviello G, Knaus UG. ROS in gastrointestinal inflammation: rescue or sabotage? Br J Pharmacol. 2017;174(12):1704–1718. doi:10.1111/bph.13428
35. Hayes P, Dhillon S, O’Neill K, et al. Defects in NADPH oxidase genes NOX1 and DUOX2 in very early onset inflammatory bowel disease. Cell Mol Gastroenterol Hepatol. 2015;1(5):489–502. doi:10.1016/j.jcmgh.2015.06.005
36. Parlato M, Charbit-Henrion F, Hayes P, et al. First identification of biallelic inherited DUOX2 inactivating mutations as a cause of very early onset inflammatory bowel disease. Gastroenterology. 2017;153(2):609–611.e3. doi:10.1053/j.gastro.2016.12.053
37. Qi R, Zhou Y, Li X, et al. DUOX2 expression is increased in Barrett esophagus and cancerous tissues of stomach and colon. Gastroenterol Res Pract. 2016;2016:1835684. doi:10.1155/2016/1835684
38. Wu Y, Antony S, Hewitt SM, et al. Functional activity and tumor-specific expression of dual oxidase 2 in pancreatic cancer cells and human malignancies characterized with a novel monoclonal antibody. Int J Oncol. 2013;42(4):1229–1238. doi:10.3892/ijo.2013.1821
39. Lu J, Risbood P, Kane CT Jr, et al. Characterization of potent and selective iodonium-class inhibitors of NADPH oxidases. Biochem Pharmacol. 2017;143:25–38. doi:10.1016/j.bcp.2017.07.007
40. Abella V, Scotece M, Conde J, et al. The potential of lipocalin-2/NGAL as biomarker for inflammatory and metabolic diseases. Biomarkers. 2015;20(8):565–571. doi:10.3109/1354750x.2015.1123354
41. Thorsvik S, Van Beelen Granlund A, Svendsen TD, et al. Ulcer-associated cell lineage expresses genes involved in regeneration and is hallmarked by high neutrophil gelatinase-associated lipocalin (NGAL) levels. J Pathol. 2019;248(3):316–325. doi:10.1002/path.5258
42. Nielsen BS, Borregaard N, Bundgaard JR, Timshel S, Sehested M, Kjeldsen L. Induction of NGAL synthesis in epithelial cells of human colorectal neoplasia and inflammatory bowel diseases. Gut. 1996;38(3):414–420. doi:10.1136/gut.38.3.414
43. Chassaing B, Srinivasan G, Delgado MA, Young AN, Gewirtz AT, Vijay-Kumar M. Fecal lipocalin 2, a sensitive and broadly dynamic non-invasive biomarker for intestinal inflammation. PLoS One. 2012;7(9):e44328. doi:10.1371/journal.pone.0044328
44. Candido S, Maestro R, Polesel J, et al. Roles of neutrophil gelatinase-associated lipocalin (NGAL) in human cancer. Oncotarget. 2014;5(6):1576–1594. doi:10.18632/oncotarget.1738
45. Li T, Yu L, Wen J, Liao Q, Liu Z. An early-screening biomarker of endometrial carcinoma: NGAL is associated with epithelio-mesenchymal transition. Oncotarget. 2016;7(52):86064–86074. doi:10.18632/oncotarget.13340
46. Tong Z, Kunnumakkara AB, Wang H, et al. Neutrophil gelatinase-associated lipocalin: a novel suppressor of invasion and angiogenesis in pancreatic cancer. Cancer Res. 2008;68(15):6100–6108. doi:10.1158/0008-5472.Can-08-0540
47. Lee HJ, Lee EK, Lee KJ, Hong SW, Yoon Y, Kim JS. Ectopic expression of neutrophil gelatinase-associated lipocalin suppresses the invasion and liver metastasis of colon cancer cells. Int J Cancer. 2006;118(10):2490–2497. doi:10.1002/ijc.21657
48. Santiago-Sánchez GS, Pita-Grisanti V, Quiñones-Díaz B, Gumpper K, Cruz-Monserrate Z, Vivas-Mejía PE. Biological functions and therapeutic potential of lipocalin 2 in cancer. Int J Mol Sci. 2020;21(12). doi:10.3390/ijms21124365
49. Han S, Asoyan A, Rabenstein H, Nakano N, Obst R. Role of antigen persistence and dose for CD4+ T-cell exhaustion and recovery. Proc Natl Acad Sci U S A. 2010;107(47):20453–20458. doi:10.1073/pnas.1008437107
50. Mantovani A, Biswas SK, Galdiero MR, Sica A, Locati M. Macrophage plasticity and polarization in tissue repair and remodelling. J Pathol. 2013;229(2):176–185. doi:10.1002/path.4133
51. Yan L, Wang J, Cai X, et al. Macrophage plasticity: signaling pathways, tissue repair, and regeneration. MedComm. 2024;5(8):e658. doi:10.1002/mco2.658
52. Živalj M, Van Ginderachter JA, Stijlemans B. Lipocalin-2: a nurturer of tumor progression and a novel candidate for targeted cancer therapy. Cancers. 2023;15(21). doi:10.3390/cancers15215159
53. Vermot A, Petit-Härtlein I, Smith SME, Fieschi F. NADPH oxidases (nox): an overview from discovery, molecular mechanisms to physiology and pathology. Antioxidants. 2021;10(6). doi:10.3390/antiox10060890