PCa diagnosis remains challenging, particularly in differentiating it from BPH, as both conditions share overlapping clinical features. The reliance on PSA tests has led to high false-positive rates, resulting in unnecessary biopsies and significant patient anxiety35,36. These limitations highlight the urgent need for reliable, non-invasive biomarkers capable of accurately distinguishing between PCa and BPH.
Integration of miRNA profiling and machine learning
miRNAs are promising candidates for non-invasive diagnostics because they are stable in circulation and reflect tumor biology. Most studies of miRNA biomarkers for PCa have examined serum and plasma37, and research on whole blood remains limited. However, promising results have been reported for whole blood-based miRNA profiling in other cancers, including breast, pancreatic, and lung cancer. Our study took miRNAs identified in previous studies and evaluated their diagnostic potential in whole blood. Whole blood offers key advantages, including higher miRNA yield and a robust systemic representation of disease states, making it a valuable biofluid for biomarker discovery. Its complexity is also a drawback: whole blood contains miRNAs from multiple cellular sources, which may introduce noise. The ensemble-based random forest method used in this study mitigates this challenge by handling non-linear relationships and reducing sensitivity to noise38. Further, fivefold cross-validation supports the generalizability of the proposed model, indicating that it can handle unseen data without overfitting. To improve standardization and reproducibility, future studies should systematically compare miRNA expression across different biofluids to ensure consistency in diagnostic applications.
This study leverages a novel combination of miRNA profiling and ML to enhance diagnostic precision for PCa. Individual miRNAs such as miR-21-5p, miR-141-3p, and miR-221-3p have been implicated in PCa progression in previous studies39,40,41. The current work extends these findings by applying machine learning to miRNA expression profiles: the ML model showed superior discriminatory power in distinguishing PCa from BPH, capturing synergistic effects that are overlooked by standalone miRNA analyses and by linear models.
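As a rough illustration of how a random forest can be evaluated with fivefold cross-validation, the sketch below uses scikit-learn on synthetic data. It is not the study's pipeline: the sample size, the three features, the injected class signal, and the hyperparameters are all assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (NOT study data): 120 samples, 3 hypothetical features.
rng = np.random.default_rng(0)
n_samples = 120
y = rng.integers(0, 2, size=n_samples)   # hypothetical labels: 1 = PCa, 0 = BPH
X = rng.normal(size=(n_samples, 3))
X[:, 0] += 0.8 * y                       # inject a modest class signal into one feature

# Hyperparameters are illustrative assumptions, not the values used in the study.
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# Stratified folds keep the PCa/BPH proportion similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One AUC-ROC value per held-out fold.
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", scores)
print("mean AUC:", scores.mean())
```

Reporting the per-fold spread alongside the mean makes it easier to judge how stable the estimate is in a small cohort.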
The random forest classifier was chosen for its ability to capture non-linear relationships and complex feature interactions, and it achieved an AUC-ROC of 0.78. Our machine learning model outperformed PSA, which suffers from high false-positive rates, achieving a higher AUC-ROC and offering greater net benefit across a range of threshold probabilities in decision curve analysis (DCA). Unlike models that rely on fixed Ct-value thresholds, our ML approach learns its decision boundaries from the data and can therefore accommodate data variability, improving sensitivity and specificity. These findings suggest that miRNA-based diagnostics, when integrated with ML approaches, could provide a more accurate and clinically relevant tool for PCa detection and risk stratification, reducing unnecessary biopsies while maintaining high sensitivity.
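For readers unfamiliar with DCA, the net benefit at a threshold probability p_t is conventionally computed as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below is a minimal illustration of that standard formula in plain Python. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged by the model and actually PCa.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged by the model but actually BPH.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat (e.g. biopsy) every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (illustrative only): 1 = PCa, 0 = BPH.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(net_benefit(y_true, y_prob, 0.5))     # model-guided strategy
print(net_benefit_treat_all(y_true, 0.5))   # biopsy-everyone strategy
```

A decision curve is obtained by evaluating both functions over a grid of thresholds and comparing the model against the treat-all and treat-none (net benefit 0) strategies.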
Biological interpretation of miRNA findings
To address concerns regarding the “black box” nature of ML models, we incorporated feature importance rankings and bioinformatics analyses to validate the biological relevance of the key features identified by the model. The miRNA ratios miR-141-3p/miR-221-3p and miR-21-5p/miR-141-3p were confirmed as critical features for distinguishing PCa from BPH.
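To make the ratio features concrete, the sketch below derives them from RT-PCR Ct values. It assumes that expression is proportional to 2^(−Ct), that is, perfect doubling per cycle, which is a common simplification and not necessarily the computation used in this study. The helper names and the Ct values are hypothetical.

```python
def expression_ratio(ct_numerator, ct_denominator):
    """Relative expression of the numerator miRNA versus the denominator miRNA.

    Assumption: expression is proportional to 2 ** (-Ct), so the ratio
    reduces to 2 ** (Ct_denominator - Ct_numerator).
    """
    return 2.0 ** (ct_denominator - ct_numerator)


def ratio_features(ct_values):
    """Build the two ratio features named in the text from a dict of Ct values."""
    return {
        "miR-141-3p/miR-221-3p": expression_ratio(
            ct_values["miR-141-3p"], ct_values["miR-221-3p"]
        ),
        "miR-21-5p/miR-141-3p": expression_ratio(
            ct_values["miR-21-5p"], ct_values["miR-141-3p"]
        ),
    }


# Hypothetical Ct values for one sample (illustrative only, not study data).
sample_ct = {"miR-21-5p": 24.0, "miR-141-3p": 28.0, "miR-221-3p": 26.0}
features = ratio_features(sample_ct)
print(features)
```

Because a ratio of two miRNAs measured in the same sample cancels any shared input-amount offset, features of this kind are less dependent on an external reference gene than single-miRNA Ct values.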
KEGG pathway enrichment analysis linked miR-21-5p to cancer-related pathways, including PD-L1/PD-1 checkpoint regulation, prolactin signaling, HIF-1, and NF-κB signaling, all of which play crucial roles in immune evasion, angiogenesis, and inflammation. In contrast, miR-141-3p and miR-221-3p were associated with androgen receptor (AR) signaling and endocrine resistance, which are critical pathways in hormone-sensitive and castration-resistant PCa. These findings suggest a potential regulatory role of these miRNAs in PCa progression, but further functional validation is required to confirm their direct involvement in tumor development and progression. Interestingly, target gene analysis revealed both oncogenes (EPHA2, CBX8, STAT3, SMAD2 [context-dependent], and TNFAIP1) and tumor suppressors (RASA1, RHOB, CDKN1B, ARID1A, OGT, CBX4, PTEN, and FOS [context-dependent]) within the hub gene network. The coexistence of oncogenes and tumor suppressors in the same network may initially appear counterintuitive. However, it reflects the complex regulatory interactions within cancer biology, where genes can have dual roles depending on cellular context, mutation status, and signaling interactions. To better understand how these hub genes influence tumor progression and response to therapy, future research should focus on longitudinal expression studies and functional assays. These findings also emphasize the complex regulatory landscape of miRNA expression in PCa and suggest that miRNA profiling in exosomal fractions or immune cell subsets may provide deeper insights.
Limitations and future directions
While our model demonstrated the ability to generalize, an essential requirement for real-world clinical applications, several limitations must be acknowledged. The study's findings are based on a limited cohort drawn predominantly from a single population, so further validation across diverse genetic, environmental, and clinical settings is needed. We plan a large-scale, multi-center study to validate the model across various populations. Other models, such as XGBoost, support vector machines (SVM), and deep learning, could be explored for scalability and improved predictive power in larger datasets. However, deep learning methods in particular typically require larger training datasets and extensive computational resources, which were beyond the scope of this study.
One of the major challenges in translating miRNA-based diagnostics into clinical practice is the lack of standardized protocols. Variations in sample processing methods, RT-PCR platforms, cut-off values, and reference genes can introduce inconsistencies that hinder cross-study validation. Future work should therefore focus on standardization, including unified Ct value normalization methods and consensus guidelines for miRNA biomarker validation, to improve reproducibility across platforms and clinical utility.
To further strengthen the external validation of our findings, future studies should consider leveraging publicly available datasets. These datasets provide valuable large-scale transcriptomic data across diverse patient populations and could help assess the generalizability of our model. However, integrating such datasets presents challenges, as they often involve heterogeneous sample types (e.g., plasma, serum, urine) and different profiling platforms (e.g., RNA sequencing, microarrays, RT-PCR), leading to technical variability. Addressing these discrepancies would require robust normalization strategies and cross-platform data harmonization to ensure comparability with the current model. Developing computational approaches for cross-platform normalization would therefore be a valuable research direction.
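One very simple baseline for putting measurements from different platforms on a comparable scale is to standardize each value within its own platform. The sketch below illustrates that idea only. It is not a full batch-correction method and not something the study performed, and the function name, platform labels, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_platform(records):
    """Standardize each value against the mean and SD of its own platform.

    records: list of (platform, value) pairs.
    Returns z-scores in the same order as the input.
    """
    by_platform = defaultdict(list)
    for platform, value in records:
        by_platform[platform].append(value)

    # Per-platform mean and population standard deviation.
    stats = {p: (mean(v), pstdev(v)) for p, v in by_platform.items()}

    z_scores = []
    for platform, value in records:
        mu, sd = stats[platform]
        # A platform with no spread carries no relative information; map to 0.
        z_scores.append(0.0 if sd == 0 else (value - mu) / sd)
    return z_scores


# Hypothetical measurements of one miRNA on two platforms (illustrative only).
records = [
    ("rt_pcr", 24.0), ("rt_pcr", 26.0), ("rt_pcr", 28.0),
    ("microarray", 7.0), ("microarray", 9.0),
]
z = zscore_within_platform(records)
print(z)
```

Within-platform standardization removes differences in scale and offset but assumes that each platform sampled a comparable mix of patients, so it cannot by itself separate technical from biological variation.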
The integration of miRNA-based ML models with multiparametric MRI (mpMRI) is also a promising avenue for enhancing PCa diagnosis and risk stratification. While mpMRI is widely used to assess prostate lesions and guide biopsies, its accuracy can be limited by inter-reader variability and false-positive findings. Combining molecular biomarkers, such as miRNA signatures, with radiological features (e.g., lesion morphology, diffusion-weighted imaging parameters) could improve diagnostic precision.
In conclusion, the integration of miRNA profiling with ML offers a promising approach for improving PCa diagnostics. By leveraging miRNA expression ratios and ensemble-based models, this study demonstrated enhanced diagnostic accuracy, surpassing traditional PSA-based approaches. The biological validation of key miRNA biomarkers supports their clinical potential, while cross-validation of the model supports its reliability. Future research focused on large-scale validation, standardization, and multimodal integration will be crucial in advancing this approach toward clinical implementation.