In this study, we developed 2D/3D CNN models based on NCCT images to predict high-risk rHE in ICH patients and compared their performance with that of four baseline ML models. The main findings showed that the 2D-ResNet-101 model had the best predictive performance, demonstrating significant improvement over the BRAIN score and the clinical-radiologic model in both the internal- and external-testing sets. Furthermore, it exhibited higher sensitivity and accuracy than the two combined models in the testing sets. These findings suggest that the deep learning model may capture more comprehensive information about hematoma heterogeneity than routine clinical predictive indicators and radiomics features alone, and may thus predict rHE more effectively. This model could allow the identification of patients who may benefit from anti-expansion therapies in the acute ICH setting.
Spontaneous ICH is the deadliest acute stroke type, with high morbidity and mortality25,26. Notably, in real-world clinical scenarios, parenchymal hematomas often extend into the ventricular space27, and the extent of this extension correlates exponentially with patient outcomes28. In the present research, we included IVH expansion in the definition of cHE and explored potential clinical-radiologic factors affecting rHE. Multivariate regression analysis identified significant differences in the onset-to-baseline CT time interval, ICH volume, and presence of IVH between the groups, with patients who developed rHE showing shorter onset-to-baseline CT intervals, larger ICH volumes, and a higher likelihood of IVH (Table 2). These findings highlight the urgent need for rapid assessment and intervention to limit ICH growth and improve outcomes, especially for infratentorial hemorrhage. Such hemorrhage may disrupt neural pathways related to the Guillain-Mollaret triangle, a network critical for movement coordination and control, whose dysfunction can lead to a variety of neurological disorders, such as post-stroke palatal tremor29. According to the 2022 AHA/ASA guidelines, NCCT markers are valuable potential imaging predictors for identifying patients at risk of rHE10. Our analysis showed that hypodensities were the only independent risk factor among the nine NCCT markers, likely indicating areas of incomplete blood clotting prone to instability and further bleeding30,31. Hypodensities also overlap with other NCCT signs32, and their high prevalence may support their role as a predictor. We also developed the BRAIN score and a clinical-radiologic model based on routinely available clinical variables, but these demonstrated limited predictive performance in the testing sets. The sensitivity of these models ranged from 0.350 to 0.488, suggesting a substantial risk of missing rHE cases, which could lead to delayed treatment and potentially serious consequences.
These findings highlight the limitations of clinical-radiologic features in predicting rHE, likely due to their qualitative or semiquantitative nature, which can introduce subjectivity and inconsistency in predictions33. This was further evidenced by variability in inter- and intra-observer agreement regarding NCCT markers in this study.
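To make the clinical meaning of a sensitivity between 0.350 and 0.488 concrete, the following minimal sketch computes sensitivity, specificity, and accuracy from confusion-matrix counts and converts a sensitivity value into the expected number of missed rHE cases. The counts in `example` and the helper names are hypothetical and chosen only for illustration; they are not data from this study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # rHE patients correctly flagged as rHE
    fn: int  # rHE patients missed by the model
    tn: int  # non-rHE patients correctly flagged as non-rHE
    fp: int  # non-rHE patients wrongly flagged as rHE


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true rHE cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-rHE cases that the model correctly rules out."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def expected_missed(sens: float, n_rhe: int) -> float:
    """Expected number of rHE cases missed among n_rhe true rHE patients."""
    return (1.0 - sens) * n_rhe


# Hypothetical counts (NOT study data): 100 rHE and 200 non-rHE patients.
example = ConfusionCounts(tp=35, fn=65, tn=180, fp=20)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"accuracy    = {accuracy(example):.3f}")
    # Sensitivity range reported in the text for the baseline models.
    for s in (0.350, 0.488):
        print(f"sens {s:.3f}: ~{expected_missed(s, 100):.0f} of 100 rHE cases missed")
```

Under these definitions, a model operating at the lower end of the reported range would be expected to miss roughly two of every three patients who go on to develop rHE.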
Recent studies have shown promising results using machine learning (ML) methods, including radiomics and deep learning, to predict intracerebral hemorrhage (ICH) growth. Feng et al. and Pszczolkowski et al.16,34 applied deep learning radiomics or radiomics features derived from NCCT images to predict cHE, achieving AUCs ranging from 0.693 to 0.820. Xia et al. combined radiomics features with clinical-semantic factors to enhance rHE prediction, achieving an AUC of 0.830 compared to 0.690 for clinical-semantic models alone, though this study had a small sample size35. In our study, with a larger two-center sample, the addition of radiomics features to the clinical-radiologic model improved rHE prediction performance in the external-testing set, consistent with previous findings16,35. However, both combined models exhibited reduced generalizability, likely due to the limited robustness of handcrafted radiomics features, which suffer from low reproducibility across different CT devices and protocols36,37. Furthermore, radiomics features may fail to capture the semantic characteristics of NCCT markers16. In contrast, deep learning automatically learns complex, discriminative features directly from images through neural network layers, eliminating the need for manual extraction of hard-coded features18. Most studies have focused on using deep learning models to predict cHE20,21,38,39,40. In these studies, the follow-up hematoma volume may include both parenchymal hemorrhage and IVH. However, IVH expansion may occur independently of parenchymal hematoma growth, a factor often overlooked in large dataset studies, such as those by Li20 and Teng40, which limits confidence in deep learning’s ability to predict rHE risk. Our results demonstrate that 2D CNN models based on baseline NCCT images outperform traditional ML models, suggesting that 2D deep learning may significantly enhance predictive accuracy for rHE.
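Because the comparisons above are expressed as AUCs, a brief sketch of what that metric measures may be helpful. The function below computes the AUC as the probability that a randomly chosen positive (rHE) case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from any of the cited studies.

```python
from typing import Sequence


def pairwise_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. This O(n*m) form is meant for illustration only;
    rank-based implementations are preferable for large samples.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical predicted probabilities (NOT study data).
    rhe_scores = [0.9, 0.8, 0.4]
    non_rhe_scores = [0.7, 0.3, 0.2, 0.1]
    print(f"AUC = {pairwise_auc(rhe_scores, non_rhe_scores):.3f}")
```

Read this way, an AUC of 0.830 versus 0.690 means that the first model orders a random rHE/non-rHE pair correctly in about 83% of cases rather than about 69%.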
In this study, we developed eight deep learning models to predict rHE, with the 2D-CNN models outperforming the 3D-CNN models in the testing sets. The differences in performance among the different 3D-CNN or 2D-CNN models may be attributed to the differing internal architectures of each network41. Previous studies have demonstrated that 3D images, which contain richer 3D spatial information compared to 2D images42, typically achieve superior performance in disease prediction tasks43. However, in our study, the 3D-CNN models exhibited limited predictive capability, possibly due to their higher complexity and larger number of parameters, which may not be well-suited for small sample sizes of 3D data43,44. Additionally, the lack of pretrained model weights and the low resolution of 3D NCCT images along the z-axis (5 mm slice thickness) could have further hindered their performance45,46. Although the 2D-CNN models achieved relatively high performance, their lack of spatial information may hinder accurate modeling of peri-hematomal structures. An approach that balances the advantages of both 2D and 3D modeling may optimize the trade-off between computational efficiency and model generalizability for limited datasets47.
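The parameter argument can be illustrated with a simple count for a single convolutional layer. The sketch below assumes a standard convolution with a cubic or square kernel and one bias per output channel; the channel widths and the 0.5 mm in-plane pixel spacing are illustrative assumptions, whereas the 5 mm slice thickness is the value stated above.

```python
def conv_params(c_in: int, c_out: int, kernel: int, dims: int) -> int:
    """Learnable parameters of one standard convolution layer.

    weights: c_in * c_out * kernel**dims, plus one bias per output channel.
    dims=2 gives a k x k kernel, dims=3 a k x k x k kernel.
    """
    return c_in * c_out * kernel ** dims + c_out


def anisotropy_ratio(slice_thickness_mm: float, pixel_spacing_mm: float) -> float:
    """How much coarser the z-axis sampling is than the in-plane sampling."""
    return slice_thickness_mm / pixel_spacing_mm


if __name__ == "__main__":
    # Assumed layer width of 64 -> 64 channels with a kernel size of 3.
    p2d = conv_params(64, 64, kernel=3, dims=2)
    p3d = conv_params(64, 64, kernel=3, dims=3)
    print(f"2D layer: {p2d} parameters")
    print(f"3D layer: {p3d} parameters ({p3d / p2d:.2f}x)")
    # 5 mm slice thickness (from the text); 0.5 mm spacing is an assumption.
    print(f"z-axis anisotropy: {anisotropy_ratio(5.0, 0.5):.0f}x")
```

For a kernel size of 3, each 3D layer carries roughly three times the weights of its 2D counterpart, and this applies to every layer in the network, which is consistent with the higher data requirements suggested above.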
Among the 2D-CNN models in our study, the 2D-ResNet-101, a deep network with 101 layers utilizing residual connections, demonstrated superior predictive performance and improved generalization48. While deeper networks can learn more complex representations, increasing depth does not always lead to better model performance due to challenges in gradient descent49,50,51. This was further supported by our finding that, in most CNN models, greater depth reduced performance on the external-testing set (Table 3). In our study, ResNet outperformed DenseNet, possibly owing to its simpler residual structure and lower memory complexity, which may confer greater robustness under relatively small-sample conditions52,53. Previous studies have shown the effectiveness of deep residual networks in ICH disease classification21,54. Grad-CAM visualizations demonstrated that the 2D-ResNet-101 model primarily focused on the hematoma and its periphery for decision-making, consistent with observations reported by Zhao et al. and Tran et al.21,55. Notably, rHE tends to demonstrate more irregular morphology and greater internal density heterogeneity than non-rHE (Fig. 5A). This peripheral-focused attention pattern may correspond to NCCT markers of active multifocal bleeding, such as irregular shape56 (Fig. 5A, Case 2). These findings may support Fisher’s ‘avalanche model’ of HE, which proposes that initial bleeding disrupts adjacent vessels, leading to surrounding secondary hemorrhage57. Furthermore, the 2D-ResNet-101 model achieved significantly higher sensitivity than the baseline models, without a significant decrease in specificity, indicating that a higher proportion of ICH patients at high risk for rHE can be identified early, thereby helping to ensure that these patients receive timely anti-expansion treatment or surgical intervention, as needed.
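One commonly cited gradient-related difficulty in very deep networks is the vanishing gradient, and residual connections are usually motivated as a remedy for it. The toy sketch below is not the model used in this study: it assumes a chain of scalar layers of the form w * tanh(h), with an arbitrary weight of 0.5 and a depth of 101, and compares the end-to-end gradient with and without an identity skip connection.

```python
import math


def plain_gradient(x: float, w: float, depth: int) -> float:
    """d(output)/d(input) for a plain chain h <- w * tanh(h)."""
    grad, h = 1.0, x
    for _ in range(depth):
        # Local derivative of w * tanh(h), evaluated at the layer input.
        grad *= w * (1.0 - math.tanh(h) ** 2)
        h = w * math.tanh(h)
    return grad


def residual_gradient(x: float, w: float, depth: int) -> float:
    """d(output)/d(input) for a residual chain h <- h + w * tanh(h)."""
    grad, h = 1.0, x
    for _ in range(depth):
        # The identity path adds 1 to every local derivative.
        grad *= 1.0 + w * (1.0 - math.tanh(h) ** 2)
        h = h + w * math.tanh(h)
    return grad


if __name__ == "__main__":
    # Toy settings (assumptions): scalar layers, weight 0.5, depth 101.
    print(f"plain    : {plain_gradient(0.1, 0.5, 101):.3e}")
    print(f"residual : {residual_gradient(0.1, 0.5, 101):.3e}")
```

In the plain chain every layer multiplies the gradient by at most 0.5, so it shrinks geometrically with depth, whereas the identity path keeps each factor at or above 1. This illustrates why residual networks remain trainable at depth; it does not by itself imply that a deeper network generalizes better, in line with the depth-related findings reported above.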
This study has several limitations. First, due to its retrospective design, some important clinical parameters such as Glasgow Coma Scale scores were unavailable. Therefore, a prospective study is necessary to validate the deep learning model’s performance and further explore the relationship between rHE and clinical variables. Second, the relatively small sample size limits the generalizability of the findings. A multi-center trial with larger datasets is essential to assess the model’s applicability in real-world clinical settings. Third, while the current standard for rHE relies on semiautomatic delineation software with manual adjustment, detecting small volume changes, particularly in IVH expansion (≥ 1 mL), can be challenging due to technological limitations. Implementing fully automated, high-precision IVH delineation may enhance accuracy and reduce human error. Finally, the developed deep learning models primarily focused on image-based predictions without incorporating clinical-radiologic variables. However, medical decisions are multifactorial and not solely based on imaging findings. Future research should aim to integrate these variables to further improve model performance.