{"id":3582,"date":"2025-10-14T09:38:19","date_gmt":"2025-10-14T09:38:19","guid":{"rendered":"https:\/\/www.newsbeep.com\/us-ca\/3582\/"},"modified":"2025-10-14T09:38:19","modified_gmt":"2025-10-14T09:38:19","slug":"fire-risk-to-structures-in-californias-wildland-urban-interface","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/us-ca\/3582\/","title":{"rendered":"Fire risk to structures in California\u2019s Wildland-Urban Interface"},"content":{"rendered":"<p>We primarily relied on a modified database from five selected fires that includes more than 47,000 structures with two broad damage states, \u201cSurvived\u201d and \u201cDestroyed\u201d, and five detailed damage states: \u201cDestroyed (&gt;50%)\u201d; three \u201cDamaged\u201d states, \u201cMajor (26\u201350%)\u201d, \u201cMinor (10\u201325%)\u201d, and \u201cAffected (1\u20139%)\u201d; and \u201cNo Damage\u201d. The CAL FIRE Damage INSpection Program (DINS) was established to collect data on damaged, destroyed, and unburned structures during and immediately after fire events, both to assist in the recovery process and to provide local governments and scientists with information for analyzing why some structures burned and others survived<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Henning, Andrew, Cox, Jonathan, &amp; Shew, David. CAL FIRE&#x2019;s Damage Inspection Program&#x2014;Its Evolution and Implementation. &#010;                  http:\/\/www.fltwood.com\/perm\/nfpa-2016\/scripts\/sessions\/M26.html&#010;                  &#010;                 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR43\" id=\"ref-link-section-d643601119e1944\" rel=\"nofollow noopener\" target=\"_blank\">43<\/a>. 
Through a public records request, we acquired DINS data for more than 90,000 structures that survived, were damaged, or were destroyed across all California wildfires from 2013\u20132022, potentially making this the largest combined dataset of its kind. We then incorporated into the DINS data the risk factors associated with wildfire destruction of structures, to gain a deeper understanding of WUI destruction. These factors include structure density, building materials, year built, defensible space, and exposure of structures to fire intensity and embers. We employed several machine learning (ML) techniques, including feature selection, feature engineering, and model interpretation methods, to identify the variables with the greatest influence on our results. To improve the performance of the ML model, we applied a range of data preprocessing techniques, such as data cleaning, normalization, and encoding. These steps improved model accuracy, reduced noise, and made our findings more robust, allowing the ML model to learn effectively from our complex WUI dataset. We chose the XGBoost (eXtreme Gradient Boosting) algorithm because it outperformed the other methods on our dataset. We also used SHAP (SHapley Additive exPlanations), which quantifies each feature\u2019s contribution to the model\u2019s predictions and thus allowed a comprehensive assessment of variable importance within the dataset. 
Confusion matrices and receiver operating characteristic (ROC) curves, together with our computational framework, allowed us to examine the dataset in detail and to capture complex relationships and patterns that conventional methods might miss. Rather than relying on a single overall assessment, we calculated accuracy and sensitivity for each individual fire and then aggregated the results across all structures in the damage dataset. This analysis characterized the model\u2019s predictive performance on a per-fire basis as well as its overall effectiveness across all structures in the damage data.<\/p>\n<p>Risk factors to structures from wildfires in the\u00a0WUI<\/p>\n<p>Our method for integrating risk factors related to structure destruction builds on the work of Hakes et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Hakes, R. S. P., Theodori, M., Lautenberger, C., Qian, L. &amp; Gollner, M. J. Community-level risk assessment of structure vulnerability to WUI fire conditions in the 2017 Tubbs Fire. in Advances in Forest Fire Research 2022 (Eds. Domingos Xavier Viegas &amp; Luis Mario Ribeiro) 552&#x2013;557 (Imprensa da Universidade de Coimbra, 2022).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR34\" id=\"ref-link-section-d643601119e1955\" rel=\"nofollow noopener\" target=\"_blank\">34<\/a>, who combined on-the-ground data with fire modeling reconstructions for a community-level risk assessment of the Tubbs Fire. The risk factors include:<\/p>\n<p>Structure spacing, represented as the \u201cStructure Separation Distance (SSD)\u201d. 
We used the Microsoft Maps building footprints dataset (available at <a href=\"https:\/\/github.com\/microsoft\/USBuildingFootprints\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/github.com\/microsoft\/USBuildingFootprints<\/a>), which provides open building footprint data covering entire counties in the United States and comprises 129,591,852 computer-generated building footprints. We also used QGIS to access geospatial data on urban infrastructure, building locations, and their spatial relationships.<\/p>\n<p>Year built refers to the year in which the primary structure on a parcel of land was constructed. When analyzing the impact of WUI fires, Year Built is important because the age of a structure can influence its susceptibility to fire damage. It can also act as a confounding variable, because it affects both the building features and the extent of damage.<\/p>\n<p>The fire performance of building construction materials has been examined in numerous in-depth, carefully designed laboratory tests<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Quarles, S. L., Valachovic, Y., Nakamura, G. M., Nader, G. A. &amp; De Lasaux, M. J. Home Survival in Wildfire-Prone Areas: Building Materials and Design Considerations (University of California, Agriculture and Natural Resources, 2010).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR18\" id=\"ref-link-section-d643601119e1983\" rel=\"nofollow noopener\" target=\"_blank\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Manzello, S. L., Suzuki, S. &amp; Hayashi, Y. Exposing siding treatments, walls fitted with eaves, and glazing assemblies to firebrand showers. Fire Saf. J. 
50, 25&#x2013;34 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR44\" id=\"ref-link-section-d643601119e1986\" rel=\"nofollow noopener\" target=\"_blank\">44<\/a>. Despite this solid laboratory evidence, few empirical studies have documented the building characteristics associated with structure loss in real wildfires<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 28\" title=\"Syphard, A. &amp; Keeley, J. Factors associated with structure loss in the 2013&#x2013;2018 California wildfires. Fire 2, 49 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR28\" id=\"ref-link-section-d643601119e1990\" rel=\"nofollow noopener\" target=\"_blank\">28<\/a>. In this study, building characteristics include eaves, vent screens, exterior siding, roof construction, and window panes.<\/p>\n<p>Defensible space is represented in this study as the \u201cVegetation Separation Distance (VSD)\u201d. The state of California requires homeowners in fire-prone areas to create a minimum of 30\u2009m (100\u2009ft) of defensible space around structures, and some localities are beginning to require at least 60\u2009m (200\u2009ft) in certain circumstances<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Syphard, A. D., Brennan, T. J. &amp; Keeley, J. E. The role of defensible space for residential structure protection during wildfires. Int. J. Wildland Fire 23, 1165 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR26\" id=\"ref-link-section-d643601119e2001\" rel=\"nofollow noopener\" target=\"_blank\">26<\/a>. 
We defined three categories for the VSD: Zone0, the first 5 feet from the building (\u201c0\u20135\u201d); Zone1, the area from 5 to 30 feet from the building (\u201c5\u201330\u201d); and Zone2, the area from 30 to 100 feet from the building (\u201c30\u2013100\u201d) (CAL FIRE DSpace: <a href=\"https:\/\/www.fire.ca.gov\/dspace\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/www.fire.ca.gov\/dspace<\/a>). We used remote sensing techniques to analyze the density and distribution of vegetation in WUI regions and urban settings, extracting information from aerial and satellite imagery and LiDAR data. Publicly available datasets produced by the Sonoma County Agricultural Preservation and Open Space District and the Sonoma County Water Agency, including countywide LiDAR data and a fine-scale vegetation and habitat map, provide an accurate, up-to-date inventory of the county\u2019s landscape features, ecological communities, and habitats (Sonoma County Vegetation Map: <a href=\"https:\/\/sonomavegmap.org\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/sonomavegmap.org\/<\/a>).<\/p>\n<p>Exposures, including fire intensity (flame length) and firebrands (ember load). Houses are destroyed during wildfires when they are exposed to flames in adjacent fuel, radiant heat from nearby fuel (\u226440\u2009m)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Cohen, J. D. Preventing disaster: home ignitability in the wildland-urban interface. J. For. 
98, 15&#x2013;21 (2000).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR16\" id=\"ref-link-section-d643601119e2024\" rel=\"nofollow noopener\" target=\"_blank\">16<\/a>, or airborne embers and firebrands originating in nearby and distant fuel (typically\u2009&lt;\u200910\u2009km)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Koo, E., Pagni, P. J., Weise, D. R. &amp; Woycheese, J. P. Firebrands and spotting ignition in large-scale fires. Int. J. Wildland Fire 19, 818 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR45\" id=\"ref-link-section-d643601119e2028\" rel=\"nofollow noopener\" target=\"_blank\">45<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Noble, I. R., Gill, A. M. &amp; Bary, G. A. V. McArthur&#x2019;s fire-danger meters expressed as equations. Austral Ecol. 5, 201&#x2013;203 (1980).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR46\" id=\"ref-link-section-d643601119e2031\" rel=\"nofollow noopener\" target=\"_blank\">46<\/a>. In this study, we used the Eulerian Level set Model of FIRE spread, ELMFIRE, an operational fire behavior and spread simulation tool<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Lautenberger, C. Wildland fire modeling with an Eulerian level set method and automated calibration. Fire Saf. J. 
62, 289&#x2013;298 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR35\" id=\"ref-link-section-d643601119e2035\" rel=\"nofollow noopener\" target=\"_blank\">35<\/a>, chosen because it can also simulate the deposition of multiple embers and implements Monte Carlo analysis<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Lautenberger, C. Mapping areas at elevated risk of large-scale structure loss using Monte Carlo simulation and wildland fire modeling. Fire Saf. J. 91, 768&#x2013;775 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR36\" id=\"ref-link-section-d643601119e2039\" rel=\"nofollow noopener\" target=\"_blank\">36<\/a> to capture the stochasticity and uncertainty inherent in wildland fire modeling. We adapted the semi-physical model of refs.\u2006<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Lautenberger, C. Mapping areas at elevated risk of large-scale structure loss using Monte Carlo simulation and wildland fire modeling. Fire Saf. J. 91, 768&#x2013;775 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR36\" id=\"ref-link-section-d643601119e2043\" rel=\"nofollow noopener\" target=\"_blank\">36<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 47\" title=\"Purnomo, D. M. J. et al. Reconstructing modes of destruction in wildland&#x2013;urban interface fires using a semi-physical level-set model. Proc. Combust. Inst. 
40, 105755 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR47\" id=\"ref-link-section-d643601119e2046\" rel=\"nofollow noopener\" target=\"_blank\">47<\/a> to include urban fire spread using the empirical approach of Hamada<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Hamada, M. On the rate of fire spread, Non-Life Insurance Rating Organization of Japan. Disaster Research 1, 35&#x2013;44 (1951).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR37\" id=\"ref-link-section-d643601119e2051\" rel=\"nofollow noopener\" target=\"_blank\">37<\/a>.<\/p>\n<p>Data preprocessing<\/p>\n<p>To predict damage for each fire dataset, we divided the data into the target variable (y) and the input features (X). We then performed a split stratified on y, allocating 80% of the data to the training set and the remaining 20% to the test set. Stratification keeps the class proportions of the target variable similar in both subsets, which reduces the risk of bias from imbalanced classes and makes the performance evaluation on unseen data more accurate and reliable. A fixed random_state made the split reproducible, allowing consistent model training and evaluation across iterations. During model training, we used GridSearchCV to tune hyperparameters for several models, including Logistic Regression, Random Forest, and XGBoost. The grid search evaluated candidate settings with k-fold cross-validation (cv_k_folds set to 10), which provides robust validation and helps mitigate overfitting. 
In cross-validation, the training data were split into k folds; each fold served once as the validation set while the remaining k\u22121 folds were used for training. This allowed the grid search to identify the optimal set of hyperparameters based on performance metrics such as accuracy and F-beta scores. After the best hyperparameters were selected, the model was refit on the entire training set before being evaluated on the test set.<\/p>\n<p>Because the model inputs differed widely in scale and contained missing values, we applied several preprocessing steps before model training. Using the scikit-learn package<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\" title=\"Pedregosa, F. et al. Scikit-learn. Mach. Learn. Python &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1201.0490&#010;                  &#010;                 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR48\" id=\"ref-link-section-d643601119e2066\" rel=\"nofollow noopener\" target=\"_blank\">48<\/a>, we first handled missing values with imputation strategies built on IterativeImputer. These imputers were fitted on the training set and then applied to both the training and test sets. The imputation strategy was tailored to each feature in the stacked WUI data and to each wildfire case. For example, Roof Construction (19,318 non-null), Eaves (19,318 non-null), Vent Screen (19,318 non-null), Exterior Siding (19,318 non-null), Window Pane (19,318 non-null), VSD (3504 non-null), and Year Built (22,501 non-null) were imputed using a nearest neighbor approach. 
For Year Built in individual fire cases, we used either nearest neighbor imputation or a median-based strategy, whereas numerical features such as Embers (11,549 non-null) and Flame length (14,578 non-null) were filled using aggregate statistics (e.g., the mean or median, potentially augmented by k-nearest neighbors). We also incorporated a spatial clustering technique that uses proximity-based methods for imputation. Specifically, we used haversine distances and pairwise distances in UTM coordinates to cluster data points by geographic proximity, so that nearby locations, defined by latitude and longitude, are treated consistently when imputing missing values. This approach assumes that nearby structures are likely to share similar attributes, which makes the imputation more robust. Next, we standardized the numerical variables with StandardScaler (zero mean and unit variance), which aids the convergence and performance of scale-sensitive models. We then converted categorical variables into a numerical format the models can use, with scikit-learn\u2019s OneHotEncoder and LabelEncoder. To address class imbalance, we binarized the damage labels\/classes into damaged and not damaged\/survived. This step is particularly important where certain damage classes are underrepresented. 
This preprocessing pipeline allowed us to apply a variety of models to the dataset, ensuring compatibility and improving overall model performance.<\/p>\n<p>Together, these steps (data categorization, stratified splitting, imputation, standard scaling, OneHotEncoding\/Label Encoding, and resampling) laid the foundation for a robust and unbiased evaluation of the model\u2019s ability to predict fire damage across diverse datasets.<\/p>\n<p>Machine learning techniques<\/p>\n<p>Machine learning (ML) methods have recently been applied to wildland fire<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Jain, P. et al. A review of machine learning applications in wildfire science and management. Environ. Rev. 28, 478&#x2013;505 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR49\" id=\"ref-link-section-d643601119e2081\" rel=\"nofollow noopener\" target=\"_blank\">49<\/a> and are well suited to WUI fires because they can fit and model interactions between competing factors. In this work, we applied both regression and classification ML techniques to our combined dataset, producing a predictive model for structure destruction based on home hardening (roof, siding, vents, eaves, windows, year built), vegetation separation (defensible space and surroundings), exposure metrics (flames and embers), and structure spacing. We chose the XGBoost (eXtreme Gradient Boosting) algorithm because it outperformed the other methods on our dataset. The model hyperparameters were tuned with RandomizedSearchCV, which samples hyperparameter combinations at random from a predefined parameter grid. We used this approach because XGBoost has a large number of hyperparameters, which makes an exhaustive search costly. 
Hyperparameters were selected based on the best results for the following classification metrics: F-beta<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Sokolova, M. &amp; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45, 427&#x2013;437 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR50\" id=\"ref-link-section-d643601119e2085\" rel=\"nofollow noopener\" target=\"_blank\">50<\/a>, F1-score<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 50\" title=\"Sokolova, M. &amp; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45, 427&#x2013;437 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR50\" id=\"ref-link-section-d643601119e2089\" rel=\"nofollow noopener\" target=\"_blank\">50<\/a>, accuracy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 51\" title=\"Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. 14th Int. Jt. Conf. Artif. Intell. 14, 1137&#x2013;1143 (International Joint Conference on Artificial Intelligence, 1995).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR51\" id=\"ref-link-section-d643601119e2093\" rel=\"nofollow noopener\" target=\"_blank\">51<\/a>, balanced accuracy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Brodersen, K. H., Ong, C. S., Stephan, K. E. &amp; Buhmann, J. M. The balanced accuracy and its posterior distribution. 
In 2010 20th International Conference on Pattern Recognition 3121&#x2013;3124 (IEEE, 2010).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR52\" id=\"ref-link-section-d643601119e2097\" rel=\"nofollow noopener\" target=\"_blank\">52<\/a>, and precision-recall scores<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 53\" title=\"Davis, J. &amp; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proc. 23rd International Conference on Machine Learning&#x2014;ICML &#x2019;06 233&#x2013;240 (ACM Press, 2006).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR53\" id=\"ref-link-section-d643601119e2102\" rel=\"nofollow noopener\" target=\"_blank\">53<\/a>. The F-beta score is a weighted harmonic mean of precision and recall. Its beta parameter sets their relative weight (beta &gt; 1 emphasizes recall, beta &lt; 1 emphasizes precision), which allows the model\u2019s sensitivity to false negatives versus false positives to be tuned. Finally, we aggregated SHAP values into feature importance scores to quantify the contribution of each feature to predictions of the target variable. A higher feature importance score indicates that the feature has a greater influence on the model\u2019s predictions<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Mas&#xED;s, S. Interpretable Machine Learning with Python: Build Explainable, Fair, and Robust High-Performance Models with Hands-on, Real-World Examples (Packt, 2023).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR54\" id=\"ref-link-section-d643601119e2106\" rel=\"nofollow noopener\" target=\"_blank\">54<\/a>. 
SHAP connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Lundberg, S. &amp; Lee, S.-I. A unified approach to interpreting model predictions. &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1705.07874&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR55\" id=\"ref-link-section-d643601119e2110\" rel=\"nofollow noopener\" target=\"_blank\">55<\/a>, yielding a unified framework for interpreting predictions that can explain the output of any machine learning model.<\/p>\n<p>Classifiers<\/p>\n<p>We employed several classification models, including Logistic Regression and Random Forest<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 48\" title=\"Pedregosa, F. et al. Scikit-learn. Mach. Learn. Python &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1201.0490&#010;                  &#010;                 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR48\" id=\"ref-link-section-d643601119e2123\" rel=\"nofollow noopener\" target=\"_blank\">48<\/a>, and gradient-boosting-based XGBoost<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. &amp; Gulin, A. CatBoost: unbiased boosting with categorical features. 
Preprint at &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1706.09516&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR56\" id=\"ref-link-section-d643601119e2127\" rel=\"nofollow noopener\" target=\"_blank\">56<\/a>. We name XGBoost (Extreme Gradient Boosting Machine) explicitly to distinguish it from the separate Gradient Boosting Machine method. Each of these models offers distinct advantages and approaches to analyzing feature importance.<\/p>\n<p>Logistic Regression is a generalized linear model for classification problems<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 57\" title=\"Bishop, C. M. Pattern Recognition and Machine Learning, Vol. 2 (Springer, 2006).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR57\" id=\"ref-link-section-d643601119e2134\" rel=\"nofollow noopener\" target=\"_blank\">57<\/a>, and we use it as a baseline against which to compare the more complex models. The second model used in this work is Random Forest, an ensemble learning technique for tasks such as classification and regression that builds many decision trees during training; in classification, the random forest outputs the class chosen by the majority of trees<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 58\" title=\"Ho, T. K. Random decision forests. In Proc. 3rd International Conference on Document Analysis and Recognition, Vol. 1 278&#x2013;282 (IEEE, 1995). &#010;                  https:\/\/doi.org\/10.1109\/ICDAR.1995.598994&#010;                  &#010;                .\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR58\" id=\"ref-link-section-d643601119e2138\" rel=\"nofollow noopener\" target=\"_blank\">58<\/a>. 
CatBoost, another gradient boosting method, employs an ordered boosting technique to minimize target leakage from categorical features, often achieving robust performance even with limited parameter tuning<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 56\" title=\"Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. &amp; Gulin, A. CatBoost: unbiased boosting with categorical features. Preprint at &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1706.09516&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR56\" id=\"ref-link-section-d643601119e2142\" rel=\"nofollow noopener\" target=\"_blank\">56<\/a>. While CatBoost can integrate categorical data with minimal preprocessing and achieve competitive performance on binary classification tasks, logistic regression, random forest, and XGBoost typically require more elaborate feature engineering and preprocessing, which in turn can influence both model performance and the interpretability of sensitivity analyses such as those based on SHAP values. Finally, gradient boosting (GB) is a machine learning method that performs boosting as optimization in function space. XGBoost (eXtreme Gradient Boosting) is a GB implementation, and we used it because it outperformed the other methods on our dataset. XGBoost is often preferred for building predictive models on large datasets because of its accuracy, efficiency, and adaptability<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Chen, T. &amp; Guestrin, C. 
XGBoost: A Scalable Tree Boosting System.&#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1603.02754&#010;                  &#010;                 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR38\" id=\"ref-link-section-d643601119e2146\" rel=\"nofollow noopener\" target=\"_blank\">38<\/a>. XGBoost handles both classification and regression problems robustly, and its predictive strength makes it well suited to building predictive models of structure destruction for damage assessment. The SHAP analysis results for all four models are provided in the Supplementary Materials (Supplementary Figs.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#MOESM1\" rel=\"nofollow noopener\" target=\"_blank\">1<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#MOESM2\" rel=\"nofollow noopener\" target=\"_blank\">3<\/a>). These figures break down how each feature contributes to the predictions of each model, which aids interpretation and complements the results discussed in the main text.<\/p>\n<p>Feature contribution through SHAP analysis<\/p>\n<p>Although machine learning (ML) models are increasingly used for their high predictive power, their usefulness for understanding the data-generating process (DGP) is limited. Understanding the DGP requires insight into feature-target associations, which many ML models cannot provide directly because they do not model causal effects. 
Feature importance (FI) methods provide useful insights into the DGP under certain conditions<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Ewald, F. K. et al. A guide to feature importance methods for scientific inference. &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.2404.12862&#010;                  &#010;                 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR59\" id=\"ref-link-section-d643601119e2165\" rel=\"nofollow noopener\" target=\"_blank\">59<\/a>. SHAP (SHapley Additive exPlanations) is a unified framework for interpreting machine learning models based on cooperative game theory<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Lundberg, S. &amp; Lee, S.-I. A unified approach to interpreting model predictions. &#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1705.07874&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR55\" id=\"ref-link-section-d643601119e2169\" rel=\"nofollow noopener\" target=\"_blank\">55<\/a>. For a given prediction, it assigns each feature an importance value equal to a weighted average of that feature\u2019s marginal contributions to the prediction over all possible subsets (coalitions) of the other features. The resulting attributions satisfy local accuracy (for each prediction, they sum to the difference between the model output and its expected value) and consistency, showing how individual features influence model predictions. SHAP values can explain individual predictions and, when aggregated, provide a global picture of the model\u2019s behavior, making them a valuable tool for model interpretability in research<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"Mas&#xED;s, S. 
Interpretable Machine Learning with Python: Build Explainable, Fair, and Robust High-Performance Models with Hands-on, Real-World Examples (Packt, 2023).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR54\" id=\"ref-link-section-d643601119e2173\" rel=\"nofollow noopener\" target=\"_blank\">54<\/a>. SHAP can be considered a form of in-sample sensitivity analysis because it assesses how changing a feature or a subset of features affects the model\u2019s output: it evaluates the impact of including or excluding each feature and identifies which features contribute most to the predictions<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Borgonovo, E., Plischke, E. &amp; Rabitti, G. The many Shapley values for explainable artificial intelligence: a sensitivity analysis perspective. Eur. J. Oper. Res. 318, 911&#x2013;926 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR60\" id=\"ref-link-section-d643601119e2177\" rel=\"nofollow noopener\" target=\"_blank\">60<\/a>. We used SHAP-based feature attribution to identify and understand the key factors driving structure destruction in WUI fires. We chose SHAP as a model-agnostic tool because its values not only quantify the magnitude and direction of each feature\u2019s contribution but also capture complex non-linear interactions between variables<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 55\" title=\"Lundberg, S. &amp; Lee, S.-I. A unified approach to interpreting model predictions. 
&#010;                  https:\/\/doi.org\/10.48550\/ARXIV.1705.07874&#010;                  &#010;                 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR55\" id=\"ref-link-section-d643601119e2181\" rel=\"nofollow noopener\" target=\"_blank\">55<\/a>. SHAP therefore provides both local and global insights, which are critical for understanding the multifaceted nature of fire damage. For example, SHAP revealed how features such as SSD, ember exposure, and flame length interact in non-linear ways that traditional importance measures might overlook. Ultimately, the detailed, context-specific information provided by SHAP helped us interpret the predictive factors driving structural vulnerability.<\/p>\n<p>Sensitivity analysis for the\u00a0machine learning model<\/p>\n<p>We developed a sensitivity analysis framework to assess how variability in the two key inputs from the exposure model, ember load and flame length, affects our model predictions. For each of the five fires, ensemble outputs from the WUI fire spread model were used to perturb the \u201cember load\u201d and \u201cflame length\u201d variables while keeping all other inputs fixed. By aggregating the model outputs across these ensemble runs, we computed the mean prediction and its uncertainty (standard deviation) for each test sample. This approach allowed us to quantify the impact of non-linear interactions and input variability on the final predictions, offering both local and global insights into model performance.<\/p>\n<p>Kernel density estimation (KDE) plots illustrate the distribution and variability of the predictions across the test samples (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#Fig8\" rel=\"nofollow noopener\" target=\"_blank\">8<\/a>). 
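<\/p>\n<p>The perturbation procedure described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the study\u2019s code: <code>predict_proba<\/code> is a hypothetical logistic stand-in for the trained XGBoost classifier, the randomly generated samples and lognormal ember ensembles replace the ELMFIRE ensemble outputs, and we assume the overall standard deviation is the average per-sample standard deviation and the relative uncertainty is that value divided by the overall mean, in percent. Density curves like those in Fig. 8 could then be drawn from <code>means<\/code> with a kernel density estimator such as <code>scipy.stats.gaussian_kde<\/code>.<\/p>\n

```python
import math
import random
import statistics


def predict_proba(ember, flame, ssd):
    # Hypothetical stand-in for the trained classifier: a logistic function of
    # ember load, flame length and structure separation distance (SSD).
    z = 1.5 * ember + 0.8 * flame - 0.1 * ssd - 2.0
    return 1.0 / (1.0 + math.exp(-z))


def perturbation_sensitivity(samples, ensembles, perturbed):
    # For each sample, replace one input ('ember' or 'flame') with each
    # ensemble member while holding the other predictors fixed, then return
    # the per-sample mean prediction and standard deviation.
    means, stds = [], []
    for sample, members in zip(samples, ensembles):
        preds = []
        for value in members:
            inputs = dict(sample)
            inputs[perturbed] = value
            preds.append(predict_proba(**inputs))
        means.append(statistics.fmean(preds))
        stds.append(statistics.pstdev(preds))
    return means, stds


random.seed(0)
n_samples, n_members = 200, 100  # 100 members mirrors the ensemble size in Fig. 8
samples = [{'ember': random.uniform(0, 2), 'flame': random.uniform(0, 3),
            'ssd': random.uniform(2, 30)} for _ in range(n_samples)]
# Hypothetical ember ensembles: lognormal multiplicative noise per sample.
ember_ensembles = [[s['ember'] * random.lognormvariate(0, 0.3)
                    for _ in range(n_members)] for s in samples]

means, stds = perturbation_sensitivity(samples, ember_ensembles, 'ember')
overall_mean = statistics.fmean(means)
overall_std = statistics.fmean(stds)  # assumed definition (see lead-in)
relative_uncertainty = 100.0 * overall_std / overall_mean
print(f'mean={overall_mean:.3f} std={overall_std:.3f} rel={relative_uncertainty:.1f}%')
```

<p>Passing <code>'flame'<\/code> together with flame-length ensembles gives the flame-perturbation counterpart; because only one input varies per run, comparing the two summaries isolates each exposure\u2019s contribution to prediction uncertainty.<\/p>\n<p>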
The shaded regions represent the uncertainty around the mean predictions for the ember and flame perturbations, and the overall mean, standard deviation, and relative uncertainty are reported within each plot. These distributions show the uncertainty and variability in the model\u2019s response to perturbations in ember load and flame length. SHAP analysis was also used to interpret the contribution of each feature, clarifying the model\u2019s behavior under different exposure conditions. Beyond characterizing the uncertainties associated with flame and ember exposure, this sensitivity analysis suggests that the XGBoost estimator has captured structure beyond the binary labels: the intermediate modes that emerge in the prediction distributions (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#Fig8\" rel=\"nofollow noopener\" target=\"_blank\">8<\/a>) imply that outcomes between damaged and survived are present in the dataset. The analysis also offers insight into the physical factors influencing damage, because it points to gradations within the binary damage classes and thus to a more nuanced picture of damage severity.<\/p>\n<p>Fig. 
8: Sensitivity analysis under perturbations of ember and flame exposure.<a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41467-025-63386-2\/figures\/8\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig8\" src=\"https:\/\/www.newsbeep.com\/us-ca\/wp-content\/uploads\/2025\/10\/41467_2025_63386_Fig8_HTML.png\" alt=\"figure 8\" loading=\"lazy\" width=\"685\" height=\"225\"\/><\/a><\/p>\n<p>Sensitivity of the model predictions to two key exposure inputs, ember deposition and flame length, obtained by perturbing them with 100 ensemble outputs from the ELMFIRE spread model (HAMADA extension) for each fire while holding all other predictors constant. For each test sample (n\u2009=\u200947,742), the model\u2019s predicted survival probability was computed across ensembles to yield a mean prediction (solid fill) and its uncertainty (shaded region\u2009=\u2009\u00b1 1 standard deviation). a Ember perturbation: blue fill shows the kernel density of mean predictions under varied ember load; dashed blue lines mark mean\u2009\u00b1\u2009std. Inset reports overall mean, standard deviation and relative uncertainty (%). b Flame perturbation: green fill shows the kernel density of mean predictions under varied flame length; dashed green lines mark mean\u2009\u00b1\u2009std. Inset reports overall mean, standard deviation and relative uncertainty (%). 
This framework quantifies the influence of non-linear interactions and input variability on the binary damage classification (0\u2009=\u2009not damaged, 1\u2009=\u2009damaged), reveals emergent intermediate prediction modes, and offers both local and global insights into model behavior under uncertainty.<\/p>\n<p>Confusion matrix and ROC curve for predictions<\/p>\n<p>A confusion matrix summarizes the performance of a classifier on a set of test data. It is a two-dimensional matrix, indexed in one dimension by the true class of an object and in the other by the class that the classifier assigns<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 61\" title=\"Ting, K. M. Confusion matrix. In Encyclopedia of Machine Learning (eds Sammut, C. &amp; Webb, G. I.) 209&#x2013;209 (Springer, 2011).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR61\" id=\"ref-link-section-d643601119e2239\" rel=\"nofollow noopener\" target=\"_blank\">61<\/a>. A receiver operating characteristic (ROC) graph, which plots the true positive rate against the false positive rate as the classification threshold varies, is a technique for visualizing, organizing, and selecting classifiers based on their performance<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 62\" title=\"Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861&#x2013;874 (2006).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#ref-CR62\" id=\"ref-link-section-d643601119e2243\" rel=\"nofollow noopener\" target=\"_blank\">62<\/a>. For each of the five large WUI fires in our dataset, we evaluated how well the model predicts structure survival using accuracy and other key performance metrics. 
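<\/p>\n<p>The evaluation described here can be illustrated with a small, self-contained sketch. The labels, predicted probabilities and the 0.5 decision threshold are hypothetical, and the helper functions are written from scratch for clarity; in practice, <code>sklearn.metrics.confusion_matrix<\/code>, <code>roc_curve<\/code> and <code>roc_auc_score<\/code> in scikit-learn provide the same quantities. The matrix rows index the true class and the columns the predicted class (0 = not damaged, 1 = damaged), and the ROC curve is traced by sweeping the threshold over every distinct score.<\/p>\n

```python
def confusion_matrix(y_true, y_pred):
    # Hypothetical helper. Rows index the true class, columns the predicted
    # class (0 = not damaged, 1 = damaged).
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def roc_curve(y_true, scores):
    # Hypothetical helper: sweep the threshold from the highest score down and
    # record (false positive rate, true positive rate) at each step.
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    # Area under the ROC curve by the trapezoidal rule.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and predicted damage probabilities for eight structures.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

cm = confusion_matrix(y_true, y_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)
points = roc_curve(y_true, scores)
area = auc(points)
print(cm, accuracy, area)
```

<p>With these toy values the matrix has three correct predictions and one error in each class, giving an accuracy of 0.75, while the area under the ROC curve is 0.8125; unlike accuracy, the AUC does not depend on the chosen threshold.<\/p>\n<p>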
By analyzing the confusion matrices and ROC curves for each fire event, we were able to identify patterns and discrepancies in model performance, leading to a better understanding of the factors influencing structure survival in large WUI fires.<\/p>\n<p>Reporting summary<\/p>\n<p>Further information on research design is available in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-63386-2#MOESM2\" rel=\"nofollow noopener\" target=\"_blank\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n","protected":false},"excerpt":{"rendered":"We primarily relied on a modified database from five selected fires that includes more than 47,000 structures with&hellip;\n","protected":false},"author":2,"featured_media":3583,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[7,9,8,4378,4381,4379,4382,4380,645],"class_list":{"0":"post-3582","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-california","8":"tag-california","9":"tag-california-headlines","10":"tag-california-news","11":"tag-geography","12":"tag-humanities-and-social-sciences","13":"tag-mechanical-engineering","14":"tag-multidisciplinary","15":"tag-natural-hazards","16":"tag-science"},"_links":{"self":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/posts\/3582","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/comments?post=3582"}],"version-history":[{"count":0,"href":"https:\/\/www.newsbeep.com
\/us-ca\/wp-json\/wp\/v2\/posts\/3582\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/media\/3583"}],"wp:attachment":[{"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/media?parent=3582"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/categories?post=3582"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.newsbeep.com\/us-ca\/wp-json\/wp\/v2\/tags?post=3582"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}