Abstract:
Background: Debris flows are sudden meteorological–geological hazards that develop predominantly in mountainous regions. Because of their rapid onset, high mobility, and strong destructive power, debris flows often cause severe casualties, extensive property losses, and significant damage to infrastructure and the ecological environment. They have therefore attracted widespread attention from researchers and disaster management authorities. Debris flow hazard assessment is the foundation of debris flow forecasting and prediction, and it provides essential scientific support for early warning systems, disaster prevention, and mitigation measures aimed at reducing potential risks and losses. In such assessments, both the classification intervals of continuous evaluation factors and the choice of machine learning model strongly affect the results.

Methods: In this study, sub-basins were adopted as evaluation units. The GeoDetector method was used to optimize the classification intervals of the debris flow hazard evaluation factors based on their relationships with observed debris flow occurrence points. The optimized intervals were first validated with a Random Forest model and then incorporated into Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) models. Bayesian optimization was applied for hyperparameter tuning, and SHAP (SHapley Additive exPlanations) analysis was used to quantify feature importance, revealing each factor's contribution to debris flow hazard and identifying the dominant disaster-causing factors. On this basis, debris flow hazard was assessed for the Panzhihua-Xichang (Panxi) region.

Results: 1) The RF model using GeoDetector-optimized classification intervals achieved the best performance, with an AUC of 0.935. This is higher than the performance obtained with the conventional natural breaks method, indicating that GeoDetector-based classification optimization significantly improves the accuracy of debris flow hazard assessment. 2) All three machine learning models showed good predictive performance and were highly consistent in identifying the main disaster-causing factors of debris flows in the Panxi region. Owing to its strong nonlinear modeling capability, the RF model handled complex multi-factor interactions better than the XGBoost and SVM models: it achieved an AUC of 0.935 and an accuracy of 0.854, compared with an AUC of 0.926 and an accuracy of 0.842 for XGBoost and an AUC of 0.921 and an accuracy of 0.829 for SVM. The hazard assessment results further indicate that very high and high hazard zones are concentrated mainly in the central and eastern parts of the Panxi region, whereas very low hazard zones are distributed primarily in the northwest. These findings provide scientific support for debris flow prevention and mitigation in the Panxi region.

Conclusions: GeoDetector can effectively optimize the classification intervals of continuous evaluation factors. Among the machine learning models applied in this study, the RF model achieved the highest predictive accuracy for debris flow hazard assessment in the Panxi region. The hazard assessment results provide reliable information for identifying high-risk areas, formulating targeted debris flow prevention and mitigation strategies, and optimizing the layout of disaster prevention and control measures. These findings are therefore of practical significance for strengthening debris flow disaster management and protection in the Panxi region.
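The GeoDetector-based interval optimization described in the Methods can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each sub-basin carries a continuous factor value x (for example, mean slope) and a response y (for example, debris flow point density). It further assumes that candidate classifications come from equal-interval and quantile schemes with several class counts. The sketch keeps the classification whose strata give the largest factor-detector statistic q = 1 - sum_h(N_h * var_h) / (N * var). The function names, candidate schemes, and selection rule are illustrative assumptions, not the authors' implementation. A natural-breaks (Jenks) routine could be added as a further candidate for comparison.

```python
import bisect
from statistics import pvariance


def q_statistic(y, strata):
    """GeoDetector factor-detector q for response y stratified by labels.

    q = 1 - sum_h(N_h * var_h) / (N * var), using population variances.
    q ranges from 0 (strata explain nothing) to 1 (strata explain all).
    """
    total_var = pvariance(y)
    if total_var == 0:
        return 0.0  # constant response: nothing to explain
    groups = {}
    for value, label in zip(y, strata):
        groups.setdefault(label, []).append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * total_var)


def equal_interval_breaks(x, k):
    """Breakpoints for k classes of equal width between min(x) and max(x)."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(x, k):
    """Breakpoints for k classes holding roughly equal numbers of units."""
    s = sorted(x)
    return [s[len(s) * i // k] for i in range(1, k)]


def classify(x, breaks):
    """Map each value to a class index 0..len(breaks)."""
    return [bisect.bisect_right(breaks, v) for v in x]


# Hypothetical candidate schemes; a Jenks natural-breaks routine could be added.
METHODS = {"equal_interval": equal_interval_breaks, "quantile": quantile_breaks}


def best_classification(x, y, methods=METHODS, ks=(3, 4, 5, 6, 7)):
    """Return (q, method, k, breaks) for the candidate with the largest q."""
    best = None
    for name, make_breaks in methods.items():
        for k in ks:
            breaks = make_breaks(x, k)
            q = q_statistic(y, classify(x, breaks))
            if best is None or q > best[0]:
                best = (q, name, k, breaks)
    return best


if __name__ == "__main__":
    # Toy data: factor value and debris flow point density for 8 sub-basins.
    slope = [1, 2, 3, 4, 5, 6, 7, 8]
    density = [0, 0, 0, 0, 1, 1, 1, 1]
    q, method, k, breaks = best_classification(slope, density, ks=(2, 3, 4))
    print(f"best: {method}, k={k}, breaks={breaks}, q={q:.3f}")
```

With the toy data, the two-class equal-interval split at 4.5 separates the sub-basins with and without debris flows perfectly, so q reaches 1. In real data, q would typically be well below 1, and the chosen intervals would then be carried into the RF, SVM, and XGBoost models as categorical inputs.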