Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization
Abstract: Accurate assessment of the undrained shear strength (USS) of soft sensitive clays is a major concern in geotechnical engineering practice. This study applies two data-driven ensemble learning methods, extreme gradient boosting (XGBoost) and random forest (RF), to capture the relationships between the USS and various basic soil parameters. Based on soil data sets from the TC304 database, a general approach is developed to predict the USS of soft clays with these two methods, using five feature variables: preconsolidation stress (PS), vertical effective stress (VES), liquid limit (LL), plastic limit (PL) and natural water content (W). To reduce reliance on rules of thumb and inefficient brute-force search, Bayesian optimization is applied to determine appropriate hyper-parameters for both XGBoost and RF. The developed models are compared comprehensively with three other machine learning methods and two transformation models in terms of predictive accuracy and robustness under 5-fold cross-validation (CV). The results show that the XGBoost-based and RF-based methods outperform these approaches. In addition, the XGBoost-based model provides feature importance rankings, which enhance the interpretability of the model and make it a promising tool for predicting geotechnical parameters.
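The workflow summarized above (hyper-parameters chosen by Bayesian optimization, with each candidate scored by 5-fold CV) can be sketched as follows. This is a minimal sketch assuming scikit-learn, SciPy and NumPy, not the study's implementation. The search space (`n_estimators`, `max_depth`) and its bounds, the Matern-kernel Gaussian-process surrogate with expected improvement, the helper names `cv_rmse` and `bayes_opt`, and the synthetic data are all illustrative assumptions. The data are not the TC304 database. Only the random forest is shown. An XGBoost regressor (`xgboost.XGBRegressor`, which follows the scikit-learn estimator interface) could be substituted in `cv_rmse`.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import KFold, cross_val_score

FEATURES = ["PS", "VES", "LL", "PL", "W"]  # the five inputs named in the abstract

# Hypothetical search space (assumption): rows are n_estimators and max_depth.
BOUNDS = np.array([[20.0, 200.0], [2.0, 20.0]])


def cv_rmse(params, X, y, seed=0):
    """Mean 5-fold CV RMSE of a random forest with the given hyper-parameters."""
    n_estimators, max_depth = int(round(params[0])), int(round(params[1]))
    model = RandomForestRegressor(
        n_estimators=n_estimators, max_depth=max_depth, random_state=seed
    )
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


def expected_improvement(candidates, gp, best):
    """Expected improvement (for minimization) of the GP surrogate at candidates."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(X, y, n_init=4, n_iter=8, seed=0):
    """Return (best hyper-parameters, best CV RMSE) found by GP-based BO."""
    rng = np.random.default_rng(seed)
    low, span = BOUNDS[:, 0], BOUNDS[:, 1] - BOUNDS[:, 0]

    def sample(n):
        # Uniform random points inside the search box.
        return low + rng.random((n, len(BOUNDS))) * span

    params = sample(n_init)  # random initial design
    losses = np.array([cv_rmse(p, X, y, seed) for p in params])
    for _ in range(n_iter):
        # Fit the surrogate on hyper-parameters rescaled to [0, 1].
        gp = GaussianProcessRegressor(
            kernel=Matern(nu=2.5), normalize_y=True, random_state=seed
        )
        gp.fit((params - low) / span, losses)
        # Pick the random candidate with the highest expected improvement.
        cand = sample(500)
        ei = expected_improvement((cand - low) / span, gp, losses.min())
        nxt = cand[np.argmax(ei)]
        params = np.vstack([params, nxt])
        losses = np.append(losses, cv_rmse(nxt, X, y, seed))
    best = int(np.argmin(losses))
    return params[best], losses[best]


if __name__ == "__main__":
    # Synthetic placeholder data only, NOT the TC304 soil database.
    rng = np.random.default_rng(42)
    X = rng.uniform(0.5, 2.0, size=(200, len(FEATURES)))
    y = 0.3 * X[:, 0] + 0.1 * X[:, 1] - 0.05 * X[:, 4] + 0.02 * rng.normal(size=200)

    best_params, best_rmse = bayes_opt(X, y)
    print("best (n_estimators, max_depth):", np.round(best_params).astype(int))
    print(f"5-fold CV RMSE: {best_rmse:.4f}")

    # Refit on all data and rank features by impurity-based importance.
    model = RandomForestRegressor(
        n_estimators=int(round(best_params[0])),
        max_depth=int(round(best_params[1])),
        random_state=0,
    ).fit(X, y)
    for name, imp in sorted(zip(FEATURES, model.feature_importances_), key=lambda t: -t[1]):
        print(f"{name}: {imp:.3f}")
```

Scoring every candidate with the same seeded `KFold` split keeps the comparison between hyper-parameter settings fair. The Gaussian process then proposes the next setting where improvement over the best CV RMSE so far is most likely. This replaces the brute-force grid search that the abstract argues against.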