Volume 12 Issue 1
Dec.  2020
Wengang Zhang, Chongzhi Wu, Haiyi Zhong, Yongqin Li, Lin Wang. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization[J]. Geoscience Frontiers, 2021, 12(1): 469-477. doi: 10.1016/j.gsf.2020.03.007

Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization

doi: 10.1016/j.gsf.2020.03.007

The first author is grateful for the financial support from the High-end Foreign Expert Introduction Program (No. G20190022002), the Chongqing Construction Science and Technology Plan Project (2019–0045), and the Chongqing Engineering Research Center of Disaster Prevention & Control for Banks and Structures in Three Gorges Reservoir Area (Nos. SXAPGC18ZD01 and SXAPGC18YB03).

  • Received Date: 2019-10-08
  • Revision Received Date: 2019-12-29
  • Accurate assessment of the undrained shear strength (USS) of soft sensitive clays is a major concern in geotechnical engineering practice. This study applies two novel data-driven ensemble learning methods, extreme gradient boosting (XGBoost) and random forest (RF), to capture the relationships between the USS and various basic soil parameters. Based on the soil data sets from the TC304 database, a general approach is developed to predict the USS of soft clays using these two methods, with five feature variables: preconsolidation stress (PS), vertical effective stress (VES), liquid limit (LL), plastic limit (PL) and natural water content (W). To reduce the dependence on rules of thumb and inefficient brute-force search, Bayesian optimization is applied to determine appropriate hyper-parameters for both XGBoost and RF. The developed models are compared with three other machine learning methods and two transformation models in terms of predictive accuracy and robustness under 5-fold cross-validation (CV). The results show that the XGBoost-based and RF-based methods outperform these approaches. In addition, the XGBoost-based model provides feature importance ranks, which enhances the interpretability of the model and makes it a promising tool for predicting geotechnical parameters.
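
The 5-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It does not reproduce the paper's XGBoost or RF models or their Bayesian hyper-parameter search. Instead it scores a hypothetical one-parameter ratio model (USS = ratio × PS) on synthetic data. The function names `k_fold_indices`, `fit_ratio` and `cross_validate`, the ratio of 0.25, the noise level and the choice of RMSE as the score are all assumptions made for illustration, not values or choices taken from the study.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_ratio(ps, uss):
    """Least-squares slope of USS = ratio * PS (line through the origin)."""
    return sum(p * s for p, s in zip(ps, uss)) / sum(p * p for p in ps)


def cross_validate(ps, uss, k=5, seed=0):
    """Return the held-out RMSE of the ratio model for each of the k folds."""
    folds = k_fold_indices(len(ps), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(ps)) if i not in held_out]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        ratio = fit_ratio([ps[i] for i in train_idx], [uss[i] for i in train_idx])
        sq_err = [(uss[i] - ratio * ps[i]) ** 2 for i in test_idx]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores


# Synthetic stand-in data (NOT the TC304 records): PS in kPa and
# USS = 0.25 * PS + Gaussian noise; both the ratio and the noise are assumed.
rng = random.Random(42)
ps_values = [rng.uniform(20.0, 300.0) for _ in range(100)]
uss_values = [0.25 * p + rng.gauss(0.0, 2.0) for p in ps_values]

fold_rmse = cross_validate(ps_values, uss_values, k=5)
mean_rmse = sum(fold_rmse) / len(fold_rmse)
print("RMSE per fold:", [round(r, 2) for r in fold_rmse])
print("Mean RMSE:", round(mean_rmse, 2))
```

The same loop structure would apply to any regressor: swapping `fit_ratio` for a tree ensemble changes the model being scored, while the fold bookkeeping and the per-fold error summary stay the same. The spread of the per-fold scores gives a rough indication of robustness.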
