Scientia Silvae Sinicae, 2021, Vol. 57, Issue (10): 36-48. doi: 10.11707/j.1001-7488.20211004

Estimation of Forest Aboveground Biomass in Northwest Hunan Province Based on Machine Learning and Multi-Source Data

Jiaqi Ding1,3, Wenli Huang1,2,*, Yingchun Liu4, Yang Hu5

  1. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079
  2. Key Laboratory of Urban Land Resource Monitoring and Simulation of Ministry of Natural Resources, Shenzhen 518034
  3. College of Urban and Environmental Sciences, Peking University, Beijing 100871
  4. Academy of Forest Inventory and Planning, National Forestry and Grassland Administration, Beijing 100714
  5. Key Laboratory for Restoration and Reconstruction of Degraded Ecosystem in Northwest China of Ministry of Education, Yinchuan 750021
  • Received: 2020-09-18 Online: 2021-10-25 Published: 2021-12-11
  • Contact: Wenli Huang

Abstract:

Objective: Traditional forest inventory methods are costly, slow to update, and yield results of poor uniformity. To address these problems, this study used multi-source remote sensing data and machine learning methods to select feature variables, build an estimation model, and map aboveground biomass (AGB) in the study area, providing a technical means for forest resource surveys.

Method: Taking northwestern Hunan Province as the study area, AGB reference values for 393 sample plots were obtained by converting the survey data into AGB using allometric growth equations. Landsat-8 data served as the optical remote sensing source, from which spectral information, vegetation indices, texture features, and tasseled cap transformation components were extracted. ALOS PALSAR-2 and Sentinel-1 data served as the radar remote sensing sources, from which the backscatter intensity and the normalized polarization difference index were extracted for each polarization mode. Together with topographic variables (elevation, slope, and aspect), a total of 122 candidate feature variables were obtained. After the modeling variables were selected by stepwise regression and the random forest method, multivariate linear regression (MLR), random forest (RF), and support vector regression (SVR) models were built. The models were evaluated by ten-fold cross-validation, using the coefficient of determination (R²) and root mean square error (RMSE) as evaluation indices. The best model was used for biomass mapping, and five biomass map products at the China or global scale were selected for comparative analysis.

Result: On the training set, the RF model performed best (RMSE = 12.8 Mg·hm⁻², rRMSE = 21.1%, R² = 0.93) and fitted the data well, followed by the SVR model (RMSE = 26.1 Mg·hm⁻², rRMSE = 43.3%, R² = 0.55) and the MLR model (RMSE = 30.9 Mg·hm⁻², rRMSE = 50.5%, R² = 0.39). On the test set, the RF model (RMSE = 30.1 Mg·hm⁻², rRMSE = 51.3%, R² = 0.42) also outperformed the MLR model (RMSE = 32.6 Mg·hm⁻², rRMSE = 54.1%, R² = 0.30) and the SVR model (RMSE = 32.8 Mg·hm⁻², rRMSE = 55.3%, R² = 0.25). All three models showed a certain degree of underestimation at low AGB and overestimation at high AGB. The RF model selected 13 modeling variables, including PALSAR-2 backscatter information, elevation, Landsat-8 spectral information, vegetation indices, and the difference between the tasseled cap wetness and greenness components. Regional biomass mapping was completed using the RF model. Compared with the other products, the resulting map broadly reflected the distribution of biomass in the study area and showed rich spatial detail. Biomass in the study area ranged from 0 to 119 Mg·hm⁻², with a mean of 37.5 Mg·hm⁻² and a standard deviation of 35.9 Mg·hm⁻².

Conclusion: Combining multi-source remote sensing data with machine learning algorithms allows large-scale biomass to be estimated accurately and quickly, which shows great application potential. Compared with the SVR and MLR models, the RF model performed better in AGB estimation for the study area. The RF algorithm could also effectively select modeling variables for AGB estimation from multi-source variables.
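
The evaluation indices named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The abstract does not define rRMSE, so the sketch assumes the common definition (RMSE as a percentage of the mean observed value), which is roughly consistent with the reported RMSE and rRMSE pairs. The function names, the fold-splitting scheme, and the sample values are all hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Hypothetical AGB values, for illustration only (not data from the study).
observed = [20.0, 40.0, 60.0, 80.0, 100.0]
predicted = [30.0, 45.0, 60.0, 75.0, 90.0]

print("RMSE:", rmse(observed, predicted))
print("rRMSE (%):", relative_rmse(observed, predicted))
print("R2:", r_squared(observed, predicted))

# Ten folds over 393 plots: each fold is held out once for testing
# while the remaining nine folds are used for fitting.
folds = k_fold_indices(393, k=10)
print("fold sizes:", [len(f) for f in folds])
```

In a ten-fold scheme like this, each model would be fitted ten times, and the indices computed on the held-out folds give the kind of test-set figures the abstract reports.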

Key words: forest aboveground biomass (AGB), multiple linear regression (MLR), random forest (RF), support vector regression (SVR)
