Scientia Silvae Sinicae, 2021, Vol. 57, Issue 10: 36-48. doi: 10.11707/j.1001-7488.20211004

• Papers and Research Reports •

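  • Funding:
    National Natural Science Foundation of China (41901351); Open Fund of the Key Laboratory of Urban Land Resource Monitoring and Simulation, Ministry of Natural Resources (KF-2020-05-0076); Chizi Program of the National Forestry and Grassland Administration (2018)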

Estimation of Forest Aboveground Biomass in Northwest Hunan Province Based on Machine Learning and Multi-Source Data

Jiaqi Ding (1, 3), Wenli Huang (1, 2, *), Yingchun Liu (4), Yang Hu (5)

  1. School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079
    2. Key Laboratory of Urban Land Resource Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034
    3. College of Urban and Environmental Sciences, Peking University, Beijing 100871
    4. Academy of Forest Inventory and Planning, National Forestry and Grassland Administration, Beijing 100714
    5. Key Laboratory for Restoration and Reconstruction of Degraded Ecosystem in Northwest China, Ministry of Education, Yinchuan 750021
  • Received: 2020-09-18 Online: 2021-10-25 Published: 2021-12-11
  • Contact: Wenli Huang

Abstract:

Objective: Traditional forest inventory methods are costly, slow to update, and yield results with poor consistency. To address these problems, this study used multi-source remote sensing data and machine learning to select feature variables, build estimation models and map forest aboveground biomass (AGB) across the study area, providing a technical means for information-based forest resource surveys.

Method: Taking northwestern Hunan Province as the study area, forest inventory plot data were converted to AGB with allometric equations, yielding AGB reference values for 393 screened sample plots. Landsat-8 imagery served as the optical data source, from which band spectral information, vegetation indices, texture features and tasseled cap components were extracted. ALOS PALSAR-2 and Sentinel-1 data served as the radar data sources, from which the backscatter intensity of each polarization and the normalized polarization difference index were extracted. Together with the topographic factors elevation, slope and aspect, this gave 122 candidate feature variables. Modeling variables were selected by stepwise regression and by the random forest (RF) algorithm, and multiple linear regression (MLR), RF and support vector regression (SVR) models were then built. The models were evaluated by ten-fold cross-validation using the root mean square error (RMSE), relative RMSE (rRMSE) and coefficient of determination (R²); the best model was used to map biomass, and the resulting map was compared with five biomass products at the China or global scale.

Result: On the training set, the RF model performed best (RMSE = 12.8 Mg·hm⁻², rRMSE = 21.1%, R² = 0.93), followed by SVR (RMSE = 26.1 Mg·hm⁻², rRMSE = 43.3%, R² = 0.55), while MLR performed worst (RMSE = 30.9 Mg·hm⁻², rRMSE = 50.5%, R² = 0.39). On the test set, the RF model (RMSE = 30.1 Mg·hm⁻², rRMSE = 51.3%, R² = 0.42) again outperformed MLR (RMSE = 32.6 Mg·hm⁻², rRMSE = 54.1%, R² = 0.30) and SVR (RMSE = 32.8 Mg·hm⁻², rRMSE = 55.3%, R² = 0.25). All three models tended to overestimate low AGB values and underestimate high ones. The RF model selected 13 modeling variables, including PALSAR-2 backscatter information, elevation, Landsat-8 spectral information and vegetation indices, and the difference between the tasseled cap wetness and greenness components. The regional biomass map produced with the RF model broadly reflected the biomass distribution in the study area and, compared with the other products, showed richer spatial detail. Mapped biomass ranged from 0 to 119 Mg·hm⁻², with a mean of 37.5 Mg·hm⁻² and a standard deviation of 35.9 Mg·hm⁻².

Conclusion: Combining multi-source remote sensing data with machine learning algorithms allows biomass to be estimated accurately and quickly over large areas and has considerable application potential. The RF model outperformed the SVR and MLR models for AGB estimation in the study area, and the RF algorithm effectively selected variables suitable for machine learning AGB modeling from the multi-source candidates.

Key words: forest aboveground biomass (AGB), multiple linear regression (MLR), random forest (RF), support vector regression (SVR)
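The evaluation step described in the Method (RMSE, rRMSE and R² under ten-fold cross-validation) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: rRMSE is assumed to be RMSE divided by the mean observed AGB, out-of-fold predictions are assumed to be pooled before scoring, and the one-predictor least-squares model `fit_simple_linear` is a hypothetical stand-in for the MLR, RF and SVR models used in the study.

```python
import math
import random
from statistics import mean


def rmse(obs, pred):
    """Root mean square error, in the units of AGB (e.g. Mg per hm^2)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def rrmse(obs, pred):
    """Relative RMSE in percent. ASSUMPTION: normalised by the mean observed AGB."""
    return 100.0 * rmse(obs, pred) / mean(obs)


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=10, seed=0):
    """Shuffle plot indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_linear(xs, ys):
    """HYPOTHETICAL stand-in model: least-squares fit of y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, then score all
    out-of-fold predictions together (ASSUMPTION: pooled, not per-fold means)."""
    preds = [0.0] * len(ys)
    for fold in kfold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return {"RMSE": rmse(ys, preds), "rRMSE": rrmse(ys, preds), "R2": r2(ys, preds)}


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic plots for illustration only (NOT the study's 393 plots).
    xs = [rng.uniform(0.0, 1.0) for _ in range(100)]
    ys = [20.0 + 80.0 * x + rng.gauss(0.0, 10.0) for x in xs]
    print(cross_validate(xs, ys, fit_simple_linear, k=10))
```

Any regressor with the same `fit(xs, ys) -> predict` shape could be passed to `cross_validate`. The abstract does not say whether the study pooled out-of-fold predictions or averaged per-fold scores, so both choices here are illustrative.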
