
Scientia Silvae Sinicae ›› 2020, Vol. 56 ›› Issue (2): 134-141. doi: 10.11707/j.1001-7488.20200215

• Articles and Research Reports •

An LSTM-Based Method for Filling Missing Stem Moisture Data of Living Trees

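Wei Song1,3, Chao Gao2, Yue Zhao1,3, Yandong Zhao1,3,*

  1. School of Technology, Beijing Forestry University, Beijing 100083
  2. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048
  3. Beijing Laboratory of Urban and Rural Ecological Environment, Beijing Forestry University, Beijing 100083
  • Received: 2019-01-14 Online: 2020-02-25 Published: 2020-03-17
  • Corresponding author: Yandong Zhao
  • Supported by:
    National Key R&D Program of China (2017YFD0600901); Beijing Science and Technology Plan Project (Z161100000916012); Beijing Co-construction Project Special Fund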

Method of Filling the Missing Water Loss Data of Living Plant Stem by Sequence Based on LSTM

Wei Song1,3, Chao Gao2, Yue Zhao1,3, Yandong Zhao1,3,*

  1. School of Technology, Beijing Forestry University, Beijing 100083
  2. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048
  3. Beijing Laboratory of Urban and Rural Ecological Environment, Beijing Forestry University, Beijing 100083
  • Received: 2019-01-14 Online: 2020-02-25 Published: 2020-03-17
  • Contact: Yandong Zhao

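Abstract (translated from the Chinese):

Objective: To study plant stem moisture data by comparing different data-filling methods on missing data in the same data segment, and to verify the effectiveness and accuracy of an LSTM model for filling stem moisture data. Method: Complete stem moisture data from crape myrtle (Lagerstroemia indica) trees planted in Haidian District, Beijing, were selected for June 2017, and part of the data was manually deleted to serve as missing data. The missing part was filled using an interpolation method, an RNN neural network, and an LSTM neural network, and the filled results were compared with the original data and analyzed. Because the error of neural-network predictions grows as the prediction time moves further out, this paper proposes a post-processing step applied to the network predictions: the missing data are predicted from both the forward and the reverse direction, the predictions from each direction are multiplied by a set of weights that decrease with prediction time, and the two weighted series are summed, combining the strengths of both directions to further improve prediction accuracy. Result: Of the three methods, the RNN and LSTM neural networks showed clear advantages over the traditional interpolation method: the accuracy of interpolation dropped rapidly as the number of missing values increased, whereas that of the neural networks declined more slowly. When a filled value was counted as accurate if it differed from the true value by no more than 2%, the filling accuracy was below 50% for interpolation, between 50% and 60% for the RNN, and above 80% for the LSTM. With a 4% threshold, the accuracy was 60% for interpolation, at most 90% for the RNN, and above 95% for the LSTM. With the weighting step added, the accuracy of the LSTM predictions reached 97% at the 2% threshold and 100% at the 3% threshold. When a set of test data was fed into the model, prediction accuracy was somewhat lower than on the training data, but the advantage of bidirectional prediction was even more pronounced. Conclusion: The bidirectional combined prediction method based on the LSTM model significantly reduces the influence of cumulative error on long-term predictions and improves the accuracy of the predicted data. Compared with the other two types of data-filling methods, the LSTM-based method has a considerable advantage in filling time series with long missing stretches.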
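The accuracy figures above count a filled value as accurate when it lies within a fixed percentage of the true value. The following is a minimal sketch of that scoring, together with a linear-interpolation baseline. It assumes the error is a relative error, |filled − true| / |true|, which the abstract does not state explicitly; the function names `linear_fill` and `tolerance_accuracy` and the synthetic sine series are hypothetical and are not taken from the paper.

```python
import numpy as np

def linear_fill(series, gap_start, gap_end):
    """Fill series[gap_start:gap_end] by linear interpolation.

    Only the values outside the gap are used as known points.
    """
    series = np.asarray(series, dtype=float)
    known = np.concatenate([np.arange(0, gap_start),
                            np.arange(gap_end, series.size)])
    gap = np.arange(gap_start, gap_end)
    return np.interp(gap, known, series[known])

def tolerance_accuracy(filled, truth, tol=0.02):
    """Fraction of filled points whose relative error is at most `tol`.

    Assumption: "error within 2%" is read as relative error.
    """
    filled = np.asarray(filled, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rel_err = np.abs(filled - truth) / np.abs(truth)
    return float(np.mean(rel_err <= tol))

# Synthetic stand-in for a stem moisture record (not real data):
# a 24-step cycle around a constant level.
t = np.arange(48)
truth = 40.0 + 2.0 * np.sin(2 * np.pi * t / 24)

# Delete points 10..21, refill them, and score at two thresholds.
filled = linear_fill(truth, 10, 22)
for tol in (0.02, 0.04):
    print(tol, tolerance_accuracy(filled, truth[10:22], tol=tol))
```

Scoring each method against the same deleted segment, as the study describes, keeps the comparison between interpolation and the neural networks like for like.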

Keywords: missing data, data filling, time series, LSTM neural network, stem moisture

Abstract:

Objective: In the era of big data, ecological data are being produced in large volumes, but data are often lost during collection and transmission. The resulting incomplete records make subsequent analysis and application difficult, so a suitable data-filling method is needed to improve data integrity and accuracy. This study took the stem moisture data of a plant as its object. Different data-filling methods were compared on missing data in the same data segment to verify the validity and accuracy of an LSTM model for filling stem moisture data. Method: Complete stem moisture data of Lagerstroemia indica planted in Haidian District, Beijing, in June 2017 were selected as experimental data, and part of the data was manually deleted to serve as missing data. The missing parts were filled by an interpolation method, an RNN neural network, and an LSTM neural network, and the results were compared with the original data and analyzed. Because the error of neural-network predictions increases as the prediction time moves further out, this paper proposes a post-processing step applied to the network predictions: the missing data are predicted from both the forward and the reverse direction, the predicted values from each direction are multiplied by a set of weights that decrease with prediction time, and the weighted values are then added together. Combining the advantages of the two prediction directions further improves prediction accuracy. Result: Among the three methods, the RNN and LSTM neural networks had clear advantages over the traditional interpolation method. The accuracy of the interpolation method decreased rapidly as the number of missing values increased, while that of the neural network methods decreased slowly. When a filled value was counted as accurate if its error relative to the real value was within 2%, the filling accuracy was below 50% for the interpolation method, between 50% and 60% for the RNN method, and above 80% for the LSTM method. When the threshold was 4%, the filling accuracy was 60% for the interpolation method, at most 90% for the RNN method, and above 95% for the LSTM method. When the weighting step was added, the accuracy of the LSTM prediction reached 97% at the 2% threshold and 100% at the 3% threshold. Conclusion: The bidirectional combined prediction method based on the LSTM model adopted in this paper significantly reduces the influence of cumulative error on long-term predictions and improves the accuracy of the predicted data. Compared with the other two kinds of data-filling methods, the method based on the LSTM neural network has a clear advantage in filling time series with long missing stretches.
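The bidirectional weighting described in the Method part can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract says only that the weights decrease with prediction time, so the linear decay used here (with the two weights summing to 1 at each point) is an assumption, and the function name `fuse_bidirectional` is hypothetical. The forward and backward predictions are taken as given arrays; in the paper they would come from the LSTM model.

```python
import numpy as np

def fuse_bidirectional(forward_pred, backward_pred):
    """Blend forward and backward gap predictions with decaying weights.

    forward_pred[k]  : k-th step predicted forward from data before the gap
    backward_pred[k] : k-th step predicted backward from data after the gap
    Both arrays are in prediction order, so the backward one is reversed
    to put it in time order before blending.
    """
    fwd = np.asarray(forward_pred, dtype=float)
    bwd = np.asarray(backward_pred, dtype=float)[::-1]
    if fwd.shape != bwd.shape:
        raise ValueError("forward and backward predictions must match in length")
    n = fwd.size
    # Assumed weighting: linear decay with prediction step.
    # The forward weight is largest at the start of the gap, the backward
    # weight is largest at the end, and they sum to 1 at every point.
    w_fwd = np.arange(n, 0, -1) / (n + 1)
    w_bwd = 1.0 - w_fwd
    return w_fwd * fwd + w_bwd * bwd

# Synthetic example (not real data): both predictors drift further from
# a constant true value of 10.0 the more steps they predict.
truth = np.full(4, 10.0)
drift = np.array([0.1, 0.2, 0.3, 0.4])
forward = truth + drift    # prediction order == time order
backward = truth + drift   # prediction order == reverse time order
fused = fuse_bidirectional(forward, backward)
print(np.max(np.abs(forward - truth)), np.max(np.abs(fused - truth)))
```

Each direction is trusted most near the observed data it starts from, which is where its accumulated error is smallest. Any other weight schedule that decreases with prediction step could be substituted for the linear one.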

Key words: missing data, data filling, time series, LSTM neural network, stem moisture

CLC number: