Application of Random Forest Algorithm on the Forest Fire Prediction in Tahe Area Based on Meteorological Factors

doi:10.11707/j.1001-7488.20160111

Abstract

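Abstract: [Objective] A logistic regression model and the random forest algorithm were used to build prediction models of forest fire occurrence for the Tahe area of the Daxing'an Mountains, and their prediction accuracies were compared. The aims were to judge how well the random forest algorithm suits forest fire prediction in this area and to provide technical support for local forest fire management. [Method] Using forest fire occurrence data for the Tahe area of the Daxing'an Mountains from 1974 to 2008, a binomial logistic regression model and the random forest algorithm were each applied to analyze empirically the relationship between forest fire occurrence and meteorological factors in the Tahe area. To reduce the influence of training-sample distribution on the results, the full dataset was randomly divided into a 60% training sample and a 40% test sample, and this was repeated five times to build five intermediate models (sample groups). Variables (factors) that were significant in at least three of the five intermediate models were then used to analyze the full dataset, and the prediction accuracies of the two methods were compared for both the five intermediate models and the full-sample model. In addition, a variable interaction test was designed to further compare the prediction accuracy of the two models using the same variables. [Result] Daily minimum relative humidity, the fine fuel moisture code (FFMC) and the drought code (DC) were significantly associated with fire occurrence in both the binomial logistic regression model and the random forest algorithm. For the five intermediate models, the prediction accuracy of the random forest algorithm was about 8% higher than that of the binomial logistic regression model on the training samples (60%) and about 10% higher on the test samples (40%). For the full-sample model, the random forest algorithm reached an accuracy of 85.0% versus 76.2% for binomial logistic regression, a difference of about 10% (8.8 percentage points) that agrees with the results of the five intermediate models. In the variable interaction test, the random forest algorithm reached 86.0% versus 72.8% for binomial logistic regression, a relative improvement in prediction accuracy of about 18.1%. [Conclusion] Daily minimum relative humidity, the fine fuel moisture code and the drought code are the main meteorological factors affecting forest fire occurrence. In predicting forest fire occurrence in the Tahe area from meteorological factors, the random forest algorithm was about 10% more accurate than the traditional binomial logistic regression model, showing a predictive advantage and practical value, and it can serve as a reference for forest fire prediction and decision making in the Tahe area of the Daxing'an Mountains.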
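The resampling and variable-screening steps described above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `repeated_splits` and `screen_predictors`, the random seed, the record count of 1000 and the per-model significance results in the example are all hypothetical. Each repeat reshuffles the full dataset independently, so a record can fall into the training sample of several sample groups.

```python
import random
from collections import Counter


def repeated_splits(n_records, n_repeats=5, train_frac=0.6, seed=2016):
    """Hypothetical helper: draw n_repeats independent random train/test splits.

    Each repeat shuffles all record indices, takes the first train_frac share
    as the training sample and the remainder as the test sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_train = round(n_records * train_frac)
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_records))
        rng.shuffle(idx)
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits


def screen_predictors(significant_sets, min_votes=3):
    """Hypothetical helper: keep predictors significant in >= min_votes models."""
    votes = Counter(name for sig in significant_sets for name in set(sig))
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Invented significance results for five intermediate models (assumption);
# RHmin stands for daily minimum relative humidity, Tmax and Wind are placeholders.
intermediate = [
    {"RHmin", "FFMC", "DC", "Tmax"},
    {"RHmin", "FFMC", "DC"},
    {"RHmin", "FFMC", "DC", "Wind"},
    {"RHmin", "DC", "Tmax"},
    {"RHmin", "FFMC", "DC"},
]
splits = repeated_splits(1000)
print([(len(tr), len(te)) for tr, te in splits])  # five (600, 400) pairs
print(screen_predictors(intermediate))  # ['DC', 'FFMC', 'RHmin']
```

The abstract does not say how significance was judged for the random forest, so the sets passed to `screen_predictors` are left to the user; the function only applies the "at least three of five" rule.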

Key words: Tahe area, forest fire occurrence, meteorological factors, random forest algorithm, logistic regression

Abstract: [Objective] In this study, two methods were applied to establish fire prediction models for the Tahe area, Daxing'an Mountains. Our objective was to assess the applicability of the random forest algorithm to local forest fire prediction by comparing prediction accuracy, and to provide technical support for local forest fire management. [Method] Fire data collected in Tahe, Daxing'an Mountains, between 1974 and 2008 were used in a case study to identify the relationship between fire occurrence and meteorological factors using a logistic regression (LR) model and a random forest (RF) algorithm. To reduce the influence of sample distribution on model fitting, the original dataset was randomly divided into training (60%) and validation (40%) samples. The procedure was repeated five times using sampling with replacement, yielding five random sub-samples (sample groups) of the data, each with a training and a validation dataset. Predictors that were significant at α = 0.05 in at least three of the five intermediate models were included in the final models. In addition, a "cross validation" test was conducted to compare the accuracy of the two models using the same variables. [Result] Parameter estimation showed that daily minimum relative humidity, the fine fuel moisture code (FFMC) and the drought code (DC) were important predictors in both the LR and RF models. In the five intermediate models, the prediction accuracy of LR was about 8% and 10% lower than that of RF for the training and validation samples, respectively. On the complete dataset, RF reached an accuracy of 85.0% versus 76.2% for LR, a difference of about 10% that agrees with the results of the five sample groups. In the cross validation test, RF reached 86.0% versus 72.8% for LR. [Conclusion] Our results indicate that the RF model outperformed the LR model in fire prediction for the study area; thus, RF can be used for fire prediction and can provide important information for local fire management and planning.
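The abstract reports two kinds of gap: the full-sample difference (85.0% vs 76.2%) is given as "about 10%", which is an absolute difference in percentage points, while the variable interaction result (86.0% vs 72.8%) is given as an improvement of 18.1%, which is relative to the logistic model's accuracy. The hypothetical helper `compare_accuracy` below is a minimal sketch that computes both measures from the accuracies quoted in the abstract.

```python
def compare_accuracy(acc_rf, acc_lr):
    """Return (absolute gap in percentage points, relative gain in %) of RF over LR."""
    gap = acc_rf - acc_lr          # absolute difference, percentage points
    relative = gap / acc_lr * 100  # improvement relative to the LR accuracy
    return round(gap, 1), round(relative, 1)


print(compare_accuracy(85.0, 76.2))  # full-sample model -> (8.8, 11.5)
print(compare_accuracy(86.0, 72.8))  # variable interaction test -> (13.2, 18.1)
```

Read this way, both comparisons favour RF, but the 18.1% figure is a relative gain and is not directly comparable with the "about 10%" absolute gap.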

Key words: Tahe area, fire occurrence, meteorological factors, random forest algorithm, logistic regression

CLC number: S762.2

Liang Huiling, Lin Yurui, Yang Guang, Su Zhangwen, Wang Wenhui, Guo Futao. Application of Random Forest Algorithm on the Forest Fire Prediction in Tahe Area Based on Meteorological Factors[J]. Scientia Silvae Sinicae, 2016, 52(1): 89-98.

