Scientia Silvae Sinicae ›› 2024, Vol. 60 ›› Issue (8): 33-45. doi: 10.11707/j.1001-7488.LYKX20230276

• Frontiers and Key Topics: Smart Forestry and Grassland Technology and Applications •

Wildlife Image Recognition of Infrared Cameras in Beijing Area Based on an Improved ConvNeXt Model

Jiandong Qi1,2, Shangzi Zheng1,3, Ziyi Chen1, Zhongtian Ma1

  1. College of Information, Beijing Forestry University, Beijing 100083
  2. Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083
  3. School of Artificial Intelligence, Tangshan University, Tangshan 063000
  • Received: 2023-07-01 Online: 2024-08-25 Published: 2024-09-03
  • Funding:
    National Key Research and Development Program of China project "Adaptation mechanisms of typical plantation ecosystems to global change" (2020YFA0608100); National Natural Science Foundation of China project "Comprehensive assessment of plantation ecosystem quality and stability under global change" (32071842).

Abstract:

Objective: Wildlife images captured by infrared cameras come in large volumes, contain a high proportion of invalid images, and have complex backgrounds. To address these problems, this study proposes a model that recognizes such images automatically and with high accuracy, providing more efficient support for biodiversity research and wildlife conservation.

Method: Approximately 5 TB of image data captured over the past 4 years by infrared cameras at the stations of the Beijing Ecological Observatory Network were collected and organized. After manual annotation and data augmentation, a self-built dataset of 4,234 images in 10 classes was created. Based on the ConvNeXt convolutional neural network and the characteristics of the Beijing wildlife image dataset, the BSGG-ConvNeXt model was designed, using BlurPool, SENet, a global response normalization (GRN) layer, and GCNet to improve recognition ability. The effect of training strategies on the recognition accuracy of the ConvNeXt network was explored on the self-built dataset, and the advantages of BSGG-ConvNeXt were established by comparison with other classic models. The public infrared wildlife datasets Snapshot Serengeti (SS) and Caltech Camera Traps (CCT) were used to verify the generalization ability of the model.

Result: Taking the ConvNeXt-T size of ConvNeXt as an example, its accuracy on the self-built dataset was 74.13%, with 4.47×10⁹ multiply-accumulate operations (MACs). Applying the improvements individually showed that BlurPool raised accuracy by 2.2% and lowered MACs to 1.07×10⁹; SENet raised accuracy by 3.2%; GRN with the scaling layer removed raised accuracy to 87.18% and increased the number of parameters to 27.88×10⁶; and GCNet raised accuracy to 75.44% without increasing the computational load, but increased the number of parameters to 28.25×10⁶. Combining these improvements gives BSGG-ConvNeXt, and applying it to ConvNeXt-T gives the BSGG-ConvNeXt-T model. Although its number of parameters increased slightly, its MACs fell to 1.07×10⁹ and its accuracy rose to 83.63%, higher than the original model. With pre-trained weights, the accuracy of BSGG-ConvNeXt-T reached 94.07%, higher than that of ResNet-50 (76.39%), ResNeXt-50 (87.60%), MobileViT (90.00%), DenseNet (87.66%), RegNet (69.90%), ConvNeXtv2 (91.93%), SwinTransformer (86.23%), and MobileOne (71.53%). When BSGG-ConvNeXt was applied to four different sizes of ConvNeXt, every improved model outperformed its unimproved counterpart on the self-built dataset. The recognition accuracy of BSGG-ConvNeXt reached 50.28% on the SS dataset and 56.15% on the CCT dataset, both higher than the accuracy of the original model.

Conclusion: The BSGG-ConvNeXt model recognizes wildlife images captured by infrared cameras with higher accuracy, performs well on both the self-built and the public infrared wildlife image datasets, and shows a certain degree of generalization ability.
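The headline comparison between ConvNeXt-T and BSGG-ConvNeXt-T can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only figures quoted in the abstract; the helper name `relative_change` is illustrative and does not come from the paper.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


# Figures quoted in the abstract for ConvNeXt-T and BSGG-ConvNeXt-T,
# both evaluated on the self-built dataset without pre-trained weights.
base_acc, bsgg_acc = 74.13, 83.63      # accuracy, percent
base_macs, bsgg_macs = 4.47e9, 1.07e9  # multiply-accumulate operations

# Accuracy gain expressed in percentage points.
acc_gain_points = bsgg_acc - base_acc

# Fraction of the original MACs removed by the combined changes.
macs_reduction = -relative_change(base_macs, bsgg_macs)

print(f"accuracy gain: {acc_gain_points:.2f} percentage points")
print(f"MACs reduction: {macs_reduction:.1%}")
```

On these figures the combined model gains about 9.5 percentage points of accuracy while using roughly 76% fewer MACs than the baseline.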

Key words: wildlife, image recognition, deep learning, convolutional neural network, ConvNeXt
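For readers unfamiliar with the GRN layer named in the abstract, the sketch below shows global response normalization in the form published with ConvNeXt V2. It is an illustration under stated assumptions, not the authors' implementation, which the abstract does not give: the function name `grn`, the flat list-of-channels layout, and the `eps` value are all choices made here for a dependency-free example. A real model would apply the same steps to tensors.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global response normalization over one feature map (illustrative).

    x:     list of C channels, each a flat list of H*W activations.
    gamma: per-channel learnable scale, list of length C.
    beta:  per-channel learnable bias, list of length C.
    """
    # 1) Global aggregation: L2 norm of each channel over all spatial positions.
    norms = [math.sqrt(sum(v * v for v in channel)) for channel in x]

    # 2) Divisive normalization: each channel's norm relative to the mean norm.
    mean_norm = sum(norms) / len(norms)
    relative = [n / (mean_norm + eps) for n in norms]

    # 3) Calibration with a residual connection: gamma * (x * relative) + beta + x.
    return [
        [gamma[c] * (v * relative[c]) + beta[c] + v for v in channel]
        for c, channel in enumerate(x)
    ]


# Example: two channels with two spatial positions each.
features = [[3.0, 4.0], [0.0, 0.0]]
calibrated = grn(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` and `beta` both zero the layer reduces to the identity, so it can be added to an existing block without changing the block's initial behaviour.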
