
Scientia Silvae Sinicae ›› 2024, Vol. 60 ›› Issue (8): 33-45. doi: 10.11707/j.1001-7488.LYKX20230276

• Technology and application of smart forestry and grassland •

Wildlife Image Recognition of Infrared Cameras in the Beijing Area Based on an Improved ConvNeXt Model

Jiandong Qi (1,2), Shangzi Zheng (1,3), Ziyi Chen (1), Zhongtian Ma (1)

  1. College of Information, Beijing Forestry University, Beijing 100083
  2. Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083
  3. School of Artificial Intelligence, Tangshan University, Tangshan 063000
  • Received: 2023-07-01 Online: 2024-08-25 Published: 2024-09-03

Abstract:

Objective: Wild animal images captured by infrared cameras are voluminous, contain a high proportion of invalid images, and have complex backgrounds. To address these problems, a model that recognizes such images automatically and accurately is proposed, providing more efficient support for biodiversity research and wildlife conservation.

Method: Approximately 5 TB of image data captured by infrared cameras at various stations of the Beijing Ecological Observatory Network over the past 4 years were collected and organized. After manual annotation and data augmentation, a dataset of 4234 images in 10 categories was created. Based on the ConvNeXt convolutional neural network and the characteristics of the Beijing wild animal image dataset, the BSGG-ConvNeXt model was designed, using BlurPool, SENet, a global response normalization (GRN) layer, and GCNet to improve the recognition ability of the model. The impact of training strategies on the recognition accuracy of the ConvNeXt network was explored on the self-built dataset. The advantages of the BSGG-ConvNeXt model were established by comparison with other classic models, and its generalization ability was verified on the publicly available Snapshot Serengeti (SS) and Caltech Camera Traps (CCT) infrared-camera wildlife datasets.

Result: Taking the ConvNeXt-T size of the ConvNeXt model as an example, the accuracy on the self-built dataset is 74.13% and the multiply-accumulate operations (MACs) total 4.47×10⁹. Among the individual improvement schemes, BlurPool increased accuracy by 2.2% and reduced MACs to 1.07×10⁹. SENet improved accuracy by 3.2%. Using GRN and removing the scaling layer improved accuracy to 87.18% and increased the number of parameters to 27.88×10⁶. GCNet improved accuracy to 75.44% without increasing the computational load, although the number of parameters increased to 28.25×10⁶.
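The abstract names GRN among the improvement schemes but does not define it. As a minimal sketch, assuming the layer follows the global response normalization introduced with ConvNeXt V2 (an assumption that this abstract does not confirm), the operation can be illustrated in plain Python on a small feature map indexed as `x[h][w][c]`. The function name `grn` and the parameters `gamma`, `beta`, and `eps` are illustrative, not taken from the paper.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global response normalization on a feature map x[h][w][c].

    Assumed formulation (ConvNeXt V2 style): per-channel L2 norm over all
    spatial positions, divided by the mean norm across channels, followed by
    a learned affine term and a residual connection.
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])

    # Gx: L2 norm of each channel aggregated over every spatial position.
    norms = [
        math.sqrt(sum(x[i][j][c] ** 2 for i in range(height) for j in range(width)))
        for c in range(channels)
    ]

    # Nx: each channel's norm relative to the average channel norm.
    mean_norm = sum(norms) / channels
    scale = [n / (mean_norm + eps) for n in norms]

    # Output: gamma * (x * Nx) + beta + x, applied element-wise.
    return [
        [
            [gamma[c] * x[i][j][c] * scale[c] + beta[c] + x[i][j][c]
             for c in range(channels)]
            for j in range(width)
        ]
        for i in range(height)
    ]
```

With `gamma` and `beta` set to zero the function returns its input unchanged, so under this formulation the layer starts as an identity mapping and learns how strongly to reweight channels.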
Combining the above improvement schemes and applying them to the ConvNeXt-T model yields the BSGG-ConvNeXt-T model. Although the number of parameters increases slightly, the MACs fall to 1.07×10⁹ and the accuracy rises to 83.63%, higher than that of the original model. With pre-trained weights, the accuracy of the BSGG-ConvNeXt-T model reaches 94.07%, higher than that of the ResNet-50 (76.39%), ResNeXt-50 (87.60%), MobileViT (90.00%), DenseNet (87.66%), RegNet (69.90%), ConvNeXt V2 (91.93%), Swin Transformer (86.23%), and MobileOne (71.53%) models. When the BSGG-ConvNeXt improvements are applied to four different network sizes of ConvNeXt, performance on the self-built dataset is better than that of the unimproved model. The recognition accuracy of the BSGG-ConvNeXt model reaches 50.28% on the SS dataset and 56.15% on the CCT dataset, both higher than the accuracy of the original model.

Conclusion: The BSGG-ConvNeXt model recognizes wild animal images captured by infrared cameras with higher accuracy and performs well on both the self-built and the publicly available infrared wild animal image datasets, showing a certain degree of generalization ability.
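The figures reported in the abstract can be put side by side with a few lines of arithmetic. The sketch below only restates numbers from the abstract; the variable names are illustrative, and the differences are computed as plain percentage-point gaps, which is an interpretation rather than a metric defined by the authors.

```python
# Figures as reported in the abstract (accuracy in %, MACs as raw counts).
baseline_acc = 74.13          # baseline ConvNeXt, self-built dataset
baseline_macs = 4.47e9
bsgg_acc = 83.63              # BSGG-ConvNeXt-T without pre-trained weights
bsgg_macs = 1.07e9
bsgg_pretrained_acc = 94.07   # BSGG-ConvNeXt-T with pre-trained weights

# Accuracies of the comparison models listed in the abstract.
comparison_acc = {
    "ResNet-50": 76.39,
    "ResNeXt-50": 87.60,
    "MobileViT": 90.00,
    "DenseNet": 87.66,
    "RegNet": 69.90,
    "ConvNeXt V2": 91.93,
    "Swin Transformer": 86.23,
    "MobileOne": 71.53,
}

# Accuracy gain of the combined model over the baseline, in percentage points.
acc_gain = bsgg_acc - baseline_acc

# Fraction of multiply-accumulate operations removed by the combined model.
macs_reduction = 1 - bsgg_macs / baseline_macs

# Strongest comparison model and the margin of the pre-trained BSGG model over it.
best_name, best_acc = max(comparison_acc.items(), key=lambda item: item[1])
margin = bsgg_pretrained_acc - best_acc

print(f"accuracy gain over baseline: {acc_gain:.2f} points")
print(f"MACs reduction: {macs_reduction:.1%}")
print(f"margin over {best_name}: {margin:.2f} points")
```

On these numbers the combined model gains 9.50 points over the baseline while using roughly 76% fewer MACs, and the pre-trained variant leads the strongest listed comparison model, ConvNeXt V2, by 2.14 points.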

Key words: wildlife, image recognition, deep learning, convolutional neural network, ConvNeXt
