Wildlife Image Recognition of Infrared Cameras in Beijing Area Based on an Improvement ConvNeXt Model

doi:10.11707/j.1001-7488.LYKX20230276

Abstract

Objective: Wild animal images captured by infrared cameras come in large volumes, contain a high proportion of invalid images, and have complex backgrounds. To address these problems, a model that recognizes such images automatically and accurately is proposed, providing more efficient support for biodiversity research and wildlife conservation.

Method: Approximately 5 TB of image data captured over the past 4 years by infrared cameras at the stations of the Beijing Ecological Observatory Network were collected and organized. After manual annotation and data augmentation, a dataset of 4234 images in 10 categories was built. Based on the ConvNeXt convolutional neural network and the characteristics of the Beijing wild animal image dataset, the BSGG-ConvNeXt model was designed, using BlurPool, SENet, a global response normalization (GRN) layer, and GCNet to improve recognition ability. The impact of training strategies on the recognition accuracy of the ConvNeXt network was explored on the self-built dataset. The advantages of BSGG-ConvNeXt were established by comparison with other classic models, and its generalization ability was verified on two publicly available infrared-camera wildlife datasets, Snapshot Serengeti (SS) and Caltech Camera Traps (CCT).

Result: Taking the ConvNeXt-T size of the ConvNeXt model as an example, the accuracy on the self-built dataset is 74.13% with 4.47×10⁹ multiply-accumulate operations (MACs). The improvement schemes were first applied individually. With BlurPool, accuracy increased by 2.2 percentage points and MACs decreased to 1.07×10⁹. With SENet, accuracy improved by 3.2 percentage points. With GRN and the scaling layer removed, accuracy improved to 87.18% and the number of parameters increased to 27.88×10⁶. With GCNet, accuracy improved to 75.44% without increasing the computational load, but the number of parameters increased to 28.25×10⁶. Combining these schemes and applying them to ConvNeXt-T gives the BSGG-ConvNeXt-T model: the number of parameters increases slightly, MACs are reduced to 1.07×10⁹, and accuracy improves to 83.63%, higher than the original model. With pre-trained weights, the accuracy of BSGG-ConvNeXt-T reaches 94.07%, higher than that of ResNet-50 (76.39%), ResNeXt-50 (87.60%), MobileViT (90.00%), DenseNet (87.66%), RegNet (69.90%), ConvNeXtv2 (91.93%), SwinTransformer (86.23%), and MobileOne (71.53%). Applied to four ConvNeXt network sizes, BSGG-ConvNeXt outperforms the unimproved model on the self-built dataset in every case. Its recognition accuracy reaches 50.28% on the SS dataset and 56.15% on the CCT dataset, both higher than the original model.

Conclusion: The BSGG-ConvNeXt model recognizes wild animal images captured by infrared cameras with higher accuracy, performs well on both the self-built and the publicly available wild animal infrared image datasets, and shows a certain degree of generalization ability.
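
Of the four components named in the abstract, GRN is the easiest to state compactly. The following is a minimal pure-Python sketch of global response normalization as formulated for ConvNeXt V2; whether BSGG-ConvNeXt uses exactly this form is an assumption, and the function name `grn`, the channel-major list layout, and the `eps` value are illustrative choices rather than details from the paper.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global response normalization over one feature map.

    x     : list of C channels, each a flat list of H*W activations
    gamma : per-channel learnable scale (list of C floats)
    beta  : per-channel learnable shift (list of C floats)
    """
    # Step 1: global aggregation, the L2 norm of each channel over space.
    gx = [math.sqrt(sum(v * v for v in channel)) for channel in x]
    # Step 2: divisive normalization by the mean norm across channels.
    mean_gx = sum(gx) / len(gx)
    nx = [g / (mean_gx + eps) for g in gx]
    # Step 3: calibrate the input and add the residual connection.
    return [
        [gamma[c] * v * nx[c] + beta[c] + v for v in channel]
        for c, channel in enumerate(x)
    ]


if __name__ == "__main__":
    features = [[3.0, 4.0], [0.0, 0.0]]  # two channels, two positions each
    print(grn(features, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

With `gamma` and `beta` set to zero the layer reduces to the identity, which is why such a layer can be added to an existing block without disturbing it at initialization.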

Key words: wildlife, image recognition, deep learning, convolutional neural network, ConvNeXt

CLC Number: TP391.4

Jiandong Qi,Shangzi Zheng,Ziyi Chen,Zhongtian Ma. Wildlife Image Recognition of Infrared Cameras in Beijing Area Based on an Improvement ConvNeXt Model[J]. Scientia Silvae Sinicae, 2024, 60(8): 33-45.

Animal species	Image number
Hog badger (Arctonyx collaris)	252
Birds, excluding ducks (Aves)	1 071
Wild boar (Sus scrofa)	112
Leopard cat (Prionailurus bengalensis)	119
Deer (Cervus axis)	1 199
Goat (Capra hircus)	332
Feral dog (Canis lupus familiaris)	105
Hare (Lepus sinensis)	126
Squirrel (Sciurus vulgaris)	300
Ducks (mallard)	619
Total	4 234

SS dataset subset		CCT dataset subset
Animal species	Image number	Animal species	Image number
Topi (Damaliscus lunatus)	571	Raccoon (Procyon lotor)	1 101
Birds (Aves)	980	Birds (Aves)	982
Giraffe (Giraffa camelopardalis)	1 000	Dog (Canis dingo)	419
Zebra (Equus burchellii)	1 000	Rodents, excluding squirrels (Geomys bursarius)	464
Oryx (Oryx)	1 000	Cat (Prionailurus bengalensis)	543
Buffalo (Bubalus bubalus)	1 000	Deer (Cervus axis)	1 256
Dog (Canis lupus familiaris)	1 000	Coyote (Canis latrans)	1 720
Elephant (Elephas maximus)	1 000	Cattle (Bos taurus)	332
Guineafowl (Numididae)	1 000	Wildcat (Prionailurus bengalensis)	789
Hyena (Hyaenidae)	1 000	Squirrel (Sciurus carolinensis)	445
Addax (Addax nasomaculatus)	1 000	Skunk (Mephitis mephitis)	180
Impala (Aepyceros melampus)	1 000	Fox (Vulpes vulpes)	239
Gazelle (Gazella)	1 000	Hare (Lepus sinensis)	1 237
Wildebeest (Connochaetes taurinus)	1 000	Opossum (Didelphis virginiana)	1 622
Total	13 551	Total	11 329

Model	Number of input channels per stage	Number of stacked blocks per stage
ConvNeXt-T	(96, 192, 384, 768)	(3, 3, 9, 3)
ConvNeXt-S	(96, 192, 384, 768)	(3, 3, 27, 3)
ConvNeXt-B	(128, 256, 512, 1 024)	(3, 3, 27, 3)
ConvNeXt-L	(192, 384, 768, 1 536)	(3, 3, 27, 3)
ConvNeXt-XL	(256, 512, 1 024, 2 048)	(3, 3, 27, 3)
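
The table above maps directly onto a small configuration structure. The sketch below records the per-stage channel widths and block counts exactly as tabulated; the class name `ConvNeXtConfig`, the dictionary `CONVNEXT_CONFIGS`, and the helper `total_blocks` are hypothetical names chosen for illustration and are not taken from the paper or from any library.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ConvNeXtConfig:
    """Per-stage channel widths and block repeat counts for one model size."""
    dims: Tuple[int, int, int, int]
    depths: Tuple[int, int, int, int]


# Values copied from the table above.
CONVNEXT_CONFIGS: Dict[str, ConvNeXtConfig] = {
    "ConvNeXt-T": ConvNeXtConfig((96, 192, 384, 768), (3, 3, 9, 3)),
    "ConvNeXt-S": ConvNeXtConfig((96, 192, 384, 768), (3, 3, 27, 3)),
    "ConvNeXt-B": ConvNeXtConfig((128, 256, 512, 1024), (3, 3, 27, 3)),
    "ConvNeXt-L": ConvNeXtConfig((192, 384, 768, 1536), (3, 3, 27, 3)),
    "ConvNeXt-XL": ConvNeXtConfig((256, 512, 1024, 2048), (3, 3, 27, 3)),
}


def total_blocks(name: str) -> int:
    """Total number of stacked blocks across the four stages."""
    return sum(CONVNEXT_CONFIGS[name].depths)


if __name__ == "__main__":
    for name, cfg in CONVNEXT_CONFIGS.items():
        print(name, cfg.dims, cfg.depths, total_blocks(name))
```

The T and S sizes share channel widths and differ only in the depth of the third stage, while B, L, and XL keep the S depths and widen every stage.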

Scheme No.	Model	MACs	Params	Accuracy (%)
Original	ConvNeXt-T	4.47×10⁹	27.83×10⁶	74.13
1	ConvNeXt-T+BP	1.07×10⁹	27.83×10⁶	76.39
2	ConvNeXt-T+SENet	4.47×10⁹	28.24×10⁶	77.34
3	ConvNeXt-T+GRN, scale layer removed	4.47×10⁹	27.88×10⁶	87.18
4	ConvNeXt-T+GCNet	4.47×10⁹	28.25×10⁶	75.44
5	ConvNeXt-T+BSGG-ConvNeXt	1.07×10⁹	28.71×10⁶	83.63
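
Scheme 1 in the table above adds BlurPool (BP), which low-pass filters a feature map before subsampling so that small shifts of the input change the output less. The one-dimensional sketch below illustrates only that idea; the 3-tap binomial kernel, the reflect padding, and the function name `blur_pool_1d` are assumptions made for illustration, and the paper's actual two-dimensional layer and its placement in the network are not reproduced here.

```python
def blur_pool_1d(signal, stride=2):
    """Anti-aliased downsampling: blur with [1, 2, 1] / 4, then subsample.

    signal : list of floats with at least two samples
    stride : subsampling step applied after blurring
    """
    kernel = (0.25, 0.5, 0.25)
    # Reflect-pad by one sample on each side so the output length is preserved
    # before subsampling.
    padded = [signal[1]] + list(signal) + [signal[-2]]
    blurred = [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]
    return blurred[::stride]


if __name__ == "__main__":
    wave = [0.0, 4.0, 0.0, 4.0]
    shifted = [4.0, 0.0, 4.0, 0.0]
    # Plain strided subsampling gives very different results for the two inputs.
    print(wave[::2], shifted[::2])
    # Blurring first makes the two downsampled outputs agree.
    print(blur_pool_1d(wave), blur_pool_1d(shifted))
```

The alternating signal is the worst case for plain striding, which keeps either only the zeros or only the peaks depending on a one-sample shift.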

Model	MACs	Params	Accuracy (%)
ConvNeXt-T	4.47×10⁹	27.83×10⁶	69.40
BSGG-ConvNeXt-T	1.07×10⁹	28.71×10⁶	83.63
ConvNeXt-S	8.7×10⁹	49.46×10⁶	73.43
BSGG-ConvNeXt-S	1.26×10⁹	51.08×10⁶	83.39
ConvNeXt-B	15.38×10⁹	87.68×10⁶	74.02
BSGG-ConvNeXt-B	2.21×10⁹	90.38×10⁶	82.70
ConvNeXt-L	34.4×10⁹	196.25×10⁶	78.13
BSGG-ConvNeXt-L	4.90×10⁹	202.42×10⁶	80.31
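
The per-size comparison in the table above can be condensed into three derived quantities for each size: the accuracy gain in percentage points, the factor by which MACs shrink, and the relative growth in parameters. The sketch below computes them from the values exactly as printed in that table; the dictionary `RESULTS` and the function `summarize` are hypothetical names introduced for illustration.

```python
# (MACs, params, accuracy in %) as printed in the table above.
RESULTS = {
    "T": {"base": (4.47e9, 27.83e6, 69.40), "bsgg": (1.07e9, 28.71e6, 83.63)},
    "S": {"base": (8.7e9, 49.46e6, 73.43), "bsgg": (1.26e9, 51.08e6, 83.39)},
    "B": {"base": (15.38e9, 87.68e6, 74.02), "bsgg": (2.21e9, 90.38e6, 82.70)},
    "L": {"base": (34.4e9, 196.25e6, 78.13), "bsgg": (4.90e9, 202.42e6, 80.31)},
}


def summarize(results):
    """Derive accuracy gain, MACs reduction factor, and parameter growth."""
    summary = {}
    for size, pair in results.items():
        base_macs, base_params, base_acc = pair["base"]
        bsgg_macs, bsgg_params, bsgg_acc = pair["bsgg"]
        summary[size] = {
            # Difference in accuracy, in percentage points.
            "acc_gain_points": round(bsgg_acc - base_acc, 2),
            # How many times fewer MACs the improved model needs.
            "macs_reduction_factor": round(base_macs / bsgg_macs, 2),
            # Relative increase in parameter count, in percent.
            "param_increase_pct": round(
                100 * (bsgg_params - base_params) / base_params, 2
            ),
        }
    return summary


if __name__ == "__main__":
    for size, row in summarize(RESULTS).items():
        print(f"ConvNeXt-{size}: {row}")
```

On these figures the accuracy gain shrinks as the network grows, while the MACs reduction factor rises from the T size to the larger sizes.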

Model	MACs	Params	Accuracy (%)
ResNet-50	4.12×10⁹	23.54×10⁶	76.39
ResNeXt-50	4.27×10⁹	23.00×10⁶	87.60
MobileViT	261.28×10⁶	954.23×10³	88.85
DenseNet	2.88×10⁹	6.69×10⁶	87.66
RegNet	503.13×10⁶	3.91×10⁶	69.70
EfficientNetv2	2.87×10⁹	343.05×10³	56.22
SwinTransformer	4.36×10⁹	27.53×10⁶	86.23
ConvNeXtv2	4.47×10⁹	27.87×10⁶	91.93
MobileOne	1.09×10⁹	4.28×10⁶	71.53
BSGG-ConvNeXt-T	1.07×10⁹	28.71×10⁶	83.63
MobileViT + pre-trained weights	261.28×10⁶	954.23×10³	91.70
RegNet + pre-trained weights	503.13×10⁶	3.91×10⁶	93.20
BSGG-ConvNeXt-T + pre-trained weights	1.07×10⁹	28.71×10⁶	94.07

Model	Dataset	MACs	Params	Accuracy (%)
ConvNeXt-T	SS dataset subset	4.47×10⁹	27.83×10⁶	48.23
BSGG-ConvNeXt-T	SS dataset subset	1.07×10⁹	28.71×10⁶	50.28
ConvNeXt-T	CCT dataset subset	4.47×10⁹	27.83×10⁶	45.75
BSGG-ConvNeXt-T	CCT dataset subset	1.07×10⁹	28.71×10⁶	56.15
