CNN-Swin Transformer Detection Algorithm of Forest Wildlife Images Based on Improved YOLOv5s

doi:10.11707/j.1001-7488.LYKX20220597

Abstract

Objective: To improve the detection accuracy of wildlife in complex forest environments and to advance forest wildlife conservation technology, this study proposes an improved detection algorithm, based on the YOLOv5s network model, for forest wildlife images captured by camera traps. Method: A dataset containing several typical forest wildlife species from the Huping Mountain National Nature Reserve in Hunan was used as the research object. First, the images were augmented: the ground-truth box regions were cropped, normalized and scaled, and two to four cropped images were then randomly collaged into new dataset elements to enrich the image information in the dataset. Second, a weighted channel concatenation method based on the idea of channel attention was used. Specifically, weights were introduced into the channel concatenation to change the number of channels, and the weights were updated continuously through back-propagation training to increase the number of channels carrying important feature information. Third, the Swin Transformer module was introduced and combined with the CNN to add a self-attention mechanism to the convolutional feature extraction, which integrated the advantages of the feature extraction layers of both networks and enlarged the receptive field of feature extraction. Finally, the α-DIoU loss function was chosen to replace the GIoU loss function, introducing a new geometric penalty term that accounts for both the overlapping area of the bounding boxes and the distance between their center points.
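The weighted channel concatenation in the method description can be illustrated with a minimal sketch. It assumes one plausible scheme, in which each input feature map receives a single non-negative scalar weight that is normalized before the channels are concatenated; the abstract does not give the exact formulation. The name `weighted_concat`, the `eps` constant and the nested-list layout (channels, rows, columns) are hypothetical. In a deep learning framework the weights would be trainable parameters updated by back-propagation; here they are plain numbers.

```python
def weighted_concat(features, weights, eps=1e-4):
    """Scale each feature map by a normalized weight, then concatenate channels.

    features: list of feature maps, each a nested list [channel][row][col]
    weights:  one scalar per feature map (assumed learnable in a real model)
    Returns the concatenated channels and the normalized weights.
    """
    # Clamp at zero (ReLU-style) so that a weight can only scale a branch down or up.
    positive = [max(w, 0.0) for w in weights]
    total = sum(positive) + eps  # eps avoids division by zero
    normalized = [w / total for w in positive]

    fused = []
    for feature_map, w in zip(features, normalized):
        for channel in feature_map:
            # Every value of the branch is multiplied by that branch's weight.
            fused.append([[w * value for value in row] for row in channel])
    return fused, normalized


# Two branches with 1 and 2 channels give a 3-channel output.
branch_a = [[[1.0, 2.0]]]
branch_b = [[[4.0, 4.0]], [[8.0, 8.0]]]
fused, normalized = weighted_concat([branch_a, branch_b], [1.0, 3.0])
print(len(fused), normalized)
```

Under this assumption, a branch whose weight grows during training contributes larger activations to the concatenated tensor, which is one way to read "increasing the channels carrying important feature information".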
Result: Under the same experimental conditions and on the same dataset, the improved algorithm substantially improved detection precision and recall compared with the original YOLOv5s network model. The mean average precision (mAP) rose from 74.1% to 88.4%, a gain of 14.3 percentage points, and the improved algorithm also outperformed other popular object detection algorithms such as YOLOv3, YOLOXs, RetinaNet and Faster R-CNN. Conclusion: In forest wildlife images captured by camera traps, the low contrast between background and target and the severe overlap and occlusion lead to high false detection and missed detection rates. To address these problems, this study proposes a series of improvements to the detection algorithm, providing a new and feasible approach to the protection of, and data acquisition for, forest wildlife in China.
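The α-DIoU loss named in the method description can be sketched as follows. The form used here, 1 − IoU^α + ρ^(2α)/c^(2α), follows the Alpha-IoU family of losses, where ρ is the distance between the box centers and c is the diagonal of the smallest box enclosing both. The function name, the corner-format boxes (x1, y1, x2, y2) and the default α = 3 are assumptions for illustration; the abstract does not state the value of α used in the study.

```python
def alpha_diou_loss(box, gt, alpha=3.0, eps=1e-9):
    """α-DIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Overlap area (zero when the boxes do not intersect).
    inter_w = max(min(box[2], gt[2]) - max(box[0], gt[0]), 0.0)
    inter_h = max(min(box[3], gt[3]) - max(box[1], gt[1]), 0.0)
    inter = inter_w * inter_h

    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_box + area_gt - inter + eps)

    # Squared distance between the two box centers (rho^2).
    rho2 = (((box[0] + box[2]) - (gt[0] + gt[2])) ** 2
            + ((box[1] + box[3]) - (gt[1] + gt[3])) ** 2) / 4.0

    # Squared diagonal of the smallest enclosing box (c^2).
    enclose_w = max(box[2], gt[2]) - min(box[0], gt[0])
    enclose_h = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = enclose_w ** 2 + enclose_h ** 2

    # Overlap term plus center-distance penalty, both raised to the power alpha.
    return 1.0 - iou ** alpha + (rho2 ** alpha) / (c2 ** alpha + eps)


print(alpha_diou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
print(alpha_diou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike a loss built on IoU alone, the distance term stays informative when the boxes do not overlap, so the loss still differs between a near miss and a distant miss.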

Key words: forest wildlife, detection algorithm, YOLOv5s, Swin Transformer, network fusion

Wenhan Yang, Tianyu Liu, Junchi Zhou, Wenwu Hu, Ping Jiang. CNN-Swin Transformer Detection Algorithm of Forest Wildlife Images Based on Improved YOLOv5s[J]. Scientia Silvae Sinicae, 2024, 60(3): 121-130.

Figures/Tables (13): Fig.1 to Fig.9, Table 1 to Table 4


Table 1
Species	Number of original images	Number of fused images	Total
Asian black bear (Ursus thibetanus)	525	525
Leopard cat (Prionailurus bengalensis)	327	327
Rhesus macaque (Macaca mulatta)	162	162
Tufted deer (Elaphodus cephalophus)	93	93
Bobcat (Lynx rufus)	647	647
Mule deer (Odocoileus hemionus)	1 065	1 065
Raccoon (Procyon lotor)	655	655
American red squirrel (Tamiasciurus hudsonicus)	804	804
Red fox (Vulpes vulpes)	758	758
All species	5 036	5 036	10 072
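The fused images counted in the table come from the crop-and-collage augmentation described in the abstract. A minimal sketch of that idea follows, assuming grayscale images stored as row-major nested lists, square tiles and a fixed 2×2 grid; the helper names `crop`, `resize_nearest` and `collage`, the tile size and the grid layout are all hypothetical, and the normalization step is omitted. It is not the authors' implementation.

```python
import random


def crop(image, box):
    """Cut the ground-truth box (x1, y1, x2, y2) out of a row-major image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(patch, size):
    """Scale a patch to size x size pixels with nearest-neighbour sampling."""
    height, width = len(patch), len(patch[0])
    return [[patch[r * height // size][c * width // size] for c in range(size)]
            for r in range(size)]


def collage(samples, tile=4, fill=0, rng=random):
    """Paste 2 to 4 cropped targets into a 2x2 grid.

    samples: list of (image, box, label) tuples
    Returns the new image and a list of (label, new_box) annotations.
    """
    if not 2 <= len(samples) <= 4:
        raise ValueError("expected two to four (image, box, label) samples")
    canvas = [[fill] * (2 * tile) for _ in range(2 * tile)]
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    rng.shuffle(cells)  # random grid cell for each crop
    annotations = []
    for (image, box, label), (grid_row, grid_col) in zip(samples, cells):
        patch = resize_nearest(crop(image, box), tile)
        top, left = grid_row * tile, grid_col * tile
        for r in range(tile):
            canvas[top + r][left:left + tile] = patch[r]
        # Each pasted crop fills its tile, so the tile is the new ground-truth box.
        annotations.append((label, (left, top, left + tile, top + tile)))
    return canvas, annotations


bear = [[1] * 6 for _ in range(6)]
fox = [[2] * 8 for _ in range(8)]
image, labels = collage([(bear, (1, 1, 5, 5), "bear"), (fox, (0, 0, 4, 8), "fox")])
print(labels)
```

Because every fused image is assembled from ground-truth crops, its annotations can be generated automatically, which is consistent with the table listing one fused image per original image.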

Table 2
Group	Model	Precision P	Recall R	mAP@0.5 (%)	Detection speed (FPS)
1	YOLOv5s	0.83	0.66	74.1	45
2	YOLOv5s+CutMixE	0.81	0.68	76.7	45
3	YOLOv5s+CutMixE+ConcatE	0.85	0.74	80.4	45
4	YOLOv5s+CutMixE+SwinTR	0.83	0.75	84.5	36
5	YOLOv5s+CutMixE+ConcatE+SwinTR	0.85	0.82	86.8	36
6	YOLOv5s+CutMixE+ConcatE+SwinTR+α-DIoU	0.86	0.83	88.4	36
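The mAP gains implied by the ablation results can be checked with a short calculation. The sketch below copies the mAP@0.5 values from the table and reports each configuration's gain over the YOLOv5s baseline, both in percentage points and relative to the baseline; the function name `gains_over_baseline` is hypothetical.

```python
# mAP@0.5 (%) per ablation group, copied from the table above.
ABLATION_MAP = [
    ("YOLOv5s", 74.1),
    ("YOLOv5s+CutMixE", 76.7),
    ("YOLOv5s+CutMixE+ConcatE", 80.4),
    ("YOLOv5s+CutMixE+SwinTR", 84.5),
    ("YOLOv5s+CutMixE+ConcatE+SwinTR", 86.8),
    ("YOLOv5s+CutMixE+ConcatE+SwinTR+α-DIoU", 88.4),
]


def gains_over_baseline(rows):
    """Return (name, gain in percentage points, relative gain in %) per row."""
    baseline = rows[0][1]
    return [
        (name, round(value - baseline, 1), round(100 * (value - baseline) / baseline, 1))
        for name, value in rows
    ]


for name, points, relative in gains_over_baseline(ABLATION_MAP):
    print(f"{name}: +{points} points ({relative}% relative to baseline)")
```

The full model gains 14.3 percentage points over the baseline, which is about 19.3% in relative terms, so the 14.3 figure quoted in the abstract is an absolute difference in mAP rather than a relative improvement.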

Table 3
Species	YOLOv5s AP (%)	Improved algorithm P	Improved algorithm R	Improved algorithm AP (%)
Asian black bear (Ursus thibetanus)	83.0	0.95	0.96	93.6
Leopard cat (Prionailurus bengalensis)	69.4	0.88	0.84	85.9
Rhesus macaque (Macaca mulatta)	69.1	0.88	0.85	86.1
Tufted deer (Elaphodus cephalophus)	68.8	0.72	0.82	80.4
Bobcat (Lynx rufus)	72.1	0.89	0.86	87.0
Mule deer (Odocoileus hemionus)	81.5	0.88	0.90	90.6
Raccoon (Procyon lotor)	78.5	0.95	0.90	91.4
American red squirrel (Tamiasciurus hudsonicus)	72.0	0.94	0.89	90.7
Red fox (Vulpes vulpes)	72.5	0.90	0.89	89.9

Table 4
Model	mAP@0.5 (%)	Detection speed (FPS)	Model size (MB)
YOLOv3	73.6	37	232.1
RetinaNet	72.8	36	46.3
Faster R-CNN	81.6	28	106.2
YOLOv5s	74.1	45	13.7
YOLOv5s-TR	86.3	23	15.1
Improved algorithm	88.4	36	14.0
