|
李 佳, 刘 芳, 李迪强, 等. 2017. 基于红外相机监测分析的红腹角雉日活动节律. 林业科学, 53(7): 170−176.
|
|
Li J, Liu F, Li D Q, et al. 2017. Daily activity rhythm of Temminck’s tragopan (Tragopan temminckii) based on infrared camera monitoring. Scientia Silvae Sinicae, 53(7): 170−176. [in Chinese]
|
|
李 果, 李俊生, 关 潇, 等. 2014. 生物多样性监测技术手册. 北京: 中国环境科学出版社.
|
|
Li G, Li J S, Guan X, et al. 2014. Biodiversity monitoring technical manual. Beijing: China Environmental Science Press. [in Chinese]
|
|
Hatamizadeh A, Nath V, Tang Y, et al. 2022. Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images. International MICCAI Brainlesion Workshop, Springer, 272−284.
|
|
Bochkovskiy A, Wang C Y, Liao H Y M. 2020. YOLOv4: optimal speed and accuracy of object detection. arXiv Preprint arXiv: 2004.10934.
|
|
Chen R, Little R, Mihaylova L, et al. 2019. Forest wildlife surveillance using deep learning methods. Ecology and Evolution, 9(17): 9453−9466. doi: 10.1002/ece3.5410
|
|
Chen G, Han T X, He Z, et al. 2014. Deep convolutional neural network based species recognition for wild animal monitoring. 2014 IEEE International Conference on Image Processing (ICIP), IEEE, 858−862.
|
|
Carion N, Massa F, Synnaeve G, et al. 2020. End-to-end object detection with transformers. European Conference on Computer Vision (ECCV), 12346: 213–229.
|
|
DeVries T, Taylor G W. 2017. Improved regularization of convolutional neural networks with cutout. arXiv Preprint arXiv: 1708.04552.
|
|
Dosovitskiy A, Beyer L, Kolesnikov A, et al. 2020. An image is worth 16×16 words: transformers for image recognition at scale. arXiv Preprint arXiv: 2010.11929.
|
|
Girshick R. 2015. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, 1440−1448.
|
|
Girshick R, Donahue J, Darrell T, et al. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580−587.
|
|
He K, Zhang X, Ren S, et al. 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1904−1916. doi: 10.1109/TPAMI.2015.2389824
|
|
He J, Erfani S, Ma X, et al. 2021. Alpha-IoU: a family of power intersection over union losses for bounding box regression. arXiv Preprint arXiv: 2110.13675.
|
|
Han K, Wang Y, Chen H, et al. 2022. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1): 87−110.
|
|
Jannat F E, Willis A R. 2022. Improving classification of remotely sensed images with the Swin transformer. SoutheastCon 2022, IEEE, 611–618.
|
|
Li Y, Mao H, Girshick R, et al. 2022. Exploring plain vision transformer backbones for object detection. Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022. Proceedings, Part IX. Cham: Springer Nature Switzerland, 280−296.
|
|
Lin T Y, Goyal P, Girshick R, et al. 2017. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, 2980−2988.
|
|
Lin T Y, Maire M, Belongie S, et al. 2014. Microsoft COCO: common objects in context. European Conference on Computer Vision, Springer, 740−755.
|
|
Liu T, Ma Y, Yang W, et al. 2022. Spatial-temporal interaction learning based two-stream network for action recognition. Information Sciences, 606: 864−876.
|
|
Liu W, Anguelov D, Erhan D, et al. 2016. SSD: single shot multibox detector. European Conference on Computer Vision, Springer, 21−37.
|
|
Liu Z, Tan Y, He Q, et al. 2021. SwinNet: Swin transformer drives edge-aware RGB-D and RGB-T salient object detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(7): 4486−4497.
|
|
Liu Z, Lin Y, Cao Y, et al. 2021. Swin transformer: hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012−10022.
|
|
Naseer M M, Ranasinghe K, Khan S H, et al. 2021. Intriguing properties of vision transformers. Advances in Neural Information Processing Systems, 34: 23296−23308.
|
|
Khan S, Naseer M, Hayat M, et al. 2022. Transformers in vision: a survey. ACM Computing Surveys, 54(10s): 1−41.
|
|
Norouzzadeh M S, Nguyen A, Kosmala M, et al. 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25): E5716−E5725.
|
|
Redmon J, Divvala S, Girshick R, et al. 2016. You only look once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 779−788.
|
|
Redmon J, Farhadi A. 2017. YOLO9000: better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7263−7271.
|
|
Redmon J, Farhadi A. 2018. YOLOv3: an incremental improvement. arXiv Preprint arXiv: 1804.02767.
|
|
Ren S, He K, Girshick R, et al. 2015. Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
|
|
Sermanet P, Eigen D, Zhang X, et al. 2013. OverFeat: integrated recognition, localization and detection using convolutional networks. arXiv Preprint arXiv: 1312.6229.
|
|
Schneider T C, Kowalczyk R, Köhler M. 2013. Resting site selection by large herbivores: the case of European bison (Bison bonasus) in Białowieża Primeval Forest. Mammalian Biology, 78(6): 438−445.
|
|
Villa A G, Salazar A, Vargas F. 2017. Towards automatic wild animal monitoring: identification of animal species in camera-trap images using very deep convolutional neural networks. Ecological Informatics, 41: 24−32. doi: 10.1016/j.ecoinf.2017.07.004
|
|
Yun S, Han D, Oh S J, et al. 2019. CutMix: regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE/CVF International Conference on Computer Vision, 6023−6032.
|
|
Zheng Z, Wang P, Liu W, et al. 2020. Distance-IoU loss: faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence. 34(7): 12993−13000.
|