Scientia Silvae Sinicae ›› 2024, Vol. 60 ›› Issue (3): 121-130. doi: 10.11707/j.1001-7488.LYKX20220597

• Research Paper •

CNN-Swin Transformer Detection Algorithm of Forest Wildlife Images Based on Improved YOLOv5s

Wenhan Yang, Tianyu Liu*, Junchi Zhou, Wenwu Hu, Ping Jiang

  1. College of Mechanical and Electrical Engineering, Hunan Agricultural University, Changsha 410128
  • Received: 2022-09-03 Online: 2024-03-25 Published: 2024-04-08
  • Corresponding author: Tianyu Liu E-mail: Yangwenhan@stu.hunau.edu.cn
  • Funding:
    National Key Wild Fauna and Flora Conservation Program of the central government (湘财资环指[2021]47号).


Abstract:

Objective: To improve the detection accuracy of wildlife in complex forest environments and advance forest wildlife conservation technology, this study proposes an improved detection algorithm, based on the YOLOv5s network model, for forest wildlife images captured by trap cameras. Method: A dataset containing several typical forest wildlife species from the Huping Mountain National Nature Reserve in Hunan was used as the research object. First, the ground-truth box images were cropped, normalized and scaled, and two to four cropped images were randomly collaged into new dataset elements to enrich and augment the image information of the dataset. Second, a weighted channel concatenation method based on the idea of channel attention was used: weights were introduced during channel concatenation to change the number of channels, and these weights were continuously updated by back-propagation so that the number of channel layers carrying important feature information increased. Third, the Swin Transformer module was combined with the CNN network to add a self-attention mechanism to convolutional feature extraction, integrating the advantages of the feature-extraction layers of both networks and enlarging the receptive field of feature extraction. Finally, the α-DIoU loss function was chosen to replace the GIoU loss function, introducing a new geometric penalty term that accounts for the bounding-box overlap area and the distance between box centers.
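The collage augmentation described above can be sketched roughly as follows. This is not the authors' code: the 2×2 grid, the 256-pixel output size, the function name `collage`, and the nearest-neighbour resizing are all illustrative assumptions.

```python
import numpy as np

def collage(images, rng, grid=2, out_size=256):
    """Tile randomly chosen cropped images into one new training sample,
    in the spirit of the collage augmentation in the Method section.
    'images' is a list of (H, W, 3) uint8 arrays."""
    cell = out_size // grid
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for row in range(grid):
        for col in range(grid):
            img = images[rng.integers(len(images))]  # pick a crop at random
            # naive nearest-neighbour resize of the crop to the cell size
            ys = np.arange(cell) * img.shape[0] // cell
            xs = np.arange(cell) * img.shape[1] // cell
            canvas[row*cell:(row+1)*cell, col*cell:(col+1)*cell] = img[ys][:, xs]
    return canvas
```

A real training pipeline would additionally remap the ground-truth boxes of each crop into the coordinate system of the collaged image.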
Result: Under the same experimental conditions and on the same dataset, compared with the original YOLOv5s network model, the improved algorithm greatly increased the average precision and average recall of detection, raising the mean average precision (mAP) from 74.1% to 88.4%, an improvement of 14.3 percentage points, and it also outperformed other popular object detection algorithms such as YOLOv3, YOLOXs, RetinaNet and Faster R-CNN. Conclusion: The low contrast between background and target in forest wildlife images captured by trap cameras, together with severe occlusion and overlap, leads to high false-detection and missed-detection rates. To address these problems, this study proposes a series of improvements to the detection algorithm, providing a new feasible solution and approach for the protection of forest wildlife in China and for the acquisition of wildlife data.
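The weighted channel concatenation idea from the Method section can be illustrated with a minimal sketch. The weights here are fixed for demonstration; in the paper they would be learnable parameters updated by back-propagation, and the function name and tensor shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_concat(feat_a, feat_b, w_a, w_b):
    """Concatenate two feature maps along the channel axis, scaling each
    source's channels by a per-channel weight (channel-attention style).
    feat_* has shape (C, H, W); w_* has shape (C,)."""
    a = feat_a * w_a[:, None, None]  # reweight every channel of branch A
    b = feat_b * w_b[:, None, None]  # reweight every channel of branch B
    return np.concatenate([a, b], axis=0)  # shape (C_a + C_b, H, W)
```

Channels whose weights grow during training contribute more strongly after concatenation, which matches the stated goal of increasing the channel layers that carry important feature information.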

Key words: forest wildlife, detection algorithm, YOLOv5s, Swin Transformer, network fusion
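The α-DIoU loss named in the abstract can be sketched as below. The exponent α = 3 is the common default from the α-IoU literature and is an assumption here, not a setting reported by the article.

```python
def alpha_diou_loss(box1, box2, alpha=3.0):
    """alpha-DIoU loss for axis-aligned boxes given as (x1, y1, x2, y2):
    a power-IoU term plus a centre-distance penalty normalised by the
    squared diagonal of the smallest enclosing box."""
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter)
    # squared distance between the two box centres
    rho2 = (((box1[0] + box1[2]) - (box2[0] + box2[2])) ** 2 +
            ((box1[1] + box1[3]) - (box2[1] + box2[3])) ** 2) / 4.0
    # squared diagonal of the smallest box enclosing both
    c2 = ((max(box1[2], box2[2]) - min(box1[0], box2[0])) ** 2 +
          (max(box1[3], box2[3]) - min(box1[1], box2[1])) ** 2)
    return 1.0 - iou ** alpha + rho2 ** alpha / c2 ** alpha
```

Relative to GIoU, the centre-distance term penalises box-centre separation directly, which is the "new geometric penalty" the abstract refers to.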

CLC number: