
Scientia Silvae Sinicae, 2024, Vol. 60, Issue (3): 121-130. doi: 10.11707/j.1001-7488.LYKX20220597

• Research papers

CNN-Swin Transformer Detection Algorithm of Forest Wildlife Images Based on Improved YOLOv5s

Wenhan Yang, Tianyu Liu*, Junchi Zhou, Wenwu Hu, Ping Jiang

  1. College of Mechanical and Electrical Engineering, Hunan Agricultural University, Changsha 410128
  • Received: 2022-09-03 Online: 2024-03-25 Published: 2024-04-08
  • Contact: Tianyu Liu E-mail: Yangwenhan@stu.hunau.edu.cn

Abstract:

Objective: To improve the detection accuracy of wildlife in complex forest environments and to advance forest wildlife conservation technology, this study proposes an improved detection algorithm based on the YOLOv5s network model for forest wildlife images captured by camera traps.
Method: A dataset containing several typical forest wildlife species from the Huping Mountain National Nature Reserve in Hunan was used as the research object. First, data augmentation was performed: the image regions inside the ground truth boxes were cropped, normalized and scaled, and two to four of these crops were then randomly collaged into new dataset samples, enriching the image information in the dataset. Second, a weighted channel stitching method based on the idea of channel attention was adopted: weights were introduced into channel stitching to change the channel composition, and these weights were continuously updated through back-propagation training so as to increase the number of channel layers carrying important feature information. Third, a Swin Transformer module was introduced and combined with the CNN, adding a self-attention mechanism to convolutional feature extraction; this integrates the advantages of the feature extraction layers of both networks and enlarges the receptive field of feature extraction. Finally, the better-performing α-DIoU loss function replaced the GIoU loss function, introducing a new geometric penalty term that accounts for both the overlap area of the bounding boxes and the distance between their center points.
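The crop-and-collage augmentation described in the Method can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: images are small grayscale grids (lists of rows of 0-255 values), ground truth boxes are (x1, y1, x2, y2) pixel tuples with exclusive ends, crops are taken from a single image for brevity, resized by nearest-neighbour sampling to a hypothetical `tile` size, and placed in a 2 × 2 grid with unused cells left at zero. The helper names `crop`, `normalize`, `resize_nearest` and `collage` are all hypothetical.

```python
import random

def crop(img, box):
    """Cut box (x1, y1, x2, y2), end-exclusive, out of an H x W grayscale image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]

def normalize(patch):
    """Map 0-255 pixel values to [0, 1] (assumes 8-bit input)."""
    return [[p / 255.0 for p in row] for row in patch]

def resize_nearest(patch, size):
    """Nearest-neighbour resize of a patch to size x size (the scaling step)."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def collage(img, boxes, labels, tile=32, rng=random):
    """Paste 2-4 randomly chosen ground-truth crops into a 2 x 2 grid of tiles.

    Hypothetical layout: one crop per tile, unused tiles stay 0.
    Returns the canvas and (box, label) pairs in canvas coordinates.
    """
    if len(boxes) < 2:
        raise ValueError("need at least two ground-truth boxes to collage")
    k = rng.randint(2, min(4, len(boxes)))          # two to four crops
    chosen = rng.sample(range(len(boxes)), k)
    canvas = [[0.0] * (2 * tile) for _ in range(2 * tile)]
    annotations = []
    for slot, idx in enumerate(chosen):
        patch = resize_nearest(normalize(crop(img, boxes[idx])), tile)
        oy, ox = (slot // 2) * tile, (slot % 2) * tile
        for r in range(tile):
            canvas[oy + r][ox:ox + tile] = patch[r]
        # each pasted crop fills its whole tile, so its new box is the tile itself
        annotations.append(((ox, oy, ox + tile, oy + tile), labels[idx]))
    return canvas, annotations
```

A real pipeline would operate on RGB images through an image library and might vary the layout and tile size per sample; the abstract does not specify those details.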
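The abstract describes the weighted channel stitching only at a high level. One plausible reading, sketched below as an assumption rather than the paper's exact formulation, is a concatenation in which each input branch is scaled by a learnable, non-negative, normalized weight before its channels are joined. The weight normalization is borrowed from the fast normalized fusion of BiFPN (EfficientDet), except that the branches are concatenated instead of summed. Only the forward pass is shown; in a training framework `weights` would be trainable parameters, so back-propagation would update them. The function name `weighted_concat` is hypothetical.

```python
def weighted_concat(branches, weights, eps=1e-4):
    """Scale each branch by a normalized learnable weight, then concatenate channels.

    branches: list of feature maps; a map is a list of channels and each channel
              a flat list of activations (all maps share one spatial size).
    weights:  one scalar per branch; hypothetical, would be trainable parameters.
    """
    if len(branches) != len(weights):
        raise ValueError("need exactly one weight per branch")
    # ReLU keeps the weights non-negative; dividing by their sum bounds them in [0, 1].
    pos = [max(w, 0.0) for w in weights]
    total = sum(pos) + eps
    norm = [w / total for w in pos]
    out = []
    for fmap, n in zip(branches, norm):
        # every channel of a branch is scaled by that branch's weight
        out.extend([n * a for a in channel] for channel in fmap)
    return out, norm
```

The output has as many channels as all branches combined; a branch whose weight is driven toward zero during training contributes almost nothing, which lets the network emphasize the branches that carry useful features.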
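The core idea of the Swin Transformer, self-attention restricted to local windows that are shifted between successive blocks, can be sketched in pure Python. This is a simplified illustration, not the module used in the paper: it uses a single head with Q = K = V = input tokens (no learned projections) and omits relative position bias, LayerNorm, the MLP and the attention mask that Swin applies to shifted windows. The CNN feature map is modelled as an H × W grid of C-dimensional token vectors, and the hypothetical `swin_pair` helper adds the attention output back onto the convolutional features through residual connections, one assumed way of combining the two networks.

```python
import math

def roll(grid, dy, dx):
    """Cyclically shift an H x W grid of tokens by (dy, dx) (torch.roll convention)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]

def attend(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens.

    Simplification: no learned projections, no relative position bias.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)                       # subtract max for a stable softmax
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        out.append([sum(e * v[i] for e, v in zip(exps, tokens)) / z for i in range(d)])
    return out

def window_attention(grid, m, shift=False):
    """Self-attention computed independently inside each m x m window.

    With shift=True the grid is rolled by m // 2 first and rolled back afterwards,
    so tokens near the old window borders can interact. Swin's attention mask for
    the wrapped-around regions is omitted here.
    """
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("H and W must be multiples of the window size m")
    s = m // 2 if shift else 0
    x = roll(grid, -s, -s)
    out = [[None] * w for _ in range(h)]
    for wy in range(0, h, m):
        for wx in range(0, w, m):
            coords = [(r, c) for r in range(wy, wy + m) for c in range(wx, wx + m)]
            for (r, c), t in zip(coords, attend([x[r][c] for r, c in coords])):
                out[r][c] = t
    return roll(out, s, s)

def add(a, b):
    """Element-wise sum of two token grids (a residual connection)."""
    return [[[p + q for p, q in zip(u, v)] for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def swin_pair(cnn_features, m):
    """Hypothetical CNN + Swin fusion: regular then shifted window attention,
    each added back onto the convolutional features through a residual."""
    x = add(cnn_features, window_attention(cnn_features, m, shift=False))
    return add(x, window_attention(x, m, shift=True))
```

Because attention inside a window mixes every token with every other token in that window, and the shifted pass links neighbouring windows, each output position draws on a wider region than a single small convolution kernel covers, which is the receptive-field argument made in the Method.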
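The GIoU and α-DIoU losses can be compared side by side. The sketch below follows the commonly published formulations, with boxes as (x1, y1, x2, y2) tuples of positive area: GIoU adds the fraction of the smallest enclosing box C not covered by the union of the two boxes, while α-DIoU computes L = 1 − IoU^α + (ρ²/c²)^α, where ρ is the distance between the box centers and c is the diagonal of the enclosing box. The default α = 3 is an assumption taken from the usual α-IoU setting; the abstract does not state the value used.

```python
def box_area(b):
    """Area of box (x1, y1, x2, y2); empty boxes give 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou_terms(p, g):
    """IoU, union area and smallest enclosing box of prediction p and ground truth g.

    Assumes both boxes have positive area, so the union is never zero.
    """
    inter = box_area((max(p[0], g[0]), max(p[1], g[1]), min(p[2], g[2]), min(p[3], g[3])))
    union = box_area(p) + box_area(g) - inter
    enclose = (min(p[0], g[0]), min(p[1], g[1]), max(p[2], g[2]), max(p[3], g[3]))
    return inter / union, union, enclose

def giou_loss(p, g):
    """GIoU loss: 1 - IoU + (|C| - |union|) / |C|, with C the enclosing box."""
    iou, union, c = iou_terms(p, g)
    c_area = box_area(c)
    return 1.0 - iou + (c_area - union) / c_area

def alpha_diou_loss(p, g, alpha=3.0):
    """alpha-DIoU loss: 1 - IoU**alpha + (rho**2 / c**2)**alpha.

    rho: distance between box centers; c: diagonal of the enclosing box.
    alpha = 3 is an assumed default; alpha = 1 reduces to plain DIoU.
    """
    iou, _, c = iou_terms(p, g)
    # squared center distance: centers are ((x1 + x2) / 2, (y1 + y2) / 2)
    rho2 = ((p[0] + p[2] - g[0] - g[2]) / 2) ** 2 + ((p[1] + p[3] - g[1] - g[3]) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    c2 = (c[2] - c[0]) ** 2 + (c[3] - c[1]) ** 2
    return 1.0 - iou ** alpha + (rho2 / c2) ** alpha
```

Setting `alpha=1.0` recovers the ordinary DIoU loss, which makes it easy to isolate the effect of the power term when experimenting.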
Result: Under the same experimental conditions and on the same dataset, the improved algorithm substantially increased the average precision and average recall of detection compared with the original YOLOv5s network model, raising the mean average precision (mAP) from 74.1% to 88.4%, an improvement of 14.3 percentage points. It also outperformed other popular object detection algorithms such as YOLOv3, YOLOXs, RetinaNet and Faster R-CNN.
Conclusion: Forest wildlife images captured by camera traps suffer from low contrast between background and targets and from severe occlusion and overlap, which lead to high false detection and missed detection rates. To address these problems, this study proposes a series of improvements to the detection algorithm, providing a new and feasible approach for the protection of forest wildlife in China and for the acquisition of related data.
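The 14.3% gain reported for mAP is the absolute difference between the two values, that is, 14.3 percentage points; relative to the YOLOv5s baseline it corresponds to an improvement of roughly 19.3%:

```latex
\Delta_{\text{abs}} = 88.4\% - 74.1\% = 14.3\ \text{percentage points},
\qquad
\Delta_{\text{rel}} = \frac{88.4 - 74.1}{74.1} \approx 0.193 = 19.3\%
```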

Key words: forest wildlife, detection algorithm, YOLOv5s, Swin Transformer, network convergence
