Detection Method of Pinecones in the Forest Based on RT-DETR

doi:10.11707/j.1001-7488.LYKX20240518

Abstract

Objective: Complex forest environments and small pinecones with indistinct texture features lead to insufficient detection accuracy and poor real-time performance. To address these problems, a forest pinecone detection method based on the real-time detection transformer (RT-DETR) is proposed, and the RT-DETR model is optimized to improve its detection performance. Method: First, to improve detection accuracy, the original backbone network was replaced with the re-parameterized vision transformer (RepViT) to strengthen feature extraction. Second, the high-low frequency feature interactions (HiLo) mechanism, which separates high- and low-frequency components, was introduced to better capture fine texture details. Finally, the re-parameterized cross stage partial bottleneck with 3 convolutions (RepC3) module was replaced by the decoupled replicated bottleneck cross stage partial with 3 convolutions (DRBC3) module, which combines large-kernel and dilated convolutions to enlarge the receptive field. Both RepViT and DRBC3 use structural re-parameterization, so the model structure is simplified at inference time, which improves detection efficiency. Result: The optimized RT-DETR model was tested on a pinecone image dataset collected at the Dalai forest station in Jiamusi, Heilongjiang Province, China. It achieved the best overall balance among the metrics, with an AP50 of 93.37%, a precision of 93.30%, and a recall of 92.65%. Compared with the baseline RT-DETR, AP50 rose by about 4 percentage points (from 89.41% to 93.37%), while GFLOPs fell by 51%, the parameter size fell by 41%, and the frame rate rose from 74.3 to 95.5 FPS, an increase of 28%. Conclusion: The optimized method improves the accuracy, real-time performance, and efficiency of pinecone detection in forest environments, and provides an effective solution for automated pinecone harvesting in practical applications.

Key words: RT-DETR, pinecone detection, RepViT, HiLo high-low frequency separation mechanism, DRBC3
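
The abstract states that DRBC3 combines large-kernel and dilated convolutions and that structural re-parameterization simplifies the model at inference time. The following is a minimal 1D sketch of that general idea in plain Python, not the authors' DRBC3 implementation: a dilated small kernel is rewritten as a zero-filled dense kernel and folded into a parallel large kernel, so two training-time branches become one inference-time convolution. The function names (`conv1d`, `dilated_to_dense`, `merge_branches`) and the sample values are hypothetical, and the batch normalization that real re-parameterized blocks usually fold in as well is omitted.

```python
def conv1d(signal, kernel, dilation=1):
    """Zero-padded 'same' 1D cross-correlation; kernel length must be odd."""
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j * dilation - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def dilated_to_dense(kernel, dilation):
    """Insert (dilation - 1) zeros between taps: k taps span (k - 1) * dilation + 1."""
    dense = [0.0] * ((len(kernel) - 1) * dilation + 1)
    for j, w in enumerate(kernel):
        dense[j * dilation] = w
    return dense


def merge_branches(large_kernel, small_kernel, dilation):
    """Fold a parallel dilated branch into the large kernel (centre-aligned sum)."""
    dense = dilated_to_dense(small_kernel, dilation)
    if len(dense) > len(large_kernel):
        raise ValueError("dilated branch must fit inside the large kernel")
    offset = (len(large_kernel) - len(dense)) // 2
    merged = list(large_kernel)
    for j, w in enumerate(dense):
        merged[offset + j] += w
    return merged


# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 4.0, 1.5]
large = [0.1, -0.2, 0.3, 0.4, 0.5, -0.1, 0.2]   # 7-tap "large kernel" branch
small = [1.0, -1.0, 2.0]                        # 3-tap branch with dilation 2

# Training-time form: two parallel branches whose outputs are summed.
two_branch = [a + b for a, b in zip(conv1d(x, large), conv1d(x, small, dilation=2))]
# Inference-time form: a single convolution with the merged kernel.
single = conv1d(x, merge_branches(large, small, 2))

max_diff = max(abs(a - b) for a, b in zip(two_branch, single))
print("largest difference between the two forms:", max_diff)
```

Because convolution is linear, the merged kernel reproduces the two-branch output up to floating-point rounding, which is why the extra branch adds no inference cost. A kernel with k taps and dilation d covers (k - 1) * d + 1 input positions, so the 3-tap branch above covers 5.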

CLC number: S718.5; TP181

Chenxu Wu, Dongyan Zhang, Lanxiang Zhang, Nuo Chen, Siyu Mao. Detection Method of Pinecones in the Forest Based on RT-DETR[J]. Scientia Silvae Sinicae, 2025, 61(6): 25-37.

Figures and tables (11): Figures 1-8 and Tables 1-3 (images not reproduced here).

RT-DETR	RepViT	HiLo	DRBC3	Precision (%)	Recall (%)	AP50 (%)	GFLOPs	Params (MB)	FPS
√	-	-	-	88.12	87.56	89.41	56.9	40.4	74.3
√	√	-	-	90.95	92.11	91.85	36.3	27.6	77.5
√	-	√	-	91.03	90.78	91.42	57.1	40.4	79.4
√	-	-	√	90.45	91.05	91.20	48.3	36.7	77.7
√	√	√	-	92.12	91.80	91.95	36.5	27.6	83.8
√	√	-	√	92.50	91.85	92.31	28.5	23.9	85.7
√	√	√	√	93.30	92.65	93.37	27.8	23.8	95.5
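
The relative changes quoted in the abstract can be re-derived from the first row (baseline RT-DETR) and last row (all three modifications) of the ablation table above. The short sketch below does that arithmetic; the helper name `relative_change` and the dictionary layout are illustrative choices, and the numbers are copied from those two rows.

```python
def relative_change(before, after):
    """Signed change in percent, measured against the baseline value."""
    return (after - before) / before * 100.0


# Values copied from the first and last rows of the ablation table.
baseline = {"AP50": 89.41, "GFLOPs": 56.9, "Params": 40.4, "FPS": 74.3}
optimized = {"AP50": 93.37, "GFLOPs": 27.8, "Params": 23.8, "FPS": 95.5}

for name in baseline:
    change = relative_change(baseline[name], optimized[name])
    print(f"{name}: {baseline[name]} -> {optimized[name]} ({change:+.1f}%)")

# For a metric already expressed in percent, the absolute difference
# (percentage points) is usually the clearer figure to report.
ap50_gain_points = optimized["AP50"] - baseline["AP50"]
print(f"AP50 gain: {ap50_gain_points:.2f} percentage points")
```

Computed this way, GFLOPs, parameter size, and FPS change by roughly 51%, 41%, and 28%, matching the abstract, while the AP50 gain works out to 3.96 percentage points.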

[Image comparison panels, left to right: original image, RT-DETR, +RepViT, +HiLo, +DRBC3, Ours]

Model	Input size	Precision (%)	Recall (%)	AP50 (%)	Params (MB)	GFLOPs
SSD	300×300	90.28	60.53	65.28	26.28	62.75
Faster-RCNN	800×800	32.47	54.35	84.32	41.34	177.59
YOLOv4	416×416	93.51	62.23	77.19	63.94	59.95
YOLOv5s	640×640	90.48	75.18	81.31	46.63	114.56
YOLOv6s	640×640	86.33	68.12	76.72	32.81	44.00
YOLOv7-tiny	640×640	90.70	64.41	74.90	6.23	13.86
YOLOv8n	640×640	84.72	69.41	81.11	6.22	8.10
YOLOv8s	640×640	91.47	74.65	85.92	22.51	28.40
RT-DETR-R18	640×640	88.12	87.56	89.41	40.40	56.92
Ours	640×640	93.30	92.65	93.37	23.80	27.80
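
Several detectors in the comparison table pair high precision with much lower recall, which makes the two columns hard to weigh by eye. One common way to combine them is the F1 score, the harmonic mean of precision and recall. The source does not report F1, so the sketch below is a derived illustration only; the function name `f1_score` and the `rows` list are hypothetical, with precision and recall copied from the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (model, precision %, recall %) copied from the comparison table.
rows = [
    ("SSD", 90.28, 60.53),
    ("Faster-RCNN", 32.47, 54.35),
    ("YOLOv4", 93.51, 62.23),
    ("YOLOv5s", 90.48, 75.18),
    ("YOLOv6s", 86.33, 68.12),
    ("YOLOv7-tiny", 90.70, 64.41),
    ("YOLOv8n", 84.72, 69.41),
    ("YOLOv8s", 91.47, 74.65),
    ("RT-DETR-R18", 88.12, 87.56),
    ("Ours", 93.30, 92.65),
]

# Rank models by F1, highest first.
ranked = sorted(rows, key=lambda r: f1_score(r[1], r[2]), reverse=True)
for model, precision, recall in ranked:
    print(f"{model:12s} F1 = {f1_score(precision, recall):.2f}")
```

The harmonic mean penalizes a large gap between precision and recall, so a model with high precision but low recall scores below one that is balanced on both.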