Detection Method of Unknown Wildlife Species Based on Vision-Language Feature Matching

Scientia Silvae Sinicae, 2026, Vol. 62, Issue 4: 194−205. doi: 10.11707/j.1001-7488.LYKX20250523

Zihe Yang¹, Ye Tian¹*, Jiantao Wang², Zhiyong Pei³, Jing Sun⁴, Junguo Zhang¹,⁵*

1. School of Technology, Beijing Forestry University; State Key Laboratory of Efficient Production of Forest Resources; Key Laboratory of National Forestry and Grassland Administration on Forestry Equipment and Automation, Beijing 100083
2. Administration Bureau of Inner Mongolia Wulanba National Nature Reserve, Chifeng 025450
3. College of Energy and Transportation Engineering, Inner Mongolia Agricultural University, Hohhot 010018
4. Xing’an League Wulanhe Local Nature Reserve Administration, Ulanhot 137726
5. Shaanxi Institute of Zoology, Xi’an 710032

Received: 2025-08-25  Publication date: 2026-04-15  Release date: 2026-04-11
Corresponding authors: Ye Tian, Junguo Zhang  E-mail: tytoemail@sina.com; zhangjunguo@bjfu.edu.cn
Funding: National Natural Science Foundation of China (32371874, 32401569); Beijing Natural Science Foundation (6244053); Science and Technology Program of the Shaanxi Academy of Sciences (2025K-32); Shaanxi Provincial Science and Technology Program (2025JC-YWGCZ-05, 2025JC-GXPT-037).

Abstract

Objective: Unknown species are poorly recognized in infrared camera-trap images of wildlife collected in open environments. To address this, we propose an unknown-category detection method that relies only on the labels of known species. It needs no explicit environmental descriptions or habitat metadata, which suits the limited-information conditions common in real monitoring data.

Method: We propose an envisioning unknown animal (EUA) method based on vision-language feature matching. EUA couples the ecological reasoning capability of a large language model (LLM) with the cross-modal alignment of a vision-language model to build an intelligent monitoring framework for open environments. First, an ecologically informed prompt guides the LLM to infer the regional ecological context solely from the set of known species and to generate an ecologically plausible list of potential species. Second, the text of these potential species is combined with the known categories to construct an expanded vision-language semantic space. Finally, an outlier detection score (ODS) mechanism detects unknown categories robustly by computing the deviation of each image's matching distribution between the known categories and the potential species.

Result: Experiments on two public datasets, Dataset3 (D3) and North American Camera Trap Images (NACTI), showed that EUA clearly outperformed existing methods. In the most challenging scenario, with 5 unknown categories, EUA reached an average false positive rate at 95% true positive rate (FPR95) of 57.86% and an area under the receiver operating characteristic curve (AUC) of 84.31%. Compared with the second-best method, EOE, this is 16.19 percentage points lower on FPR95 and 4.64 percentage points higher on AUC. Ablation experiments confirmed that the ecologically guided generation of potential species and the ODS scoring mechanism were the core drivers of this gain. Visualization analysis further showed that EUA effectively separated the distributions of known and unknown samples, supporting the effectiveness of its design.

Conclusion: This study shifts the paradigm from “passive classification” to “proactive prediction”. It provides an effective solution for unknown-category detection in real monitoring scenarios that lack geographic information and other environmental priors. Beyond its performance gains, EUA demonstrates a feasible way to embed ecological knowledge into the reasoning of AI models. It also offers a new direction for next-generation, ecologically aware intelligent wildlife monitoring systems.

Key words: wildlife monitoring, unknown category detection, large language models, vision-language models, ecological reasoning
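The first two steps of the Method can be sketched in plain Python. The first step prompts an LLM with nothing but the known species. The second merges its answer with the known labels into one expanded label space.

This is a minimal sketch under stated assumptions. The following are all hypothetical illustrations, not the authors' P_EUA prompt or code:

- the prompt wording;
- the helper names `build_ecology_prompt`, `parse_candidates` and `build_label_space`;
- the text template `a camera trap photo of a {}.`

The LLM call itself is left out, and the reply below is made up.

```python
import re

def build_ecology_prompt(known_species, n_candidates=10):
    """Build a prompt that gives the LLM only the known species labels.

    Hypothetical wording; this is not the paper's P_EUA prompt.
    """
    species_list = ", ".join(known_species)
    return (
        "Camera traps at one site recorded these species: "
        f"{species_list}.\n"
        "Using only this list, infer the likely region and habitat, then name "
        f"{n_candidates} other wild animals that could plausibly occur there. "
        "Reply with one common name per line and nothing else."
    )

def parse_candidates(llm_reply, known_species):
    """Turn a one-name-per-line reply into a de-duplicated list of new species."""
    known = {name.lower() for name in known_species}
    seen, candidates = set(), []
    for line in llm_reply.splitlines():
        # Drop list markers such as "1.", "2)", "-" or "*" that LLMs often prepend.
        name = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        key = name.lower()
        if name and key not in known and key not in seen:
            seen.add(key)
            candidates.append(name)
    return candidates

def build_label_space(known_species, candidates, template="a camera trap photo of a {}."):
    """Known labels first, then potential species; each label becomes a text prompt."""
    labels = list(known_species) + list(candidates)
    return labels, [template.format(name) for name in labels]

known = ["Lion", "Cheetah", "Zebra"]
reply = "1. Leopard\n2. Zebra\n- Serval\n\n3. leopard"   # made-up LLM output
candidates = parse_candidates(reply, known)
labels, prompts = build_label_space(known, candidates)
print(candidates)   # ['Leopard', 'Serval']
print(prompts[3])   # a camera trap photo of a Leopard.
```

Keeping the known labels first means that, once the prompts are encoded by a vision-language text encoder, the first `len(known)` similarity scores belong to known classes and the rest to potential species. Filtering out names already in the known set stops the LLM from re-proposing known species as "unknown".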

CLC number: S718.6

Cite this article: Zihe Yang, Ye Tian, Jiantao Wang, Zhiyong Pei, Jing Sun, Junguo Zhang. Detection Method of Unknown Wildlife Species Based on Vision-Language Feature Matching[J]. Scientia Silvae Sinicae, 2026, 62(4): 194−205.

Figures and tables: 10

Table 1  Species and sample counts in the NACTI and D3 datasets

NACTI species	Latin name	Number	D3 species	Latin name	Number
American black bear	Ursus americanus	2 965	Lion	Panthera leo	3 708
Cougar	Puma concolor	2 907	Cheetah	Acinonyx jubatus	2 646
Bobcat	Lynx rufus	2 510	Hartebeest	Alcelaphus buselaphus	2 490
Mule deer	Odocoileus hemionus	2 158	Buffalo	Syncerus caffer	2 425
Elk	Cervus canadensis	2 156	Elephant	Loxodonta africana	2 184
Red deer	Cervus elaphus	2 128	Giraffe	Giraffa camelopardalis	2 009
Wild boar	Sus scrofa	1 893	Baboon	Papio anubis	1 831
Coyote	Canis latrans	1 814	Eland	Tragelaphus oryx	1 792
Gray fox	Urocyon cinereoargenteus	1 539	Reedbuck	Redunca redunca	1 634
Snowshoe hare	Lepus americanus	1 446	Wildebeest	Connochaetes taurinus	1 616
Raccoon	Procyon lotor	1 376	Spotted hyena	Crocuta crocuta	1 552
Striped skunk	Mephitis mephitis	1 323	Grant’s gazelle	Nanger granti	1 515
Moose	Alces alces	1 194	Guinea fowl	Numida meleagris	1 515
Eastern gray squirrel	Sciurus carolinensis	998	Impala	Aepyceros melampus	1 500
Wild turkey	Meleagris gallopavo	815	Dik-dik	Madoqua kirkii	1 470
Black-tailed jackrabbit	Lepus californicus	792	Thomson’s gazelle	Eudorcas thomsonii	1 467
Nine-banded armadillo	Dasypus novemcinctus	634	Topi	Damaliscus lunatus	1 466
American red squirrel	Tamiasciurus hudsonicus	388	Warthog	Phacochoerus africanus	1 464
California quail	Callipepla californica	337	Zebra	Equus quagga	1 462
Red fox	Vulpes vulpes	327	Hippopotamus	Hippopotamus amphibius	1 400
Virginia opossum	Didelphis virginiana	110	Kori bustard	Ardeotis kori	1 180
American marten	Martes americana	88	Jackal	Lupulella mesomelas	862
			Secretary bird	Sagittarius serpentarius	639
			Ostrich	Struthio camelus	533
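The experiments vary the number of unknown categories (1, 3 or 5), so some species in Table 1 must be held out as "unknown" and the rest treated as "known". This section does not list which species were held out. The sketch below therefore assumes a seeded random hold-out over the NACTI species. `open_set_split` and the default seed are hypothetical.

```python
import random

# NACTI image counts per species, copied from Table 1.
NACTI_COUNTS = {
    "American black bear": 2965, "Cougar": 2907, "Bobcat": 2510, "Mule deer": 2158,
    "Elk": 2156, "Red deer": 2128, "Wild boar": 1893, "Coyote": 1814,
    "Gray fox": 1539, "Snowshoe hare": 1446, "Raccoon": 1376, "Striped skunk": 1323,
    "Moose": 1194, "Eastern gray squirrel": 998, "Wild turkey": 815,
    "Black-tailed jackrabbit": 792, "Nine-banded armadillo": 634,
    "American red squirrel": 388, "California quail": 337, "Red fox": 327,
    "Virginia opossum": 110, "American marten": 88,
}

def open_set_split(counts, n_unknown, seed=0):
    """Hold out n_unknown species as 'unknown'; the remaining species stay 'known'.

    Random selection is an assumption; the paper's held-out species are not given here.
    """
    species = sorted(counts)                       # fixed order makes the seed reproducible
    unknown = set(random.Random(seed).sample(species, n_unknown))
    known = [s for s in species if s not in unknown]
    return {
        "known": known,
        "unknown": sorted(unknown),
        "known_images": sum(counts[s] for s in known),
        "unknown_images": sum(counts[s] for s in unknown),
    }

for k in (1, 3, 5):
    split = open_set_split(NACTI_COUNTS, k)
    print(k, split["unknown"], split["known_images"], split["unknown_images"])
```

The 22 NACTI species total 29 898 images. The largest class (American black bear, 2 965) is roughly 34 times the smallest (American marten, 88). This means that which species are held out can change the known/unknown image ratio considerably.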


References

	Feng B, Hu L, Zhao S S, et al. 2022. Spatio-temporal niche characteristics of sympatric Chinese serow and Chinese goral. Acta Ecologica Sinica, 42(13): 5275−5284. doi: 10.5846/stxb202108022099 [in Chinese]
	Gao F, Yang L, Li H. 2022. A survey on open set recognition. Journal of Nanjing University (Natural Science), 58(1): 115−134. [in Chinese]
	Li Z L, Duo L A, Li S, et al. 2021. Competition and coexistence among terrestrial mammalian carnivores. Biodiversity Science, 29(1): 81−97. [in Chinese]
	Tian Y F, Liu D Z, Chu H J, et al. 2021. Soil physicochemical properties and vegetation community characteristics in a gold mine at Ungulates Wildlife Nature Reserve in Kalamaili Mountain. Bulletin of Soil and Water Conservation, 41(5): 107−114. [in Chinese]
	Wen S, Qian L, Hu M D, et al. 2024. Review of research progress on question-answering techniques based on large language models. Data Analysis and Knowledge Discovery, 8(6): 16−29. [in Chinese]
	Xiao Z S, Xiao W H, Wang T M, et al. 2022. Wildlife monitoring and research using camera-trapping technology across China: The current status and future issues. Biodiversity Science, 30(10): 234−259. [in Chinese]
	Xue Y D, Sun G, Li J, et al. 2024. Diversities and distribution patterns of Gobi bear and its sympatric species in Great Gobi A Strictly Protected Area in Mongolia. Scientia Silvae Sinicae, 60(7): 95−104. [in Chinese]
	Zhao E T, Zhang C C, Zhao H T, et al. 2025. A recognition method of domain adaptation for wildlife images based on adversarial learning. Scientia Silvae Sinicae, 61(4): 1−8. [in Chinese]
	Cao C T, Zhong Z, Zhou Z K, et al. 2024. Envisioning outlier exposure by large language models for out-of-distribution detection. Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 235: 5629−5659.
	Delisle Z J, Henrich M, Palencia P, et al. 2023. Reducing bias in density estimates for unmarked populations that exhibit reactive behaviour towards camera traps. Methods in Ecology and Evolution, 14(12): 3100−3111. doi: 10.1111/2041-210X.14247
	Dosovitskiy A, Beyer L, Kolesnikov A, et al. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (ICLR), Vienna, Austria, 1−22.
	Guo D Y, Yang D J, Zhang H W, et al. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
	Gupta A, Narayan S, Joseph K J, et al. 2022. OW-DETR: Open-world detection transformer. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 9235−9244.
	He K M, Zhang X Y, Ren S Q, et al. 2016. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 770−778.
	Hendrycks D, Basart S, Mazeika M, et al. 2019. Scaling out-of-distribution detection for real-world settings. International Conference on Machine Learning (ICML), Long Beach, USA, 1−15.
	Joseph K J, Khan S, Khan F S, et al. 2021. Towards open world object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5830−5840.
	Li X H, Tian H D, Piao Z J, et al. 2022. CameratrapR: An R package for estimating animal density using camera trapping data. Ecological Informatics, 69: 101597.
	Liu W T, Wang X Y, Owens J D, et al. 2020. Energy-based out-of-distribution detection. Neural Information Processing Systems (NeurIPS), 1−16.
	Maure L A, Diniz M F, Marco T P C, et al. 2023. Biodiversity and carbon conservation under the ecosystem stability of tropical forests. Journal of Environmental Management, 345: 118929.
	Ming Y F, Cai Z Y, Gu J X, et al. 2022. Delving into out-of-distribution detection with vision-language representations. Neural Information Processing Systems (NeurIPS), New Orleans, USA, 1−22.
	Nazir S, Kaleem M. 2021. Advances in image acquisition and processing technologies transforming animal ecological studies. Ecological Informatics, 61: 101212.
	Radford A, Kim J W, Hallacy C, et al. 2021. Learning transferable visual models from natural language supervision. International Conference on Machine Learning (ICML), 1−16.
	Saadati M, Balu A, Chiranjeevi S, et al. 2024. Out-of-distribution detection algorithms for robust insect classification. Plant Phenomics, 6: 0170. doi: 10.34133/plantphenomics.0170
	Stein A, Gerstner K, Kreft H. 2014. Environmental heterogeneity as a universal driver of species richness across taxa, biomes and spatial scales. Ecology Letters, 17(7): 866−880. doi: 10.1111/ele.12277
	Swanson A, Kosmala M, Lintott C, et al. 2015. Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data, 2: 150026. doi: 10.1038/sdata.2015.26
	Tabak M A, Norouzzadeh M S, Wolfson D W, et al. 2019. Machine learning to classify animal species in camera trap images: Applications in ecology. Methods in Ecology and Evolution, 10(4): 585−590. doi: 10.1111/2041-210X.13120
	Thau D, Ahumada J A, Birch T, et al. 2019. Artificial intelligence’s role in global camera trap data management and analytics via Wildlife Insights. Biodiversity Information Science and Standards, 3: e38233. doi: 10.3897/biss.3.38233
	Wang H T, Li Y, Yao H F, et al. 2023. CLIPN for zero-shot OOD detection: Teaching CLIP to say no. International Conference on Computer Vision (ICCV), Paris, France, 1−11.

Comparison with existing methods (%)
Unknown categories	Model	D3 FPR95↓	D3 AUC↑	NACTI FPR95↓	NACTI AUC↑	Average FPR95↓	Average AUC↑
5	CLIPN	98.00	50.34	96.48	62.98	97.24	56.66
	MCM	83.50	69.15	79.60	79.06	81.55	74.11
	Energy	82.92	65.36	96.80	30.77	89.86	48.07
	Max-logit	71.17	75.31	83.20	67.50	77.19	71.41
	EOE	70.25	80.80	77.76	76.50	74.05	79.67
	EUA	43.17	88.77	72.55	79.85	57.86	84.31
3	CLIPN	99.30	40.01	96.38	59.62	97.84	49.82
	MCM	91.81	51.33	86.67	69.91	89.24	60.62
	Energy	90.69	57.67	98.29	30.90	94.49	44.29
	Max-logit	88.75	56.81	90.10	58.21	89.43	57.51
	EOE	88.33	59.78	88.76	60.66	88.55	60.22
	EUA	72.22	69.36	79.62	77.41	75.92	73.39
1	CLIPN	100.00	31.83	96.59	68.04	98.30	49.94
	MCM	80.83	62.78	86.36	80.35	83.59	71.57
	Energy	97.50	48.83	100.00	22.32	98.75	35.58
	Max-logit	87.92	60.40	90.91	63.53	89.42	61.97
	EOE	84.58	64.23	94.32	75.41	89.45	69.82
	EUA	57.92	79.06	64.77	82.70	61.35	80.88

Ablation results (%)
Model	D3 FPR95↓	D3 AUC↑	NACTI FPR95↓	NACTI AUC↑	Average FPR95↓	Average AUC↑
Baseline	82.61	64.17	88.07	63.08	85.34	63.63
CSS	85.38	61.09	84.21	76.44	84.80	68.76
CSS, GPL	70.55	74.54	76.41	80.22	73.48	77.38
CSS, GPL, ODS	57.77	79.06	72.31	79.99	65.04	79.53

Precision, recall and F1-score of different large language models (%)
Large language model	D3 Precision	D3 Recall	D3 F1-score	NACTI Precision	NACTI Recall	NACTI F1-score	Average Precision	Average Recall	Average F1-score
GPT-4o	25.00	25.00	25.00	45.45	45.45	45.45	35.23	35.23	35.23
Llama-3.3-70B	20.00	50.00	28.57	12.77	54.55	20.69	16.39	52.28	24.63
Claude 3.7	26.32	41.67	32.26	20.00	27.27	23.08	23.16	34.47	27.67
ERNIE-X1-Turbo	50.00	41.67	45.45	41.67	45.45	43.48	45.84	43.56	44.47
Qwen3	11.11	16.67	13.33	15.00	27.27	19.35	13.06	21.97	16.34
DeepSeek-R1	60.00	50.00	54.55	60.00	54.55	57.14	60.00	52.28	55.85
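The precision, recall and F1 values above can be read as the set overlap between the species an LLM proposes and a reference list. This section does not define that reference list. The sketch assumes it is the set of species actually present, and `list_prf` is a hypothetical helper.

As an arithmetic check, 6 correct names out of 10 proposals against a 12-species reference give 60.00 / 50.00 / 54.55, the DeepSeek-R1 values on D3. The true list sizes are not stated here.

```python
def list_prf(predicted, reference):
    """Set-based precision, recall and F1 (in %) of a proposed species list."""
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}
    tp = len(pred & ref)                                   # names found in both lists
    precision = 100.0 * tp / len(pred) if pred else 0.0
    recall = 100.0 * tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Placeholder names only; they are not the species used in the paper.
predicted = [f"species_{i}" for i in range(10)]       # 10 proposals
reference = [f"species_{i}" for i in range(4, 16)]    # 12 reference species, 6 shared
p, r, f = list_prf(predicted, reference)
print(round(p, 2), round(r, 2), round(f, 2))
```

The Average columns average each metric over the two datasets. The average F1 (55.85 for DeepSeek-R1) is therefore the mean of its two F1 scores, 54.55 and 57.14, not the F1 of the averaged precision and recall.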

Comparison of LLM prompts (%)
Prompt	D3 FPR95↓	D3 AUC↑	NACTI FPR95↓	NACTI AUC↑	Average FPR95↓	Average AUC↑
P_baseline	73.69	67.41	86.31	77.93	80.00	72.67
P_EOE	68.36	73.67	81.16	70.92	74.76	72.30
P_camera	64.95	74.20	75.61	78.64	70.28	76.42
P_EUA	57.77	79.06	72.31	79.99	65.04	79.53

Comparison of detection scores (%)
Detection score	D3 FPR95↓	D3 AUC↑	NACTI FPR95↓	NACTI AUC↑	Average FPR95↓	Average AUC↑
S_MSP	88.16	63.54	79.95	78.51	84.06	71.03
S_MAX	100.00	68.00	100.00	70.60	100.00	69.30
S_energy	87.85	68.26	94.04	67.38	90.945	67.82
S_EOE	70.55	74.54	76.41	80.22	73.48	77.38
S_ODS	57.77	79.06	72.31	79.99	65.04	79.53
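The score table compares standard post-hoc scores with ODS. Assuming S_MSP, S_MAX and S_energy follow their usual definitions, they can be written as below. Those definitions are the maximum softmax probability, the maximum logit, and the negative free energy of Liu et al. (2020).

The ODS formula itself is not given in this section. `expanded_space_score` is therefore a hypothetical stand-in, not ODS. It takes a softmax over the image's similarities to the known labels and the potential-species labels together. It then returns the best known-class probability minus the best potential-species probability. Its temperature of 0.01 is also an assumption. All scores follow the convention that higher means "more likely known".

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)                    # subtract the max before exponentiating
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits, temperature=1.0):
    """Maximum softmax probability over the known classes."""
    return max(softmax(logits, temperature))

def max_logit_score(logits):
    """Largest raw logit over the known classes."""
    return max(logits)

def energy_score(logits, temperature=1.0):
    """Negative free energy: T * log(sum(exp(z / T))), computed stably."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    return temperature * (top + math.log(sum(math.exp(z - top) for z in scaled)))

def expanded_space_score(known_sims, potential_sims, temperature=0.01):
    """HYPOTHETICAL stand-in for ODS (not the paper's formula).

    known_sims / potential_sims: image-text cosine similarities to the known labels
    and to the LLM-proposed potential species. A positive value means the image
    matches some known label better than any potential species.
    """
    probs = softmax(list(known_sims) + list(potential_sims), temperature)
    n_known = len(known_sims)
    return max(probs[:n_known]) - max(probs[n_known:])

print(expanded_space_score([0.30, 0.20], [0.25]))  # positive: looks like a known species
print(expanded_space_score([0.20, 0.18], [0.30]))  # negative: looks unknown
```

Applied to image-text cosine similarities with a temperature, `msp_score` has the same form as the MCM score of Ming et al. (2022). The stand-in differs only in that the softmax also spans the potential species. As a result, an image that resembles a plausible but unlisted animal loses probability mass on the known classes.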
