
Scientia Silvae Sinicae ›› 2026, Vol. 62 ›› Issue (4): 194-205. doi: 10.11707/j.1001-7488.LYKX20250523

• Research Article •

基于视觉语言特征匹配的野生动物未知类别检测方法

杨紫合1,田野1,*,王建涛2,裴志永3,孙晶4,张军国1,5,*

    1. 北京林业大学工学院 林木资源高效生产全国重点实验室 林业装备与自动化国家林业和草原局重点实验室 北京 100083
    2. 内蒙古乌兰坝国家级自然保护区管理局 赤峰 025450
    3. 内蒙古农业大学能源与交通工程学院 呼和浩特 010018
    4. 兴安盟乌兰河地方级自然保护区管理局 乌兰浩特 137726
    5. 陕西省动物研究所 西安 710032
  • 收稿日期:2025-08-25 出版日期:2026-04-15 发布日期:2026-04-11
  • 通讯作者: 田野,张军国 E-mail:tytoemail@sina.com;zhangjunguo@bjfu.edu.cn
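  • Funding:
    National Natural Science Foundation of China (32371874, 32401569); Beijing Natural Science Foundation (6244053); Science and Technology Program of the Shaanxi Academy of Sciences (2025K-32); Shaanxi Provincial Science and Technology Program (2025JC-YWGCZ-05, 2025JC-GXPT-037).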

Detection Method of Unknown Wildlife Species Based on Vision-Language Feature Matching

Zihe Yang1, Ye Tian1,*, Jiantao Wang2, Zhiyong Pei3, Jing Sun4, Junguo Zhang1,5,*

    1. School of Technology, Beijing Forestry University; State Key Laboratory of Efficient Production of Forest Resources; Key Laboratory of National Forestry and Grassland Administration on Forestry Equipment and Automation, Beijing 100083
    2. Administration Bureau of Inner Mongolia Wulanba National Nature Reserve, Chifeng 025450
    3. College of Energy and Transportation Engineering, Inner Mongolia Agricultural University, Hohhot 010018
    4. Xing’an League Wulanhe Local Nature Reserve Administration, Ulanhot 137726
    5. Shaanxi Institute of Zoology, Xi’an 710032
  • Received: 2025-08-25 Online: 2026-04-15 Published: 2026-04-11
  • Contact: Ye Tian, Junguo Zhang E-mail: tytoemail@sina.com; zhangjunguo@bjfu.edu.cn

摘要:

目的: 针对开放环境下野生动物红外相机监测图像中未知类别检测识别率低的问题,提出一种不依赖显式环境描述或生境元数据仅依赖已知物种标签的未知类别检测方法,以适应真实监测数据中信息受限的普遍场景。方法: 提出基于视觉语言特征匹配的野生动物未知类别检测方法(EUA),通过耦合大语言模型(LLM)的生态推理能力与视觉语言模型的跨模态对齐特性,构建开放环境下的智能监测框架。首先,设计生态感知提示词,引导LLM仅基于已知物种集合推断区域生态背景,并生成具有生态合理性的潜在物种列表;其次,将潜在物种文本与已知类别共同构建扩展的视觉语言语义空间;最后,提出未知类别评分机制(ODS),通过计算图像在已知类别与潜在物种间的匹配分布偏离度,实现对未知类别的鲁棒检测。结果: 在Dataset3(D3)和North American Camera Trap Images(NACTI)2个公开数据集上的试验表明,EUA显著优于现有方法。在最具挑战性的5类未知类别场景下,EUA的平均假正例率(FPR95)为57.86%,比次优方法降低16.19%,受试者工作特征曲线下面积(AUC)达到84.31%,提升4.64个百分点。消融试验证实,基于生态推理的潜在物种生成和ODS评分机制是性能提升的核心。可视化分析进一步表明,EUA能有效分离已知与未知样本的分布,验证了其设计的有效性。结论: 本研究实现了从“被动分类”到“主动预见”的范式转变,为解决缺乏地理信息的真实监测场景下的未知类别检测问题提供了有效方案。EUA方法不仅在性能上取得突破,更探索出将生态学知识嵌入AI推理过程的可行路径,为构建具备生态感知能力的下一代野生动物智能监测系统提供了新思路。

关键词: 野生动物监测, 未知类别检测, 大语言模型, 视觉语言模型, 生态推理

Abstract:

Objective: To address the low recognition rate of unknown categories in infrared camera monitoring images of wildlife in open environments, this study proposes an unknown-category detection method that relies only on known species labels, without explicit environmental descriptions or habitat metadata, so that it suits the common real-world monitoring scenario in which such information is limited. Method: An envisioning unknown animal (EUA) method based on vision-language feature matching was proposed. It couples the ecological reasoning capability of a large language model (LLM) with the cross-modal alignment of vision-language models to build a monitoring framework for open environments. First, an ecologically informed prompt was designed to guide the LLM to infer the regional ecological context solely from the set of known species and to generate a list of ecologically plausible potential species. Second, the text descriptions of these potential species were combined with the known categories to construct an expanded vision-language semantic space. Finally, an outlier detection score (ODS) mechanism was introduced to robustly detect unknown categories by measuring how far an image's matching distribution deviates between the known categories and the potential species. Result: Experiments on two public datasets, Dataset3 (D3) and North American Camera Trap Images (NACTI), showed that EUA significantly outperformed existing methods. In the most challenging scenario, with 5 unknown categories, the average false positive rate at 95% true positive rate (FPR@95TPR) of EUA was 57.86%, 16.19 percentage points lower than that of the second-best method, and the area under the receiver operating characteristic curve (AUC) reached 84.31%, an improvement of 4.64 percentage points. Ablation experiments confirmed that the ecologically guided generation of potential species and the ODS scoring mechanism were the core drivers of this performance gain. Visualization analysis further showed that EUA effectively separated the distributions of known and unknown samples, validating the effectiveness of the design. Conclusion: This study achieves a paradigm shift from “passive classification” to “proactive prediction”, providing an effective solution to unknown category detection in real-world monitoring scenarios that lack environmental priors. The EUA method not only achieves a breakthrough in performance but also explores a feasible path for embedding ecological knowledge into AI reasoning, offering a new direction for building the next generation of ecologically aware intelligent wildlife monitoring systems.
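
The abstract does not give the ODS formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that image and text embeddings from some vision-language model are already available as NumPy arrays, and it uses a hypothetical score (the softmax probability mass that an image assigns to the LLM-proposed potential species rather than to the known categories). The names `unknown_score` and `temperature` are illustrative. The two metric helpers follow one common convention, which is also an assumption here: known samples are treated as positives, and FPR@95TPR is the share of unknown samples still accepted as known when 95% of known samples are accepted.

```python
import math
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def unknown_score(image_emb, known_text_emb, potential_text_emb, temperature=0.01):
    """Hypothetical unknown-category score (an assumption, not the paper's ODS).

    Each image is matched against the joint label set (known + potential
    species); the score is the probability mass on the potential species.
    Higher values suggest the image is more likely an unknown category.
    """
    text = np.vstack([known_text_emb, potential_text_emb])
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    txt = text / np.linalg.norm(text, axis=-1, keepdims=True)
    probs = softmax(img @ txt.T / temperature, axis=-1)  # cosine similarity logits
    n_known = known_text_emb.shape[0]
    return probs[:, n_known:].sum(axis=-1)


def fpr_at_95_tpr(known_scores, unknown_scores):
    """Share of unknown samples accepted as known when >= 95% of known are accepted."""
    s = np.sort(np.asarray(known_scores, dtype=float))
    k = math.ceil(0.95 * len(s)) - 1          # index of the 95% acceptance threshold
    threshold = s[k]
    return float(np.mean(np.asarray(unknown_scores, dtype=float) <= threshold))


def auroc(known_scores, unknown_scores):
    """Probability that a random unknown sample scores above a random known one."""
    known = np.asarray(known_scores, dtype=float)
    unknown = np.asarray(unknown_scores, dtype=float)
    diff = unknown[:, None] - known[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())
```

With toy scores such as `known = [0.05, 0.10, 0.15, 0.20]` and `unknown = [0.12, 0.60, 0.90, 0.95]`, `fpr_at_95_tpr(known, unknown)` and `auroc(known, unknown)` can be called directly. Lower FPR@95TPR and higher AUC both indicate better separation of known and unknown samples, which is the sense in which the figures in the abstract are reported.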

Key words: wildlife monitoring, unknown category detection, large language models, vision-language models, ecological reasoning

CLC Number: