Zero-Shot Classification of Bird Audio Based on Taxonomy

doi:10.11707/j.1001-7488.LYKX20240436

Abstract

Objective: A bird audio pretraining model built from a large number of audio-text pairs can perform zero-shot classification of audio from species with insufficient training samples by exploiting side information about the species class. This approach reduces the burden of data collection and provides a theoretical basis for zero-shot classification of bird audio, supporting ecological monitoring and the analysis of changes in species distribution in open environments.

Method: Taxonomic information, which reflects the phylogenetic relationships among birds, was used as side information for the species class. A pretrained RoBERTa text encoder extracted semantic embeddings of the taxonomy, and a pretrained HTSAT audio encoder extracted acoustic embeddings of the audio. Contrastive learning was used to compute the similarity between the semantic and acoustic embeddings and to build a contrastive language-audio pretraining model for birds (CLAP-Bird). Zero-shot classification of bird audio was then performed using the CLAP-Bird model and the side information of the zero-shot classes.

Result: The proposed method was trained and evaluated on a large, imbalanced bird audio dataset containing 725 hours of recordings. The average F1_score across five different test sets, each with 8 to 10 classes, was 0.289. Compared with baseline models that used the scientific name, life history, or basic characteristics of birds as side information for the species class, the proposed model significantly improved zero-shot classification performance for bird audio.

Conclusion: Using bird taxonomy as side information for the species class conveys the biological and genetic relationships among bird species, helps the model capture the connections between bird sounds, and improves zero-shot classification of bird audio. Moreover, the closer the taxonomic relationship between the training set and the test set, the better the zero-shot classification performance on the test set.
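
The contrastive step and the zero-shot prediction step described in the abstract can be illustrated with a minimal sketch. It assumes the symmetric cross-entropy objective commonly used in CLAP-style models and a nearest-text-embedding decision rule; the function names, the temperature value, and the toy three-dimensional vectors are illustrative assumptions, not the authors' implementation, and the real embeddings would come from the RoBERTa and HTSAT encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(audio_embs, text_embs, temperature=0.07):
    """Temperature-scaled cosine similarity between every audio and text embedding."""
    audio = [l2_normalize(a) for a in audio_embs]
    text = [l2_normalize(t) for t in text_embs]
    return [[sum(x * y for x, y in zip(a, t)) / temperature for t in text]
            for a in audio]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Average of the audio-to-text and text-to-audio cross-entropy losses."""
    logits = similarity_matrix(audio_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def zero_shot_predict(audio_emb, class_text_embs):
    """Return the index of the class whose text embedding is most similar."""
    scores = similarity_matrix([audio_emb], class_text_embs, temperature=1.0)[0]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (hypothetical values standing in for encoder outputs).
audio_batch = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
text_batch = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0]]
loss_matched = symmetric_contrastive_loss(audio_batch, text_batch)
loss_swapped = symmetric_contrastive_loss(audio_batch, text_batch[::-1])

unseen_class_texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
predicted_class = zero_shot_predict([0.0, 0.9, 0.3], unseen_class_texts)
```

In this sketch the loss is lower when each audio clip is paired with its own text than when the pairs are swapped, which is the signal that pulls matching embeddings together during pretraining. At test time no retraining is needed: only the text embeddings of the unseen classes change.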

Key words: bird audio classification, zero-shot learning, taxonomy, side information for species class, contrastive learning

CLC Number: TP183

Shanshan Xie, Junguo Zhang, Jiangjian Xie, Changchun Zhang. Zero-Shot Classification of Bird Audio Based on Taxonomy[J]. Scientia Silvae Sinicae, 2025, 61(2): 12-20.


Model	Side information for species class	Development ACC	Development UAR	Development F1_score	Test ACC	Test UAR	Test F1_score
CLAP-Laion	Scientific name	0.191	0.134	0.061	0.131	0.111	0.052
CLAP-Laion-LoRA	Scientific name	0.281	0.270	0.245	0.219	0.230	0.196
CLAP-Laion-LoRA	Taxonomic information	0.372	0.305	0.305	0.269	0.275	0.249
AST-LN	Basic characteristics and life history (AVONET+BLH)	0.335	0.281	0.244	0.287	0.295	0.233
CLAP-Bird-AVONET+BLH	Basic characteristics and life history (AVONET+BLH)	0.095	0.102	0.080	0.085	0.111	0.032
CLAP-Bird	Scientific name	0.322	0.279	0.266	0.288	0.244	0.244
CLAP-Bird	Taxonomic information	0.395	0.302	0.307	0.323	0.291	0.289
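
The three metrics in this table can be computed from true and predicted labels as sketched below. The sketch assumes that ACC is overall accuracy, UAR is the unweighted mean of per-class recall, and F1_score is the macro-averaged (per-class mean) F1; the source does not spell out the averaging, so the macro choice is an assumption, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (ACC, UAR, macro F1) for two equal-length label sequences.

    Classes are taken from y_true, so every class has at least one true sample.
    """
    classes = sorted(set(y_true))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)

    acc = sum(true_pos.values()) / len(y_true)
    recalls, f1s = [], []
    for c in classes:
        recall = true_pos[c] / true_count[c]
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return acc, sum(recalls) / len(classes), sum(f1s) / len(classes)


# Toy imbalanced example: four clips of class "a", two of class "b".
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "a", "b"]
acc, uar, macro_f1 = classification_metrics(y_true, y_pred)
```

On imbalanced data such as this toy example, ACC (5/6) is higher than UAR (0.75) because the majority class dominates accuracy, which is one reason to report all three metrics together.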

Splits	Development ACC	Development UAR	Development F1_score	Test ACC	Test UAR	Test F1_score
1	0.334	0.268	0.276	0.480	0.356	0.364
2	0.427	0.330	0.334	0.306	0.296	0.310
3	0.432	0.346	0.354	0.302	0.287	0.281
4	0.348	0.258	0.257	0.297	0.303	0.304
5	0.433	0.309	0.313	0.230	0.212	0.188
Mean	0.395	0.302	0.307	0.323	0.291	0.289
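
The mean row can be reproduced directly from the per-split values. The short check below copies the scores from the table above and also reports the minimum and maximum per metric; the variable and function names are illustrative.

```python
from statistics import mean

# Per-split scores for splits 1-5, copied from the table above.
dev_scores = {
    "ACC": [0.334, 0.427, 0.432, 0.348, 0.433],
    "UAR": [0.268, 0.330, 0.346, 0.258, 0.309],
    "F1_score": [0.276, 0.334, 0.354, 0.257, 0.313],
}
test_scores = {
    "ACC": [0.480, 0.306, 0.302, 0.297, 0.230],
    "UAR": [0.356, 0.296, 0.287, 0.303, 0.212],
    "F1_score": [0.364, 0.310, 0.281, 0.304, 0.188],
}


def summarize(scores):
    """Return {metric: (mean rounded to 3 decimals, min, max)}."""
    return {name: (round(mean(vals), 3), min(vals), max(vals))
            for name, vals in scores.items()}


dev_summary = summarize(dev_scores)
test_summary = summarize(test_scores)

for name, (avg, low, high) in test_summary.items():
    print(f"Test {name}: mean={avg:.3f}, min={low:.3f}, max={high:.3f}")
```

The rounded means agree with the reported row, and the minimum and maximum show how much the test F1_score varies between splits (0.188 on split 5 to 0.364 on split 1).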

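Splits	ACC	UAR	F1_score	Bird species
1	0.748	0.668	0.724	Northern Goshawk (Accipiter gentilis); Marsh Warbler (Acrocephalus palustris); Meadow Pipit (Anthus pratensis); European Goldfinch (Carduelis carduelis); Common Snipe (Gallinago gallinago); Great Tit (Parus major); Common Redstart (Phoenicurus phoenicurus); Western Bonelli's Warbler (Phylloscopus bonelli)
2	0.862	0.805	0.827	Common Wood Pigeon (Columba palumbus); Black Woodpecker (Dryocopus martius); Marsh Tit (Parus palustris); House Sparrow (Passer domesticus); Eurasian Tree Sparrow (Passer montanus); Common Chiffchaff (Phylloscopus collybita); Whinchat (Saxicola rubetra); Garden Warbler (Sylvia borin); Northern Lapwing (Vanellus vanellus)
3	0.771	0.700	0.726	Common Grasshopper Warbler (Locustella naevia); Eurasian Jay (Garrulus glandarius); Great Spotted Woodpecker (Dendrocopos major); Willow Tit (Parus montanus); Common Firecrest (Regulus ignicapillus); Eurasian Wryneck (Jynx torquilla); Common Chaffinch (Fringilla coelebs); Hawfinch (Coccothraustes coccothraustes); Woodlark (Lullula arborea)
4	0.869	0.817	0.822	European Greenfinch (Carduelis chloris); Short-toed Treecreeper (Certhia brachydactyla); Western Jackdaw (Corvus monedula); Yellowhammer (Emberiza citrinella); Red-breasted Flycatcher (Ficedula parva); Red Crossbill (Loxia curvirostra); Northern Wheatear (Oenanthe oenanthe); Goldcrest (Regulus regulus); Mistle Thrush (Turdus viscivorus)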

Model	Side information for species class	ACC	UAR	F1_score
CLAP-Bird	Order+Family+Genus+Species	0.323	0.291	0.289
CLAP-Bird	Family+Genus+Species	0.315	0.282	0.279
CLAP-Bird	Genus+Species	0.293	0.255	0.254
CLAP-Bird	Species	0.240	0.207	0.206
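
The four rows above differ only in how many taxonomic ranks are included in the text given to the text encoder. A minimal sketch of building such text at each depth follows; the "rank: name" prompt format, the function name, and the use of the full binomial for the species rank are assumptions made for illustration, since the source does not give the exact text template.

```python
# Ranks ordered from broadest to most specific, as in the table above.
LEVELS = ("order", "family", "genus", "species")


def taxonomic_prompt(taxon, start_level="order"):
    """Join the ranks from start_level down to species into one text string.

    The "rank: name" format is a hypothetical template, not the paper's.
    """
    start = LEVELS.index(start_level)
    return ", ".join(f"{level}: {taxon[level]}" for level in LEVELS[start:])


# Great Tit, one of the species listed in the document.
great_tit = {
    "order": "Passeriformes",
    "family": "Paridae",
    "genus": "Parus",
    "species": "Parus major",
}

for level in LEVELS:
    print(f"{level:>7} -> {taxonomic_prompt(great_tit, level)}")
```

Starting from a higher rank gives the text encoder more shared context between related species, which is consistent with the trend in the table: the F1_score rises from 0.206 with the species alone to 0.289 with all four ranks.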

