陈丽霞, 刘化金, 刘宇霖, 等. 2019. 兴凯湖不同栖息地水鸟群落差异分析. 林业科学, 55(1): 56−65.
Chen L X, Liu H J, Liu Y L, et al. 2019. Analysis on the variation of waterbird communities in different habitats of Khanka Lake in China. Scientia Silvae Sinicae, 55(1): 56−65. [in Chinese]
莫锦华, 李 佳, 刘 芳, 等. 2019. 利用红外相机调查海南尖峰岭地区兽类和鸟类多样性. 林业科学, 55(10): 203−210.
Mo J H, Li J, Liu F, et al. 2019. A survey of mammals and birds diversity in Jianfengling district of Hainan Province by using camera-trapping. Scientia Silvae Sinicae, 55(10): 203−210. [in Chinese]
齐鑫伟, 侍洪波, 宋 冰, 等. 2024. 基于自上而下注意力机制的零样本目标检测. 华东理工大学学报 (自然科学版), 50(6): 859−868.
Qi X W, Shi H B, Song B, et al. 2024. Zero-shot object detection based on top-down attention mechanism. Journal of East China University of Science and Technology, 50(6): 859−868. [in Chinese]
谢将剑, 沈 忱, 张飞宇, 等. 2024. 融合音频及生态位信息的跨地域鸟类物种识别方法. 生物多样性, 32: 24259.
Xie J J, Shen C, Zhang F Y, et al. 2024. Cross-regional bird species recognition method integrating audio and ecological niche information. Biodiversity Science, 32: 24259. [in Chinese]
Arato J, Fitch W T. 2021. Phylogenetic signal in the vocalizations of vocal learning and vocal non-learning birds. Philosophical Transactions of the Royal Society B, 376(1836): 20200241.
doi: 10.1098/rstb.2020.0241
Bocaccio H, Domínguez M, Mahler B, et al. 2023. Identification of dialects and individuals of globally threatened Yellow Cardinals using neural networks. Ecological Informatics, 78: 102372.
doi: 10.1016/j.ecoinf.2023.102372
Chen K, Du X J, Zhu B L, et al. 2022. HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 646−650.
Devlin J, Chang M, Lee K, et al. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 1−10.
Elizalde B, Deshmukh S, Al Ismail M, et al. 2023. CLAP learning audio concepts from natural language supervision. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1−5.
Elizalde B, Deshmukh S, Wang H M. 2024. Natural language supervision for general-purpose audio representations. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 336−340.
Fonseca E, Favory X, Pons J, et al. 2021. FSD50K: an open dataset of human-labeled sound events. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30: 829−852.
Farnsworth A, Lovette I J. 2008. Phylogenetic and ecological effects on interspecific variation in structurally simple avian vocalizations. Biological Journal of the Linnean Society, 94(1): 155−173.
doi: 10.1111/j.1095-8312.2008.00973.x
Gebhard A, Triantafyllopoulos A, Bez T, et al. 2024. Exploring meta information for audio-based zero-shot bird classification. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1211−1215.
Hu E, Shen Y L, Wallis P, et al. 2021. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
Kim G, Wu H H, Bondi L, et al. 2024. Multi-modal continual pre-training for audio encoders. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 691−695.
Kong Q Q, Cao Y, Iqbal T, et al. 2020. PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28: 2880−2894.
Liu Z, Lin Y T, Cao Y, et al. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10012−10022.
Liu Y, Ott M, Goyal N, et al. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Miao Z Q, Elizalde B, Deshmukh S, et al. 2023. Zero-shot transfer for wildlife bioacoustics detection. PREPRINT (Version 1) available at Research Square.
McFee B, Raffel C, Liang D W, et al. 2015. librosa: Audio and music signal analysis in Python. Proceedings of the Python in Science Conference, 18−24.
Mei X H, Meng C T, Liu H H, et al. 2024. WavCaps: a ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32: 3339−3354.
Pourpanah F, Abdar M, Luo Y X, et al. 2022. A review of generalized zero-shot learning methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4): 4051−4070.
Radford A, Kim J W, Hallacy C, et al. 2021. Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning (PMLR), 8748−8763.
Storchová L, Hořák D, Hurlbert A. 2018. Data from: life-history characteristics of European birds. Dataset10, 5061.
Sangster G. 2018. Integrative taxonomy of birds: the nature and delimitation of species. Bird Species: How they arise, modify and vanish, 9−37.
Stevens S, Wu J M, Thompson M J, et al. 2024. BioCLIP: a vision foundation model for the tree of life. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19412−19424.
Tang Q, Xu L M, Zheng B C, et al. 2023. Transound: Hyper-head attention transformer for birds sound recognition. Ecological Informatics, 75: 102001.
doi: 10.1016/j.ecoinf.2023.102001
Tobias J A, Sheard C, Pigot A L, et al. 2022. AVONET: morphological, ecological and geographical data for all birds. Ecology Letters, 25(3): 581−597.
doi: 10.1111/ele.13898
Vosoughi A, Bondi L, Wu H H, et al. 2024. Learning audio concepts from counterfactual natural language. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 366−370.
Wu H H, Seetharaman P, Kumar K, et al. 2022. Wav2CLIP: Learning robust audio representations from CLIP. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4563−4567.
Wu Y S, Chen K, Zhang T Y, et al. 2023. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1−5.
Wang W, Zheng V W, Yu H, et al. 2019. A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2): 1−37.
Xie J, Zhu M Y. 2022. Sliding-window based scale-frequency map for bird sound classification using 2D- and 3D-CNN. Expert Systems with Applications, 207: 118054.
doi: 10.1016/j.eswa.2022.118054
Xie J J, Zhong Y J, Zhang J G, et al. 2023. A review of automatic recognition technology for bird vocalizations in the deep learning era. Ecological Informatics, 73: 101927.
doi: 10.1016/j.ecoinf.2022.101927