Voiceprint Recognition of Male Nomascus hainanus Based on Convolutional Neural Network

doi:10.11707/j.1001-7488.LYKX20210886

Abstract:

Objective: The Hainan gibbon (Nomascus hainanus) is a critically endangered species endemic to China. It inhabits dense forest in Hainan Tropical Rainforest National Park, and singing is an important part of its behavior. This study aims to identify individual male N. hainanus by their songs, in order to support intelligent perception and monitoring of the N. hainanus population and the construction of smart protected areas in Hainan Tropical Rainforest National Park. Method: Many studies have shown that the vocalizations of some species differ between individuals, and that these differences can serve as an acoustic fingerprint for identifying individuals. Based on the spectral characteristics of male N. hainanus songs and the basic principles of voiceprint recognition, this study proposes a voiceprint recognition method based on a convolutional neural network (CNN). Raw recordings of N. hainanus songs were collected by both active and passive acoustic monitoring and then preprocessed, and spectrograms of the combinations of frequency-modulated (FM) notes in the song phrases of seven males were used as input. Two models, a CNN and a residual CNN, were built to extract voiceprint features from the song spectrograms of the seven males, which belong to five groups, and to classify them, thereby achieving individual recognition. Result: Five-fold cross-validation gave a recognition accuracy of 91.2% for the CNN model, with a standard deviation of 4.24% and an inference time of 40 ms. The residual CNN model reached a recognition accuracy of 95.04%, with a standard deviation of 2.97% and an inference time of 120 ms. Compared with the CNN, the residual CNN was more accurate and its classification performance was more stable, but it took longer to compute.
Conclusion: The validation results show that both the CNN and the residual CNN models reliably classify male N. hainanus song spectrograms and identify individuals, so the method can be applied to voiceprint recognition of the Hainan gibbon. Compared with the CNN, the residual CNN is more stable, and its classification accuracy is 3.84 percentage points higher, reaching 95.04%. From an application perspective, however, the CNN has a lower training cost and faster inference than the residual CNN, and its accuracy and prediction stability meet application requirements. The CNN-based voiceprint recognition method overcomes the computational and dataset limitations of many existing methods and offers a better solution for future voiceprint recognition research on other species.

Key words: Nomascus hainanus, Hainan Tropical Rainforest National Park, spectrogram, convolutional neural network, voiceprint recognition
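The abstract states that spectrograms of frequency-modulated notes are the network input, but it does not give the preprocessing parameters. The following is a minimal sketch of how such a spectrogram can be computed, assuming a naive windowed DFT; the sample rate, frame length, hop size, and the synthetic upward chirp standing in for an FM note are all illustrative assumptions, not values from the study.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin.

    frame_len and hop are assumed values chosen only to keep the example small.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


def synthetic_fm_note(sample_rate=8000, duration=0.25, f_start=500.0, f_end=2000.0):
    """Linear upward chirp used as a hypothetical stand-in for an FM note."""
    n = int(sample_rate * duration)
    sweep_rate = (f_end - f_start) / duration  # Hz per second
    samples = []
    for i in range(n):
        t = i / sample_rate
        phase = 2 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


note = synthetic_fm_note()
spec = spectrogram(note)
# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), "frames x", len(spec[0]), "bins")
print("peak bin in first and last frame:", peak_bins[0], peak_bins[-1])
```

Each row of `spec` is one time frame, and the peak bin rises across frames as the chirp sweeps upward, which is the kind of time-frequency contour a CNN can learn from. In practice an FFT-based routine from a signal-processing library would replace the naive DFT.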

CLC number:

S718.6
Q62

Huimin Feng, Kun Jin. Voiceprint Recognition of Male Nomascus hainanus Based on Convolutional Neural Network[J]. Scientia Silvae Sinicae, 2023, 59(1): 119-127.

Figures/Tables (10): Fig. 1, Fig. 2, Fig. 3, Table 1, Fig. 4, Fig. 5, Fig. 6, Fig. 7, Fig. 8, Table 2

Individual	Sample size
Group A male	165
Group A subadult male	25
Group B male	136
Group C male	57
Group C subadult male	18
Group D male	65
Group E male	160
Total	626
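The sample sizes above are unbalanced (18 to 165 samples per individual), which matters when the data are divided for five-fold cross-validation. The sketch below shows one way to build five folds that keep each individual's share roughly constant. Whether the study stratified its folds is not stated, so the stratification, the random seed, and the function name `stratified_folds` are assumptions made for illustration.

```python
import random

# Sample counts per individual, taken from the table above.
sample_counts = {
    "Group A male": 165,
    "Group A subadult male": 25,
    "Group B male": 136,
    "Group C male": 57,
    "Group C subadult male": 18,
    "Group D male": 65,
    "Group E male": 160,
}


def stratified_folds(counts, k=5, seed=0):
    """Deal each individual's samples round-robin into k folds.

    The fold cursor carries over between individuals, so leftover samples
    are spread across folds instead of always landing in the first ones.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    cursor = 0
    for label, n in counts.items():
        order = list(range(n))
        rng.shuffle(order)  # randomize which samples go to which fold
        for sample_id in order:
            folds[cursor % k].append((label, sample_id))
            cursor += 1
    return folds


folds = stratified_folds(sample_counts)
fold_sizes = [len(fold) for fold in folds]
rarest = "Group C subadult male"
rarest_per_fold = [sum(1 for label, _ in fold if label == rarest) for fold in folds]
print("fold sizes:", fold_sizes)
print(rarest, "per fold:", rarest_per_fold)
```

With this scheme every fold holds 125 or 126 samples, and even the rarest individual (18 samples) appears at least three times in each held-out fold.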

Identification method	Split 1 accuracy (%)	Split 2 accuracy (%)	Split 3 accuracy (%)	Split 4 accuracy (%)	Split 5 accuracy (%)	Average classification accuracy (%)	Standard deviation of accuracy (%)
CNN	97.62	89.60	92.80	86.40	89.60	91.20	4.24
Residual CNN	97.62	98.40	94.40	91.20	93.60	95.04	2.97
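The summary columns can be recomputed from the five per-split accuracies. The short check below uses only the values in the table; the one assumption is that the reported standard deviation is the sample (n - 1) standard deviation, which is the variant that reproduces the published figures.

```python
from statistics import mean, stdev

# Per-split accuracies (%) copied from the table above.
split_accuracy = {
    "CNN": [97.62, 89.60, 92.80, 86.40, 89.60],
    "Residual CNN": [97.62, 98.40, 94.40, 91.20, 93.60],
}


def summarize(values):
    """Return (mean, sample standard deviation), both rounded to 2 decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


summary = {name: summarize(values) for name, values in split_accuracy.items()}
# Accuracy gain of the residual CNN over the CNN, in percentage points.
gain = round(summary["Residual CNN"][0] - summary["CNN"][0], 2)

for name, (avg, sd) in summary.items():
    print(f"{name}: mean={avg:.2f}%, sd={sd:.2f}%")
print(f"gain: {gain:.2f} percentage points")
```

The recomputed means (91.20 and 95.04) and standard deviations (4.24 and 2.97) match the table, and the difference between the means is the 3.84 percentage points quoted in the abstract.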
