
Scientia Silvae Sinicae ›› 2023, Vol. 59 ›› Issue (1): 119-127. doi: 10.11707/j.1001-7488.LYKX20210886

• Research Paper •

Voiceprint Recognition of Male Hainan Gibbons Based on a Convolutional Neural Network

Huimin Feng, Kun Jin*

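  1. Ecology and Nature Conservation Institute, Chinese Academy of Forestry; Research Institute of Natural Protected Area, Chinese Academy of Forestry; Key Laboratory of Biodiversity Conservation of National Forestry and Grassland Administration, Beijing 100091
  • Received: 2021-11-30 Online: 2023-01-25 Published: 2023-02-24
  • Corresponding author: Kun Jin
  • Supported by:
    Fundamental Research Funds for the Central Non-profit Research Institution of the Chinese Academy of Forestry (Key Project) (CAFYBB2019ZB010); National Forestry and Grassland Administration Project (21302112021-4023); National Forestry and Grassland Administration Project (21302112019-4017); National Forestry and Grassland Administration Project (21302102021-4106)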

Voiceprint Recognition of Male Nomascus hainanus Based on Convolutional Neural Network

Huimin Feng, Kun Jin*

  1. Ecology and Nature Conservation Institute, Chinese Academy of Forestry; Research Institute of Natural Protected Area, Chinese Academy of Forestry; Key Laboratory of Biodiversity Conservation of National Forestry and Grassland Administration, Beijing 100091
  • Received: 2021-11-30 Online: 2023-01-25 Published: 2023-02-24
  • Contact: Kun Jin

Abstract:

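Objective: To identify individual male Hainan gibbons (Nomascus hainanus) from their calls, providing support for intelligent sensing and monitoring of the Hainan gibbon population and for building a smart protected area in Hainan Tropical Rainforest National Park. Methods: Many studies have shown that the calls of some species differ between individuals, and that these differences can serve as an acoustic fingerprint for identifying individuals. Based on the spectral characteristics of male Hainan gibbon calls and the basic principles of voiceprint recognition, this study proposes a voiceprint recognition method based on convolutional neural networks. Raw call data were collected through both active and passive acoustic monitoring and then preprocessed; spectrograms of the frequency-modulated note combinations in the call phrases of 7 male Hainan gibbons served as input. Two models, a convolutional neural network (CNN) and a residual convolutional neural network (Residual CNN), were built to extract voiceprint features from the call spectrograms of the 7 males and classify them, thereby identifying individuals. Results: Under five-fold cross-validation, the CNN model achieved a recognition accuracy of 91.2% with a standard deviation of 4.24%, and the Residual CNN model achieved 95.04% with a standard deviation of 2.97%. Compared with the CNN, the Residual CNN was more accurate and more stable in classification, but took longer to compute. Conclusions: Classifying male Hainan gibbon call spectrograms with the CNN and Residual CNN models is a reliable way to identify individuals, and the method can be applied to voiceprint recognition of Hainan gibbons. Compared with the CNN, the Residual CNN is more stable and improves classification accuracy by 3.84 percentage points, to 95.04%. In practical terms, however, the CNN is cheaper to train and faster at inference than the Residual CNN, and its accuracy and prediction stability meet application requirements. The CNN-based voiceprint recognition method overcomes the computational and dataset limitations of many existing methods and offers a better solution for future voiceprint recognition research on other species.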
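
The spectrogram input mentioned in the methods can be illustrated with a minimal sketch. It assumes a mono waveform held in a NumPy array and uses a plain Hann-windowed short-time Fourier transform; the sampling rate, window length, hop size, and the synthetic frequency-modulated tone are all hypothetical choices for illustration and are not taken from the paper, whose actual preprocessing is not described in the abstract.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=128):
    """Log-magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping frames and apply the window to each.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # shape: (n_frames, frame_len // 2 + 1)
    # Convert to decibels and transpose so rows are frequency bins, columns are time frames.
    return 20.0 * np.log10(magnitude + 1e-10).T

# Hypothetical input: a 1 s frequency-modulated tone rising from 500 Hz to 1500 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
instantaneous_freq = 500 + 1000 * t
phase = 2 * np.pi * np.cumsum(instantaneous_freq) / sample_rate
fm_note = np.sin(phase)

spec = spectrogram(fm_note)
print(spec.shape)  # (frequency bins, time frames)
```

A 2-D array of this kind is what an image-style classifier such as a CNN would take as input; in the rising test tone, the strongest frequency bin moves upward from the first frame to the last.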

Key words: Hainan gibbon, Hainan Tropical Rainforest National Park, spectrogram, convolutional neural network, voiceprint recognition

Abstract:

Objective: Nomascus hainanus is a critically endangered species endemic to China. It inhabits dense forests in Hainan Tropical Rainforest National Park, and singing is an important part of its behavior. This study aims to identify individual male N. hainanus by their songs, so as to support future intelligent sensing and monitoring of the N. hainanus population and the construction of intelligent protected areas in Hainan Tropical Rainforest National Park. Method: Many studies have shown that the vocalizations of some species differ between individuals and that these differences can serve as an acoustic fingerprint for identifying individuals. Based on the spectral characteristics of male N. hainanus songs and the basic principles of voiceprint recognition, this paper proposes a voiceprint recognition method based on a Convolutional Neural Network (CNN). Two methods, active and passive acoustic monitoring, were used to collect the original N. hainanus song data. The original data were preprocessed, and spectrograms of the combinations of frequency-modulated (FM) notes in the song phrases of seven male N. hainanus were used as input. CNN and Residual CNN models were built to extract and classify the voiceprint features of the seven males, which came from five populations, in order to recognize individuals. Result: Five-fold cross-validation showed that the CNN model had a recognition accuracy of 91.2%, a standard deviation of 4.24%, and an inference time of 40 ms. The Residual CNN model had a recognition accuracy of 95.04%, a standard deviation of 2.97%, and an inference time of 120 ms. Compared with the CNN, the Residual CNN was more accurate and more stable in classification, but took longer to compute. Conclusion: The verification results show that the CNN and Residual CNN models are reliable for classifying the song spectrograms of male N. hainanus and recognizing individuals, and that this method can be applied to voiceprint recognition of the Hainan gibbon. Compared with the CNN, the Residual CNN model is more stable, and its classification accuracy is 3.84 percentage points higher, at 95.04%. From an application perspective, however, the CNN model has a lower training cost and faster inference than the Residual CNN, and its accuracy and prediction stability meet application requirements. The CNN-based voiceprint recognition method overcomes the computational and dataset limitations of many existing methods and provides a better solution for future voiceprint recognition research on other species.
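
The trade-off between the two models can be checked with simple arithmetic on the figures reported in the abstract. The sketch below is illustrative only: the helper `summarize_folds` and its per-fold input values are hypothetical (the abstract does not report per-fold accuracies or say which standard-deviation estimator was used), while the accuracy, standard deviation, and inference-time figures are the ones stated above.

```python
import statistics

def summarize_folds(fold_accuracies):
    """Return (mean, standard deviation) of per-fold accuracies, in percent."""
    mean = statistics.mean(fold_accuracies)
    # Population standard deviation; an assumption, since the estimator is not stated.
    std = statistics.pstdev(fold_accuracies)
    return mean, std

# Figures reported in the abstract (five-fold cross-validation).
reported = {
    "CNN": {"accuracy": 91.2, "std": 4.24, "inference_ms": 40},
    "Residual CNN": {"accuracy": 95.04, "std": 2.97, "inference_ms": 120},
}

cnn, res = reported["CNN"], reported["Residual CNN"]
accuracy_gain = round(res["accuracy"] - cnn["accuracy"], 2)  # in percentage points
slowdown = res["inference_ms"] / cnn["inference_ms"]         # ratio of inference times

# Hypothetical per-fold accuracies, used only to exercise summarize_folds;
# they are NOT the paper's per-fold results.
example_mean, example_std = summarize_folds([90.0, 95.0, 85.0, 92.0, 88.0])

print(f"accuracy gain: {accuracy_gain} percentage points")
print(f"inference slowdown: {slowdown:.1f}x")
```

On the reported figures, the Residual CNN gains 3.84 percentage points of accuracy at three times the inference time of the CNN, which is the trade-off the conclusion weighs.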

Key words: Nomascus hainanus, Hainan Tropical Rainforest National Park, spectrogram, Convolutional Neural Network, voiceprint recognition
