Bird Species Identification Based on Deep Convolutional Network with Fusing Global and Local Features

doi: 10.11707/j.1001-7488.20200113

Scientia Silvae Sinicae, 2020, Vol. 56, Issue (1): 133-144.

Zhiwei Lin^1,2,3, Qilu Ding^6, Jinfu Liu^1,4,5,*

1. College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002
2. College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002
3. Forestry Post-Doctoral Station of Fujian Agriculture and Forestry University, Fuzhou 350002
4. Key Laboratory for Ecology and Resource Statistics of Fujian Province, Fuzhou 350002
5. Cross-Strait Nature Reserve Research Center, Fujian Agriculture and Forestry University, Fuzhou 350002
6. Fuzhou Central Branch of People's Bank of China, Fuzhou 350003

Received: 2019-05-17 Online: 2020-01-25 Published: 2020-02-24
Contact: Jinfu Liu
Supported by:
Humanities and Social Sciences Research Project of the Ministry of Education (18YJCZH093); Fujian Provincial Forestry Science Research Project (KH1701390); Cross-Strait Postdoctoral Exchange Funding Program; China Postdoctoral Science Foundation General Program (2018M632565)

Abstract

Objective: Using bird image data, this study explores how to fuse global and local features and, drawing on deep convolutional neural network theory, builds a bird species identification model, with the aim of providing a new tool for the monitoring and management of forests and wetlands.

Method: First, following the way humans recognize objects, from the whole to the parts, a skip structure is used to let whole-object information and part information interact. The model uses two network frameworks to extract the global features and the local part features of birds, and a fusion module (Fusion block) built on the skip structure passes the global feature information to the local feature extraction module. Training requires part annotations for the birds; at test time, a Faster R-CNN model extracts the part annotations automatically. Second, the effect of different bird part images on the model is examined. Finally, the effectiveness and applicability of the model are verified by comparing different network classification models and bird datasets.

Result: The proposed model achieves high classification accuracy, with overall accuracy above 90%. Accuracy differs with the bird part used, and the model based on head images has the highest overall accuracy. The Faster R-CNN model localizes bird parts accurately: at test time, overall accuracy differs little between manually annotated part labels and part labels predicted by Faster R-CNN. Compared with network classification models such as Inception-V1, ResNet-101, DenseNet-121 and Bilinear CNN, the proposed model reaches higher overall accuracy, which supports its effectiveness. On the NABirds bird dataset, the model also performs well overall, which supports its applicability.

Conclusion: The proposed bird species classification model identifies birds well and is effective, and it can provide a reasonable and effective basis for the monitoring and management of forests and wetlands.

Key words: bird identification, deep convolutional neural network, global and local components
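The abstract describes a fusion module that passes global features to the local branch, and the result tables compare two fusion modes labelled Concat and Add. The abstract does not give the internals of the Fusion block, so the following is only a minimal sketch of what those two generic operations do to feature maps, assuming both branches produce maps with the same spatial size. The function names (`feature_shape`, `fuse_concat`, `fuse_add`) and the nested-list layout (channels, then rows, then columns) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
def feature_shape(feat):
    """Return (channels, height, width) of a nested-list feature map."""
    return (len(feat), len(feat[0]), len(feat[0][0]))


def fuse_concat(global_feat, local_feat):
    """Concat fusion: stack the channels of both maps.

    Spatial sizes must match; the channel count becomes C_global + C_local.
    """
    if feature_shape(global_feat)[1:] != feature_shape(local_feat)[1:]:
        raise ValueError("spatial sizes must match for concatenation")
    # List concatenation appends the local channels after the global ones.
    return global_feat + local_feat


def fuse_add(global_feat, local_feat):
    """Add fusion: element-wise sum; shapes must match exactly."""
    if feature_shape(global_feat) != feature_shape(local_feat):
        raise ValueError("shapes must match for element-wise addition")
    return [
        [[g + l for g, l in zip(g_row, l_row)]
         for g_row, l_row in zip(g_ch, l_ch)]
        for g_ch, l_ch in zip(global_feat, local_feat)
    ]


# Two toy feature maps with 2 channels of size 2x2 each (illustrative values).
global_feat = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
local_feat = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]

concat_out = fuse_concat(global_feat, local_feat)
add_out = fuse_add(global_feat, local_feat)
print(feature_shape(concat_out))  # channel count doubles
print(feature_shape(add_out))     # shape is unchanged
```

The practical difference is that concatenation keeps both feature sets side by side and enlarges the channel dimension, whereas addition merges them into a map of the original size, so layers after an Add fusion need no change in input width.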

Zhiwei Lin, Qilu Ding, Jinfu Liu. Bird Species Identification Based on Deep Convolutional Network with Fusing Global and Local Features[J]. Scientia Silvae Sinicae, 2020, 56(1): 133-144.

Figures/Tables: 15 (Figures 1-9, Tables 1-6)

References

	National Forestry and Grassland Administration/National Park Administration. 2017 Forestry Development Report in China. [2019-05-05]. http://www.forestry.gov.cn/main/62/content-1086586.html. [in Chinese]
	Si X F, Ding P. History, status of monitoring land birds in Europe and America and countermeasures of China. Biodiversity Science, 2013. 19(3): 303-310. [in Chinese]
	Mackinnon J, Phillipps K, He F Q, et al. A Field Guide to the Birds of China. Changsha: Hunan Education Publishing House, 2000. [in Chinese]
	Branson S, Van Horn G, Belongie S, et al. Bird species categorization using pose normalized deep convolutional nets. arXiv preprint, arXiv: 1406.2952, 2014.
	Canterbury G E, Martin T E, Petit D R, et al. Bird communities and habitat as ecological indicators of forest condition in regional monitoring. Conservation Biology, 2000. 14(2): 544-558. doi: 10.1046/j.1523-1739.2000.98235.x
	Cheng C, Fu Y W, Jiang Y G, et al. Dual skipping networks. Conference on Computer Vision and Pattern Recognition, IEEE, 2018. 4071-4079.
	Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 1960. 20(1): 37-46. doi: 10.1177/001316446002000104
	Dubey A, Gupta O, Guo P, et al. Pairwise confusion for fine-grained visual classification. European Conference on Computer Vision, Springer, 2018. 70-86.
	Farrell R, Oza O, Zhang N, et al. Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance. International Conference on Computer Vision, IEEE, 2011. 161-168.
	Gao Y, Mosalam K M. Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 2018. 33(9): 748-768. doi: 10.1111/mice.12363
	Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks. Conference on Computer Vision and Pattern Recognition, IEEE, 2017. 4700-4708.
	Huang S, Xu Z, Tao D, et al. Part-stacked CNN for fine-grained visual categorization. Conference on Computer Vision and Pattern Recognition, IEEE, 2016. 1173-1182.
	Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, MIT Press, 2015. 448-456.
	Keskar N S, Mudigere D, Nocedal J, et al. On large-batch training for deep learning: Generalization gap and sharp minima. International Conference on Learning Representations, 2017. arXiv preprint, arXiv: 1609.04836.
	Koskimies P. Birds as a tool in environmental monitoring. Annales Zoologici Fennici, 1989. 26(3): 153-166.
	Krause J, Sapp B, Howard A, et al. The unreasonable effectiveness of noisy data for fine-grained recognition. European Conference on Computer Vision, Springer, 2016. 301-320.
	Lin T Y, RoyChowdhury A, Maji S. Bilinear convolutional neural networks for fine-grained visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. 40(6): 1309-1322. doi: 10.1109/TPAMI.2017.2723400
	Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. Conference on Computer Vision and Pattern Recognition, IEEE, 2015. 3431-3440.
	Loshchilov I, Hutter F. SGDR: Stochastic gradient descent with warm restarts. The Fifth International Conference on Learning Representations, OpenReview.net, 2017. https://openreview.net/forum?id=skq89scxx.
	Lu Y, Yin J, Chen Z, et al. Revealing detail along the visual hierarchy: neural clustering preserves acuity from V1 to V4. Neuron, 2018. 98(2): 417-428. doi: 10.1016/j.neuron.2018.03.009
	Marini A, Turatti A J, Britto A S, et al. Visual and acoustic identification of bird species. International Conference on Acoustics, Speech and Signal Processing, IEEE, 2015. 2309-2313.
	Martinez-Munoz G, Larios N, Mortensen E, et al. Dictionary-free categorization of very similar objects via stacked evidence trees. Conference on Computer Vision and Pattern Recognition, IEEE, 2009. 549-556.
	Nadimpalli U D, Price R R, Hall S G, et al. A comparison of image processing techniques for bird recognition. Biotechnology Progress, 2006. 22(1): 9-13. doi: 10.1021/bp0500922
	Savard J P L, Clergeau P, Mennechez G. Biodiversity concepts and urban ecosystems. Landscape and Urban Planning, 2000. 48(3/4): 131-142.
	Sharif Razavian A, Azizpour H, Sullivan J, et al. CNN features off-the-shelf: an astounding baseline for recognition. Conference on Computer Vision and Pattern Recognition Workshops, IEEE, 2014. 806-813.
	Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. Conference on Computer Vision and Pattern Recognition, IEEE, 2015. 1-9.
	Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. Conference on Computer Vision and Pattern Recognition, IEEE, 2016. 2818-2826.
	Tan C, Sun F, Kong T, et al. A survey on deep transfer learning. International Conference on Artificial Neural Networks, Springer, 2018. 270-279.
	Van Horn G, Branson S, Farrell R, et al. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. Conference on Computer Vision and Pattern Recognition, IEEE, 2015. 595-604.
	Wei X S, Xie C W, Wu J, et al. Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recognition, 2018. 76: 704-714. doi: 10.1016/j.patcog.2017.10.002
	Welinder P, Branson S, Wah C, et al. The Caltech-UCSD Birds-200 dataset. California Institute of Technology, Technical Report CNS-TR2010-001-2010, 2010.
	Xie L, Tian Q, Zhang B. Feature normalization for part-based image classification. International Conference on Image Processing, IEEE, 2013. 2607-2611.
	Yin C, Zhang L, Liu J. Pixel saliency based encoding for fine-grained image classification. Chinese Conference on Pattern Recognition and Computer Vision, Springer, 2018. 274-285.
	Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 2014. 3320-3328.
	Zhang N, Donahue J, Girshick R, et al. Part-based R-CNNs for fine-grained category detection. European Conference on Computer Vision, Springer, 2014. 834-849.
	Zhang N, Farrell R, Darrell T. Pose pooling kernels for sub-category recognition. Conference on Computer Vision and Pattern Recognition, IEEE, 2012. 3665-3672.

Part name	Bounding box	Top-1 accuracy (%)	Kappa value
Bird head	Yes	95.06	0.95
Bird head	No	94.30	0.94
Bird body	Yes	92.50	0.92
Bird body	No	91.58	0.91
Whole bird	Yes	93.82	0.94
Whole bird	No	93.46	0.93
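The tables report a Top-1 accuracy and a Kappa value for each model. The reference list cites Cohen (1960), whose coefficient is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (here the Top-1 accuracy as a fraction) and p_e is the agreement expected by chance from the label frequencies. The sketch below shows how the two metrics could be computed from true and predicted labels; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = top1_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both lists contain a single identical class: agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with two hypothetical classes "a" and "b".
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "a", "b", "a"]
accuracy = top1_accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(accuracy, kappa)
```

In this toy case p_o = 0.75 and p_e = (2*3 + 2*1) / 16 = 0.5, so kappa = 0.5. When there are many classes of similar size, p_e becomes small and kappa approaches the accuracy, which matches the pattern in the tables where Kappa closely tracks Top-1 accuracy divided by 100.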

Feature fusion	Top-1 accuracy (%)	Δ (%)
Concat	94.30	0.5
Add	94.80	0.5

Model	Feature fusion	Top-1 accuracy (%)	Δ (%)
Ours + DenseNet-121	Concat	93.46	1.24
Ours + DenseNet-121	Add	94.70	1.24
Ours + DenseNet-169	Concat	93.30	1.6
Ours + DenseNet-169	Add	94.90	1.6

Training data	Overall accuracy (%)
Original image	87.88
Bird head	91.40
Bird body	87.40
Whole bird	94.50
Ours	94.80

Model	Parameters (10⁶)	Top-1 accuracy (%)	Kappa value
Inception-V1	5.60	82.92	0.83
Inception-V2	10.16	83.38	0.83
Inception-V3	22.78	82.94	0.83
ResNet-50	23.53	82.64	0.82
ResNet-101	42.52	83.44	0.83
ResNet-152	58.35	84.04	0.84
DenseNet-121	6.96	87.88	0.88
DenseNet-169	12.65	88.30	0.88
Bilinear-CNN	40.93	82.44	0.82
Ours (bird head)	14.11	94.80	0.95

Model	Top-1 accuracy (%)
Van Horn et al.	75.00
Bilinear CNN	79.40
Yin et al.	81.90
Dubey et al.	82.79
Ours + DenseNet-121	82.20
Ours + DenseNet-169	83.70