Scientia Silvae Sinicae, 2018, Vol. 54, Issue (1): 32-45. doi: 10.11707/j.1001-7488.20180104

• Papers and Research Reports •

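A Method of Osmanthus fragrans Cultivars Identification Based on Random Forest Algorithm and SRAP Molecular Markers

Qiu Shuai¹, Shen Baichun¹, Li Tingting², Guo Juan¹, Wang Ji¹, Sun Lina¹, Chen Xuping¹, Hu Shaoqing³

  1. Hangzhou Landscaping Incorporated, Hangzhou 310020;
  2. Zhejiang Forestry Academy, Hangzhou 310023;
  3. Zhejiang Sci-Tech University, Hangzhou 310018
  • Received: 2017-04-11 Revised: 2017-08-16 Publication date: 2018-01-25 Release date: 2018-03-01
  • Funding:
    Major Science and Technology Special Project of Zhejiang Province for Breeding New Agricultural Varieties (2016C02056-12).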

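Abstract: [Objective] Osmanthus fragrans cultivars are difficult to identify. In nursery stock production and landscape use, cultivars are frequently mixed up and inferior cultivars are passed off as superior ones, while conventional DNA fingerprinting cannot be applied well to cultivar identification. To address these problems, this study proposes a method for identifying O. fragrans cultivars based on the random forest algorithm and SRAP (sequence-related amplified polymorphism) molecular markers, enabling simple, rapid and accurate cultivar identification. [Method] DNA was extracted from 45 O. fragrans cultivars or variant types and amplified by PCR with 90 SRAP primer pairs. Amplification data were collected by capillary electrophoresis, and primer pairs with high polymorphism and stable amplification were screened. For each primer pair, the polymorphism information content (PIC), number of patterns, number of effective patterns, discriminating power (D), chi-square value of the pattern distribution (χ²) and number of indistinguishable sample pairs (x) were calculated. The locus data of primer-pair combinations that fully discriminated all cultivars were used as training sets to build classification models based on the random forest algorithm, and the optimal models were selected according to their generalization ability and classification performance. [Result] Ten SRAP primer pairs were selected, with a mean PIC of 0.26, a mean number of patterns of 33.9, a mean number of effective patterns of 26.6, a mean D of 0.97, a mean χ² of 21.07 and a mean x of 28.2. Eight classification models (rf1-rf8) were built, each using two SRAP primer pairs. All models fully discriminated all O. fragrans cultivars, with out-of-bag (OOB) error estimates ranging from 0.0044 to 0.0139. Models rf5 and rf3 had the strongest generalization ability and rf8 the weakest. Model rf1 gave the best classification performance, followed by rf3, rf4, rf5 and rf7, while rf2, rf6 and rf8 performed worst. [Conclusion] Models rf1, rf3, rf4, rf5 and rf7 had the best classification ability; they use the SRAP primer-pair combinations me1/em3+me9/em6, me4/em5+me9/em6, me4/em8+me9/em6, me6/em9+me9/em6 and me5/em5+me9/em6, respectively. Besides the discriminating power of the individual primer pairs, the correlation between the selected primer pairs also significantly affects a model's classification ability: the weaker the correlation, the stronger the model. The proposed method enables simple, rapid and accurate identification of O. fragrans cultivars and meets the cultivar identification needs of nursery stock production, promotion and application, and germplasm conservation.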
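The per-primer-pair statistics can be computed from a presence/absence band matrix. The sketch below is a minimal Python version that assumes definitions common for dominant markers such as SRAP, none of which are stated in the abstract. PIC per locus is taken as 2f(1 - f), averaged over the loci of a primer pair. A sample's pattern is its full band profile for that primer pair. The effective number of patterns is taken as 1/Σp_i². Discriminating power is D = 1 - Σ p_i(Np_i - 1)/(N - 1), and x is the number of sample pairs that share a pattern. The chi-square statistic is left out because the abstract does not say which expected distribution it tests against. The function name and the toy data are hypothetical.

```python
from collections import Counter
from math import comb


def primer_pair_stats(calls):
    """Summary statistics for one primer pair (assumed definitions, see lead-in).

    calls: one list of 0/1 band calls per sample, all the same length
    (one entry per locus amplified by this primer pair).
    """
    n = len(calls)
    # Band frequency f of each locus across samples.
    band_freqs = [sum(col) / n for col in zip(*calls)]
    # Dominant-marker PIC per locus, 2f(1 - f), averaged over loci.
    pic = sum(2 * f * (1 - f) for f in band_freqs) / len(band_freqs)
    # A sample's "pattern" is its full band profile for this primer pair.
    counts = Counter(tuple(row) for row in calls)
    freqs = [c / n for c in counts.values()]
    # Discriminating power: probability that two randomly drawn samples differ.
    d = 1 - sum(p * (n * p - 1) / (n - 1) for p in freqs)
    return {
        "PIC": pic,
        "patterns": len(counts),
        "effective_patterns": 1 / sum(p * p for p in freqs),  # assumed 1 / sum(p_i^2)
        "D": d,
        # Sample pairs sharing an identical pattern, i.e. indistinguishable pairs.
        "indistinguishable_pairs": sum(comb(c, 2) for c in counts.values()),
    }


toy = [[1, 0], [1, 0], [0, 1], [1, 1]]  # hypothetical: 4 samples x 2 loci
print(primer_pair_stats(toy))  # PIC 0.4375, 3 patterns, x = 1, D about 0.833
# Consistency check of D = 1 - x / C(N, 2) against the reported means (N = 45).
print(1 - 28.2 / comb(45, 2))  # about 0.9715
```

Under these assumed definitions D = 1 - x / C(N, 2). With N = 45 samples, C(45, 2) = 990, so the reported mean x of 28.2 gives 1 - 28.2/990 ≈ 0.97, which agrees with the reported mean D. This is only a consistency check; it does not confirm that the authors used exactly these formulas.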
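The modelling step can be illustrated with a small random forest written from scratch, so that it runs without third-party packages. This is a sketch of the general technique, not the authors' implementation: the abstract gives neither their software nor their settings. The forest uses bootstrap samples, a random subset of about √p features tried at each split, Gini impurity and a majority vote. The OOB error counts a sample as misclassified when the majority vote of the trees whose bootstrap sample left it out is wrong. Band profiles from two primer pairs are concatenated into one feature vector, mirroring a combination such as me1/em3+me9/em6. The cultivar names, profiles, replicate count and noise rate are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, idx, mtry, rng):
    """Grow one unpruned tree on binary (0/1) band features.

    idx holds bootstrap row indices (duplicates allowed). A leaf is a class
    label; an internal node is a tuple (feature, left_subtree, right_subtree).
    """
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    best = None
    # Visit features in random order: evaluate mtry of them, and go past mtry
    # only until one valid split (both sides non-empty) has been found.
    for k, f in enumerate(rng.sample(range(len(X[0])), len(X[0]))):
        if k >= mtry and best is not None:
            break
        left = [i for i in idx if X[i][f] == 0]
        right = [i for i in idx if X[i][f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(idx)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:  # identical band profiles with mixed labels: cannot split
        return majority
    _, f, left, right = best
    return (f, build_tree(X, y, left, mtry, rng), build_tree(X, y, right, mtry, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, left, right = node
        node = right if x[f] else left
    return node


def predict_forest(trees, x):
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def random_forest_oob(X, y, n_trees=200, seed=0):
    """Fit a forest and return (trees, OOB error rate)."""
    rng = random.Random(seed)
    n = len(X)
    mtry = max(1, int(len(X[0]) ** 0.5))  # about sqrt(p) candidate features per split
    votes = [Counter() for _ in range(n)]
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        tree = build_tree(X, y, boot, mtry, rng)
        trees.append(tree)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:  # sample i is out-of-bag for this tree
                votes[i][predict_tree(tree, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return trees, (wrong / len(scored) if scored else float("nan"))


def simulate(profiles, reps, noise, seed=1):
    """Hypothetical replicate band calls: each true profile with random band flips."""
    rng = random.Random(seed)
    X, y = [], []
    for name, profile in profiles.items():
        for _ in range(reps):
            X.append([b ^ (rng.random() < noise) for b in profile])
            y.append(name)
    return X, y


# Hypothetical band profiles for two primer pairs concatenated (6 + 6 loci).
profiles = {
    "cv_A": [1, 0, 1, 1, 0, 0] + [0, 1, 1, 0, 1, 0],
    "cv_B": [1, 1, 0, 1, 0, 1] + [0, 1, 0, 0, 1, 1],
    "cv_C": [0, 0, 1, 0, 1, 1] + [1, 0, 1, 1, 0, 0],
}
X, y = simulate(profiles, reps=8, noise=0.05)
trees, oob_error = random_forest_oob(X, y, n_trees=200)
print(f"OOB error: {oob_error:.4f}")
print(predict_forest(trees, profiles["cv_B"]))
```

In practice a library implementation would normally be used, for example scikit-learn's RandomForestClassifier with oob_score=True (OOB error = 1 - oob_score_) or R's randomForest package; the hand-written version only makes the bootstrap and OOB bookkeeping explicit.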

Key words: Osmanthus fragrans, cultivar identification, classification model, SRAP molecular marker, random forest algorithm
