
Scientia Silvae Sinicae ›› 2026, Vol. 62 ›› Issue (2): 111-125. doi: 10.11707/j.1001-7488.LYKX20240779

• Research Paper •

Construction of Near Infrared Spectroscopy Prediction Models Based on CARS-PLSR for Determining Oil Content and Fatty Acid Composition of Camellia oleifera Kernel

Huiqi Zhong1,2, Jingyu Chai1, Kailiang Wang1,3, Jianhua Teng4, Wenyu Bi5, Anni Wang1,6, Ping Lin1,3,*

  1. Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Fuyang 311400
  2. Graduate School of Nanjing Forestry University, Nanjing 210037
  3. Zhejiang Key Laboratory of Forest Genetics and Breeding, Fuyang 311400
  4. Dongfanghong Forest Farm of Zhejiang Province, Jinhua 321025
  5. Shengzhou Forestry Technology Service Center, Shengzhou 312400
  6. State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040
  • Received: 2024-12-18 Revised: 2025-11-05 Online: 2026-02-25 Published: 2026-03-04
  • Contact: Ping Lin E-mail: linping80@126.com
  • Funding:
    National Key Research and Development Program of China during the 14th Five-Year Plan period (2022YFD2200401); Major Science and Technology Project of Zhejiang Province for the Breeding of New Forest Tree Varieties (2021C02070-2)

Abstract:

Objective: To develop a low-cost, non-destructive and accurate method, suitable for batch testing, for determining the oil content and fatty acid composition of Camellia oleifera kernels, and thereby improve the efficiency of evaluating oil traits.

Method: Kernels from 220 C. oleifera clones were used as materials. Kernel oil content was determined by Soxhlet extraction, and the fatty acid composition of the seed oil was determined by gas chromatography. Near infrared spectra of the kernels were collected over the wavelength range 1000–2500 nm. The spectral data were preprocessed with nine methods, and the samples were divided into calibration and prediction sets at a ratio of 4:1 by random sampling (RS) and by sample set partitioning based on joint X-Y distance (SPXY). Competitive adaptive reweighted sampling (CARS) was used to select the key wavelengths significantly correlated with each oil trait, and partial least squares regression (PLSR) prediction models were established for kernel oil content and fatty acid contents.

Result: The kernel oil content and the contents of seven fatty acids (palmitic acid C16:0, palmitoleic acid C16:1, stearic acid C18:0, oleic acid C18:1, linoleic acid C18:2, linolenic acid C18:3 and cis-11-eicosenoic acid C20:1) followed or approximated a normal distribution. The oil content models built under both sample partitioning methods showed good accuracy and stability. Under RS partitioning, standard normal variate (SNV) preprocessing was optimal: with 14 key wavelengths selected, the model had a residual predictive deviation (RPD) of 5.2055, a prediction set determination coefficient ($R_{\mathrm{p}}^2$) of 0.9651 and a prediction set root mean square error (RMSEp) of 1.8548 g·(100 g)$^{-1}$. Under SPXY partitioning, SNV + first derivative (FD) preprocessing was optimal: with 25 key wavelengths selected, the model had an RPD of 3.4170, $R_{\mathrm{p}}^2$ of 0.9168 and RMSEp of 2.6224 g·(100 g)$^{-1}$. The models for C18:1, C18:2 and C18:3 contents performed best under RS partitioning with second derivative (SD), SNV and continuum removal (CR) preprocessing, respectively, giving RPD values of 1.9394, 2.1164 and 2.3381, $R_{\mathrm{p}}^2$ values of 0.7385, 0.7754 and 0.8316, and RMSEp values of 1.7071%, 1.3702% and 0.0492%.

Conclusion: The near infrared spectroscopy prediction model for oil content of C. oleifera kernels constructed in this study has high accuracy and good stability, and can be used for rapid, batch, non-destructive determination of kernel oil content. The prediction models for C18:1, C18:2 and C18:3 contents can be used for preliminary prediction of the unsaturated fatty acid contents of samples. The study provides a basis for applying near infrared spectroscopy to the rapid determination of oil content, fatty acid contents and related traits of C. oleifera.
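As an illustration of the quantities named in the abstract, the following minimal Python sketch shows how SNV preprocessing and the three prediction set statistics ($R_{\mathrm{p}}^2$, RMSEp, RPD) are commonly computed. It is not the authors' code. It assumes the usual chemometric definitions: SNV centers and scales each spectrum by its own mean and standard deviation, $R_{\mathrm{p}}^2$ is one minus the ratio of residual to total sum of squares, and RPD is the standard deviation of the reference values divided by RMSEp. The function names and the sample values are hypothetical, and the paper may use slightly different conventions (for example n rather than n − 1 in the standard deviation).

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum
    by its own mean and sample standard deviation (assumed n - 1)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def prediction_metrics(y_ref, y_pred):
    """Return (R2_p, RMSEp, RPD) for a prediction set.

    Assumed definitions (not confirmed by the abstract):
      R2_p  = 1 - SS_residual / SS_total
      RMSEp = sqrt(SS_residual / n)
      RPD   = SD(y_ref) / RMSEp
    """
    n = len(y_ref)
    y_bar = mean(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - y_bar) ** 2 for r in y_ref)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    rpd = stdev(y_ref) / rmsep
    return r2, rmsep, rpd


if __name__ == "__main__":
    # Hypothetical reference and predicted oil contents, g per 100 g
    # (illustrative numbers only, not data from the paper).
    y_ref = [40.0, 45.0, 50.0, 55.0, 60.0]
    y_pred = [41.0, 44.0, 50.0, 56.0, 59.0]
    r2, rmsep, rpd = prediction_metrics(y_ref, y_pred)
    print(f"R2_p={r2:.4f}  RMSEp={rmsep:.4f}  RPD={rpd:.4f}")
```

Under these definitions a larger RPD means that the prediction error is small relative to the natural spread of the trait, which is consistent with the abstract treating the oil content models as suitable for routine determination and the fatty acid models as suitable only for preliminary prediction.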

Key words: Camellia oleifera kernel, oil content, fatty acid, near infrared spectroscopy, prediction model

CLC Number: