西北农业学报 Acta Agriculturae Boreali-Occidentalis Sinica

Research on a Method for Predicting Maize Seed Moisture Content Based on Ensemble Learning and Near-Infrared Spectroscopy

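Cite this article: YANG Lin, ZHANG Lin, YE Zehui. Research on a Method for Predicting Maize Seed Moisture Content Based on Ensemble Learning and Near-Infrared Spectroscopy [J]. Acta Agriculturae Boreali-Occidentalis Sinica, 2022, (8): 1025-1034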

DOI:10.7606/j.issn.1004-1389.2022.08.010

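Authors	Affiliations
YANG Lin, ZHANG Lin, YE Zehui	(1. School of Electronic Information and Electrical Engineering, Shangluo University, Shangluo, Shaanxi 726000; 2. Shaanxi Shangdan High-tech School, Shangluo, Shaanxi 726000)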

Funding: Scientific Research Project of Shangluo University (18SKY-FWDF001).

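Chinese abstract (English translation): Traditional chemical methods for determining the moisture content of maize seeds involve complex procedures, long turnaround times, and high cost. To address these problems, a rapid, nondestructive method for predicting maize seed moisture content is proposed, based on ensemble learning and near-infrared (NIR) spectroscopy. A total of 320 maize seed samples from 8 varieties, including ‘Shaanke 9’ (陕科9号), were studied, and their NIR diffuse reflectance spectra were collected with a NIR spectrometer (Antaris Ⅱ, Nicolet, USA). Partial least squares regression (PLS) was used throughout to compare Savitzky-Golay (SG) smoothing combined with each of four spectral preprocessing methods; SG smoothing combined with multiplicative scatter correction gave the best denoising. Competitive adaptive reweighted sampling (CARS) was used to extract characteristic wavelengths, and the cumulative contribution rate of the first 7 spectral features exceeded 92%. With GBDT (Gradient Boosting Decision Tree), RF (Random Forest), and XGB (XGBoost, Extreme Gradient Boosting) as base models and Stacking as the fusion strategy, a Stacking ensemble learning model was built. From the preprocessed data, the first 7 principal components were extracted as feature vectors, and the moisture content of the same seeds, measured by the direct drying method, served as labels. Four maize seed moisture prediction models were trained and their performance metrics compared. After 2,163 training iterations, the Stacking ensemble model reached a prediction correlation coefficient R_P = 0.9391 and a relative analysis error PRD = 2.91. The results show that the Stacking ensemble model combines the strengths of the three base models (GBDT, RF, and XGB), with high accuracy, good convergence, and strong generalization, offering a new approach to rapid, nondestructive determination of maize seed moisture content.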
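The two reported metrics can be illustrated with a short calculation. The abstract does not define them, so this sketch assumes the definitions common in NIR chemometrics: R_P as the Pearson correlation between reference and predicted values on the prediction set, and the relative analysis error as the standard deviation of the reference values divided by the root mean square error of prediction (RMSEP). The function names and the sample numbers are made up for illustration and are not data from the paper.

```python
import math
from statistics import mean, stdev


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def relative_analysis_error(y_true, y_pred):
    """Assumed definition: sample SD of reference values divided by RMSEP."""
    return stdev(y_true) / rmsep(y_true, y_pred)


# Hypothetical moisture contents in percent (NOT from the paper).
reference = [10.2, 11.5, 12.1, 13.4, 14.0, 15.3]  # e.g. from direct drying
predicted = [10.5, 11.2, 12.4, 13.1, 14.3, 15.0]  # e.g. from a model

print("R_P   =", round(pearson_r(reference, predicted), 4))
print("RMSEP =", round(rmsep(reference, predicted), 4))
print("ratio =", round(relative_analysis_error(reference, predicted), 2))
```

Under this assumed definition, a larger ratio means the prediction error is small relative to the natural spread of the reference values.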

Chinese keywords (English translation): maize seeds; near-infrared spectroscopy; ensemble learning; Stacking; moisture content

Prediction of Moisture Content in Maize Seeds Based on Ensemble Learning and Near Infrared Spectroscopy

Abstract: Traditional chemical methods for measuring the moisture content of maize seeds are procedurally complex, slow, and costly. This paper proposes a fast, nondestructive method for predicting maize seed moisture content based on an ensemble learning algorithm and near-infrared (NIR) spectroscopy. A total of 320 maize seed samples from 8 varieties, including ‘Shaanke 9’, were studied, and their NIR diffuse reflectance spectra were collected with a NIR spectrometer (Antaris Ⅱ, Nicolet, USA). Partial least squares regression (PLS) was used to compare the preprocessing effect of Savitzky-Golay (SG) smoothing combined with each of four spectral preprocessing methods on the NIR spectra of maize seeds. SG smoothing combined with multiplicative scatter correction gave the best denoising effect. Competitive adaptive reweighted sampling (CARS) was used to extract characteristic wavelengths, and the cumulative contribution rate of the first seven spectral features exceeded 92%. GBDT (Gradient Boosting Decision Tree), RF (Random Forest), and XGB (XGBoost, Extreme Gradient Boosting) were used as base models, and Stacking was used as the fusion strategy to build a Stacking ensemble learning model. After preprocessing, the first 7 principal components were extracted as feature vectors, and the moisture content of the same seeds, measured by the direct drying method, was used as the label. Four prediction models for maize seed moisture content were trained, and their performance metrics were compared. After 2,163 training iterations, the prediction correlation coefficient R_P of the Stacking ensemble model was 0.9391, and the relative analysis error PRD was 2.91. The results show that the Stacking ensemble model combines the advantages of the GBDT, RF, and XGB models, with high precision, good convergence, and strong generalization ability, providing a new approach to the fast, nondestructive determination of moisture content in maize seeds.

Keywords: maize seeds; NIRS; ensemble learning; Stacking; moisture content
