Category: Physics >> Nuclear Physics Submitted: 2025-01-12
Abstract: Missing values in radionuclide diffusion datasets can undermine the predictive accuracy and robustness of machine learning (ML) models. A regression-based missing data imputation method using the light gradient boosting machine (LGBM) algorithm was employed to impute over 60% of the missing data, establishing a radionuclide diffusion dataset containing 16 input features and 813 instances. The effective diffusion coefficient (De) was predicted using ten ML models. The ensemble meta-models, namely LGBM-extreme gradient boosting (XGB) and LGBM-categorical boosting (CatB), achieved higher predictive accuracy than the other ML models, with R² values of 0.94. The models were applied to predict the De values of EuEDTA⁻ and HCrO₄⁻ in saturated compacted bentonites at compaction densities ranging from 1200 kg/m³ to 1800 kg/m³, which were measured using a through-diffusion method. The LGBM-XGB model generalized better than the LGBM-CatB model in predicting the De of HCrO₄⁻. Shapley additive explanations identified the total porosity as the most significant influencing factor. In addition, the partial dependence plot analysis technique showed clearer results for univariate correlation analysis. This study provides a regression imputation technique to refine radionuclide diffusion datasets, offering deeper insight into the diffusion mechanisms of radionuclides and supporting the safety assessment of the geological disposal of high-level radioactive waste.
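The regression-based imputation described in the abstract can be illustrated with a minimal sketch: a regressor is trained on the rows where a feature is observed and then predicts that feature where it is missing. The abstract names LGBM as the regressor; here an ordinary least-squares line stands in for it so that the example needs only the Python standard library. The function names, the field names `dry_density` and `porosity`, and the toy values are all hypothetical and are not taken from the study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_by_regression(rows, predictor, target):
    """Fill missing (None) values of `target` using `predictor`.

    The regressor is trained only on rows where `target` is observed,
    then used to predict `target` for rows where it is missing.
    A linear fit is an assumption made for brevity; the study uses LGBM.
    """
    observed = [r for r in rows if r[target] is not None]
    a, b = fit_line(
        [r[predictor] for r in observed],
        [r[target] for r in observed],
    )
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input records are not mutated
        if r[target] is None:
            r[target] = a + b * r[predictor]
        filled.append(r)
    return filled


# Hypothetical toy records (not data from the study).
rows = [
    {"dry_density": 1200.0, "porosity": 0.56},
    {"dry_density": 1400.0, "porosity": 0.48},
    {"dry_density": 1600.0, "porosity": None},
    {"dry_density": 1800.0, "porosity": 0.32},
]
completed = impute_by_regression(rows, "dry_density", "porosity")
```

In a setting closer to the one described, the linear fit would be replaced by a gradient-boosted regressor trained on all available input features, and the procedure would be repeated for each feature that has missing entries.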
Category: Physics >> Nuclear Physics Submitted: 2024-12-03