Multi-task Neural Network Models for Simultaneous Prediction of Bioaccumulation Parameters of Organic Chemicals in Fish
Abstract: Bioaccumulation data are a prerequisite for assessing the ecological and health risks of chemicals. Machine-learning models have been developed to predict bioaccumulation and fill data gaps. However, existing prediction models are mostly single-task models that target one endpoint and neglect the inherent correlations among endpoints. Models based on multi-task learning are promising for predicting several bioaccumulation parameters simultaneously. In this study, multi-task models were developed with the back-propagation (BP) neural network algorithm, using Dragon descriptors and four kinds of molecular fingerprints as inputs, to simultaneously predict the bioconcentration factors (BCF) and biomagnification factors (BMF) of chemicals in fish. Their predictions were compared with those of the corresponding single-task models. The multi-task models outperformed the single-task models in goodness of fit, robustness, and predictive ability. The best multi-task model used Dragon descriptors as input: its coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) were 0.925~0.964, 0.168~0.247, and 0.133 for the training set, and 0.771~0.894, 0.176~0.213, and 0.168~0.176 for the validation set. The 10-fold cross-validation coefficient (Q²cv) of the best model was 0.785~0.867. The applicability domains of the models were characterized by the Tanimoto similarity between compounds in the validation and training sets. The models developed in this study can fill gaps in bioaccumulation data and support bioaccumulation and risk assessment of chemicals.
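The two ideas named in the abstract, a network that shares hidden units between the BCF and BMF outputs and a Tanimoto similarity between fingerprints, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration: the abstract does not give the architecture, hyperparameters, or how chemicals with only one measured endpoint were handled. The function names `train_multitask` and `tanimoto` are hypothetical, random features stand in for Dragon descriptors, and missing endpoints are simply masked out of the loss.

```python
import numpy as np


def train_multitask(X, Y, hidden=8, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer network with one output per endpoint.

    X: (n, d) features. Y: (n, 2) targets, NaN where an endpoint is missing.
    Layer sizes and learning rate are illustrative assumptions, not the paper's.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(Y)                      # True where an endpoint was measured
    n_obs = mask.sum()
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # hidden layer shared by both tasks
        P = H @ W2 + b2                      # one linear output per task
        E = np.where(mask, P - Y, 0.0)       # missing targets contribute no error
        losses.append(float((E ** 2).sum() / n_obs))
        dP = 2.0 * E / n_obs                 # gradient of the masked MSE
        dH = (dP @ W2.T) * (1.0 - H ** 2)    # back-propagate through tanh
        W2 -= lr * (H.T @ dP)
        b2 -= lr * dP.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints (0.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


# Synthetic stand-in data: two correlated targets, about 20% of entries missing.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
Y = np.column_stack([X[:, 0] + 0.5 * X[:, 1], 0.8 * X[:, 0] - 0.3 * X[:, 2]])
Y[rng.random(Y.shape) < 0.2] = np.nan
predict, losses = train_multitask(X, Y)
print(f"masked MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because both outputs read from the same hidden layer, a chemical with only a BCF value still updates weights that the BMF output uses, which is one plausible reading of how multi-task learning exploits the correlation between endpoints. For an applicability-domain check in this spirit, `tanimoto` would be evaluated between each validation compound's fingerprint and the training-set fingerprints; any similarity threshold would be a further assumption.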
