[1] MARTIN Y C, KOFRON J L, TRAPHAGEN L M. Do structurally similar molecules have similar biological activity? [J]. Journal of Medicinal Chemistry, 2002, 45(19): 4350-4358. doi: 10.1021/jm020155c
[2] PANDEY S, QU J X, STEVANOVIĆ V, et al. Predicting energy and stability of known and hypothetical crystals using graph neural network [J]. Patterns, 2021, 2(11): 100361. doi: 10.1016/j.patter.2021.100361
[3] WALTERS W P, BARZILAY R. Applications of deep learning in molecule generation and molecular property prediction [J]. Accounts of Chemical Research, 2021, 54(2): 263-270. doi: 10.1021/acs.accounts.0c00699
[4] HANSCH C, MALONEY P P, FUJITA T, et al. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients [J]. Nature, 1962, 194(4824): 178-180.
[5] CHERKASOV A, MURATOV E N, FOURCHES D, et al. QSAR modeling: Where have you been? Where are you going to? [J]. Journal of Medicinal Chemistry, 2014, 57(12): 4977-5010. doi: 10.1021/jm4004285
[6] ZHONG S F, HU J J, YU X, et al. Molecular image-convolutional neural network (CNN) assisted QSAR models for predicting contaminant reactivity toward OH radicals: Transfer learning, data augmentation and model interpretation [J]. Chemical Engineering Journal, 2021, 408: 127998. doi: 10.1016/j.cej.2020.127998
[7] GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for quantum chemistry[C]//Proceedings of the 34th International Conference on Machine Learning - Volume 70. August 6-11, 2017, Sydney, NSW, Australia. New York: ACM, 2017: 1263-1272.
[8] YANG K, SWANSON K, JIN W G, et al. Analyzing learned molecular representations for property prediction [J]. Journal of Chemical Information and Modeling, 2019, 59(8): 3370-3388. doi: 10.1021/acs.jcim.9b00237
[9] WEINREICH J, BROWNING N J, von LILIENFELD O A. Machine learning of free energies in chemical compound space using ensemble representations: Reaching experimental uncertainty for solvation [J]. The Journal of Chemical Physics, 2021, 154(13): 134113. doi: 10.1063/5.0041548
[10] ZHANG D D, XIA S, ZHANG Y K. Accurate prediction of aqueous free solvation energies using 3D atomic feature-based graph neural network with transfer learning [J]. Journal of Chemical Information and Modeling, 2022, 62(8): 1840-1848. doi: 10.1021/acs.jcim.2c00260
[11] RAZA A, BARDHAN S, XU L H, et al. A machine learning approach for predicting defluorination of per- and polyfluoroalkyl substances (PFAS) for their efficient treatment and removal [J]. Environmental Science & Technology Letters, 2019, 6(10): 624-629.
[12] WALLACH I, DZAMBA M, HEIFETS A. AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery [J]. ArXiv, 2015, abs/1510.02855
[13] CHENG W X, NG C A. Using machine learning to classify bioactivity for 3486 per- and polyfluoroalkyl substances (PFASs) from the OECD list [J]. Environmental Science & Technology, 2019, 53(23): 13970-13980.
[14] BERTONI M, DURAN-FRIGOLA M, BADIA-I-MOMPEL P, et al. Bioactivity descriptors for uncharacterized chemical compounds [J]. Nature Communications, 2021, 12(1): 3932. doi: 10.1038/s41467-021-24150-4
[15] MUKHERJEE A, SU A, RAJAN K. Deep learning model for identifying critical structural motifs in potential endocrine disruptors [J]. Journal of Chemical Information and Modeling, 2021, 61(5): 2187-2197. doi: 10.1021/acs.jcim.0c01409
[16] SUN X F, ZHANG X M, MUIR D C G, et al. Identification of potential PBT/POP-like chemicals by a deep learning approach based on 2D structural features [J]. Environmental Science & Technology, 2020, 54(13): 8221-8231.
[17] WANG H B, WANG Z Y, CHEN J W, et al. Graph attention network model with defined applicability domains for screening PBT chemicals [J]. Environmental Science & Technology, 2022, 56(10): 6774-6785.
[18] KIM S, CHEN J, CHENG T J, et al. PubChem in 2021: New data content and improved web interfaces [J]. Nucleic Acids Research, 2021, 49(D1): D1388-D1395. doi: 10.1093/nar/gkaa971
[19] PENCE H E, WILLIAMS A. ChemSpider: An online chemical information resource [J]. Journal of Chemical Education, 2010, 87(11): 1123-1124. doi: 10.1021/ed100697w
[20] BLUM L C, REYMOND J L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13 [J]. Journal of the American Chemical Society, 2009, 131(25): 8732-8733. doi: 10.1021/ja902302h
[21] RUDDIGKEIT L, van DEURSEN R, BLUM L C, et al. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17 [J]. Journal of Chemical Information and Modeling, 2012, 52(11): 2864-2875. doi: 10.1021/ci300415d
[22] MOBLEY D L, GUTHRIE J P. FreeSolv: A database of experimental and calculated hydration free energies, with input files [J]. Journal of Computer-Aided Molecular Design, 2014, 28(7): 711-720. doi: 10.1007/s10822-014-9747-x
[23] IRWIN J J, STERLING T, MYSINGER M M, et al. ZINC: A free tool to discover chemistry for biology [J]. Journal of Chemical Information and Modeling, 2012, 52(7): 1757-1768. doi: 10.1021/ci3001277
[24] IRWIN J J, TANG K G, YOUNG J, et al. ZINC20-a free ultralarge-scale chemical database for ligand discovery [J]. Journal of Chemical Information and Modeling, 2020, 60(12): 6065-6073. doi: 10.1021/acs.jcim.0c00675
[25] IRWIN J J, SHOICHET B K. ZINC: A free database of commercially available compounds for virtual screening [J]. Journal of Chemical Information and Modeling, 2005, 45(1): 177-182. doi: 10.1021/ci049714+
[26] BENTO A P, GAULTON A, HERSEY A, et al. The ChEMBL bioactivity database: An update [J]. Nucleic Acids Research, 2014, 42(D1): D1083-D1090. doi: 10.1093/nar/gkt1031
[27] WISHART D S, FEUNANG Y D, GUO A C, et al. DrugBank 5.0: A major update to the DrugBank database for 2018 [J]. Nucleic Acids Research, 2018, 46(D1): D1074-D1082. doi: 10.1093/nar/gkx1037
[28] DIX D J, HOUCK K A, MARTIN M T, et al. The ToxCast program for prioritizing toxicity testing of environmental chemicals [J]. Toxicological Sciences, 2007, 95(1): 5-12. doi: 10.1093/toxsci/kfl103
[29] LIU Z H, LI Y, HAN L, et al. PDB-wide collection of binding data: Current status of the PDBbind database [J]. Bioinformatics, 2015, 31(3): 405-412. doi: 10.1093/bioinformatics/btu626
[30] GILSON M K, LIU T Q, BAITALUK M, et al. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology [J]. Nucleic Acids Research, 2016, 44(D1): D1045-D1053. doi: 10.1093/nar/gkv1072
[31] SHEN W X, ZENG X, ZHU F, et al. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations [J]. Nature Machine Intelligence, 2021, 3(4): 334-343. doi: 10.1038/s42256-021-00301-6
[32] WANG L G, ZHAO L, LIU X, et al. SepPCNET: Deeping learning on a 3D surface electrostatic potential point cloud for enhanced toxicity classification and its application to suspected environmental estrogens [J]. Environmental Science & Technology, 2021, 55(14): 9958-9967.
[33] TODESCHINI R, CONSONNI V. Molecular Descriptors for Chemoinformatics[M]. Wiley-VCH, 2009.
[34] GRISONI F, CONSONNI V, TODESCHINI R. Impact of molecular descriptors on computational models [J]. Methods in Molecular Biology (Clifton, N.J.), 2018, 1825: 171-209.
[35] WU P, KONG D X. Molecular similarity and localization of MOLPRINT 2D [J]. Computers and Applied Chemistry, 2008, 25(4): 505-508 (in Chinese). doi: 10.3969/j.issn.1001-4160.2008.04.027
[36] KHAN A U. Descriptors and their selection methods in QSAR analysis: Paradigm for drug design [J]. Drug Discovery Today, 2016, 21(8): 1291-1302. doi: 10.1016/j.drudis.2016.06.013
[37] CERETO-MASSAGUÉ A, OJEDA M J, VALLS C, et al. Molecular fingerprint similarity search in virtual screening [J]. Methods, 2015, 71: 58-63. doi: 10.1016/j.ymeth.2014.08.005
[38] DURANT J L, LELAND B A, HENRY D R, et al. Reoptimization of MDL keys for use in drug discovery [J]. Journal of Chemical Information and Computer Sciences, 2002, 42(6): 1273-1280. doi: 10.1021/ci010132r
[39] ROGERS D, HAHN M. Extended-connectivity fingerprints [J]. Journal of Chemical Information and Modeling, 2010, 50(5): 742-754. doi: 10.1021/ci100050t
[40] BENDER A, MUSSA H Y, GLEN R C, et al. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance [J]. Journal of Chemical Information and Computer Sciences, 2004, 44(5): 1708-1718. doi: 10.1021/ci0498719
[41] MAURI A. alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints[M]// Ecotoxicological QSARs. New York: Humana, 2020: 801-820.
[42] O'BOYLE N M, BANCK M, JAMES C A, et al. Open Babel: An open chemical toolbox [J]. Journal of Cheminformatics, 2011, 3: 33. doi: 10.1186/1758-2946-3-33
[43] STEINBECK C, HAN Y Q, KUHN S, et al. The Chemistry Development Kit (CDK): An open-source Java library for Chemo- and Bioinformatics [J]. Journal of Chemical Information and Computer Sciences, 2003, 43(2): 493-500. doi: 10.1021/ci025584y
[44] XIONG Z P, WANG D Y, LIU X H, et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism [J]. Journal of Medicinal Chemistry, 2020, 63(16): 8749-8760. doi: 10.1021/acs.jmedchem.9b00959
[45] WEININGER D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules [J]. Journal of Chemical Information and Computer Sciences, 1988, 28(1): 31-36.
[46] SANCHES-NETO F O, DIAS-SILVA J R, KENG QUEIROZ L H Jr, et al. “pySiRC”: Machine learning combined with molecular fingerprints to predict the reaction rate constant of the radical-based oxidation processes of aqueous organic contaminants [J]. Environmental Science & Technology, 2021, 55(18): 12437-12448.
[47] ZHONG S F, ZHANG K, WANG D, et al. Shedding light on “Black Box” machine learning models for predicting the reactivity of HO radicals toward organic compounds [J]. Chemical Engineering Journal, 2021, 405: 126627. doi: 10.1016/j.cej.2020.126627
[48] ZHONG S F, HU J J, FAN X D, et al. A deep neural network combined with molecular fingerprints (DNN-MF) to develop predictive models for hydroxyl radical rate constants of water contaminants [J]. Journal of Hazardous Materials, 2020, 383: 121141. doi: 10.1016/j.jhazmat.2019.121141
[49] HELLER S R, McNAUGHT A, PLETNEV I, et al. InChI, the IUPAC international chemical identifier [J]. Journal of Cheminformatics, 2015, 7: 23. doi: 10.1186/s13321-015-0068-4
[50] GOH G B, SIEGEL C, VISHNU A, et al. Chemception: A deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models [J]. ArXiv, 2017, abs/1706.06689
[51] GOH G B, SIEGEL C, VISHNU A, et al. ChemNet: A transferable and generalizable deep neural network for small-molecule property prediction [J]. ArXiv, 2017, abs/1712.02734
[52] KORKMAZ S. Deep learning-based imbalanced data classification for drug discovery [J]. Journal of Chemical Information and Modeling, 2020, 60(9): 4180-4190. doi: 10.1021/acs.jcim.9b01162
[53] WU Z Q, RAMSUNDAR B, FEINBERG E N, et al. MoleculeNet: A benchmark for molecular machine learning [J]. Chemical Science, 2017, 9(2): 513-530.
[54] XU L L, CHI D X. Machine learning classification strategy for imbalanced data sets [J]. Computer Engineering and Applications, 2020, 56(24): 12-27 (in Chinese). doi: 10.3778/j.issn.1002-8331.2007-0120
[55] NOBLE W S. What is a support vector machine? [J]. Nature Biotechnology, 2006, 24(12): 1565-1567. doi: 10.1038/nbt1206-1565
[56] QUINLAN J R. Induction of decision trees [J]. Machine Learning, 1986, 1(1): 81-106.
[57] BREIMAN L. Random forests [J]. Machine Learning, 2001, 45(1): 5-32. doi: 10.1023/A:1010933404324
[58] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 1998.
[59] MITCHELL J B O. Machine learning methods in chemoinformatics [J]. Wiley Interdisciplinary Reviews. Computational Molecular Science, 2014, 4(5): 468-481. doi: 10.1002/wcms.1183
[60] GRAMATICA P. Principles of QSAR models validation: Internal and external [J]. QSAR & Combinatorial Science, 2007, 26(5): 694-701.
[61] KAR S, ROY K, LESZCZYNSKI J. Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling[M]// Computational Toxicology. New York: Humana Press, 2018: 141-169.
[62] WANG Z Y, CHEN J W, FU Z Q, et al. Characterization of applicability domains for QSAR models [J]. Chinese Science Bulletin, 2022, 67(3): 255-266 (in Chinese). doi: 10.1360/TB-2021-0406
[63] WANG Z Y, CHEN J W, HONG H X. Developing QSAR models with defined applicability domains on PPARγ binding affinity using large data sets and machine learning algorithms [J]. Environmental Science & Technology, 2021, 55(10): 6857-6866.
[64] BERENGER F, YAMANISHI Y. A distance-based Boolean applicability domain for classification of high throughput screening data [J]. Journal of Chemical Information and Modeling, 2019, 59(1): 463-476. doi: 10.1021/acs.jcim.8b00499
[65] ZHENG Y T, QIAO X L, YU Y, et al. Quantitative structure-activity relationship model for bioconcentration factors of organic chemicals [J]. Asian Journal of Ecotoxicology, 2019, 14(2): 214-221 (in Chinese). doi: 10.7524/AJE.1673-5897.20180718002
[66] YANG Z Z, KUANG N, FAN L, et al. Review of image classification algorithms based on convolutional neural networks [J]. Journal of Signal Processing, 2018, 34(12): 1474-1489 (in Chinese). doi: 10.16798/j.issn.1003-0530.2018.12.009
[67] RIBEIRO M T, SINGH S, GUESTRIN C. “Why should I trust you?”: Explaining the predictions of any classifier[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 13-17, 2016, San Francisco, California, USA. New York: ACM, 2016: 1135-1144.
[68] BARREDO ARRIETA A, DÍAZ-RODRÍGUEZ N, del SER J, et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI [J]. Information Fusion, 2020, 58: 82-115. doi: 10.1016/j.inffus.2019.12.012
[69] PETCH J, DI S, NELSON W. Opening the black box: The promise and limitations of explainable machine learning in cardiology [J]. Canadian Journal of Cardiology, 2022, 38(2): 204-213. doi: 10.1016/j.cjca.2021.09.004
[70] ANDREWS R, DIEDERICH J, TICKLE A B. Survey and critique of techniques for extracting rules from trained artificial neural networks [J]. Knowledge-Based Systems, 1995, 8(6): 373-389. doi: 10.1016/0950-7051(96)81920-4
[71] LIU X, WANG X G, MATWIN S. Improving the interpretability of deep neural networks with knowledge distillation[C]//2018 IEEE International Conference on Data Mining Workshops (ICDMW). November 17-20, 2018, Singapore. IEEE, 2019: 905-912.
[72] MASHAYEKHI M, GRAS R. Rule extraction from decision trees ensembles: New algorithms based on heuristic search and sparse group lasso methods [J]. International Journal of Information Technology & Decision Making, 2017, 16(6): 1707-1727.
[73] GOLDSTEIN A, KAPELNER A, BLEICH J, et al. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation [J]. Journal of Computational and Graphical Statistics, 2015, 24(1): 44-65. doi: 10.1080/10618600.2014.907095
[74] RIBEIRO M T, SINGH S, GUESTRIN C. Anchors: High-precision model-agnostic explanations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018: 1527-1535.
[75] BACH S, BINDER A, MONTAVON G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation [J]. PLoS One, 2015, 10(7): e0130140. doi: 10.1371/journal.pone.0130140
[76] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]//2017 IEEE International Conference on Computer Vision (ICCV). 2017: 618-626.
[77] LUNDBERG S M, LEE S I. A unified approach to interpreting model predictions[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. December 4-9, 2017, Long Beach, California, USA. New York: ACM, 2017: 4768-4777.
[78] HASEBE T. Knowledge-embedded message-passing neural networks: Improving molecular property prediction with human knowledge [J]. ACS Omega, 2021, 6(42): 27955-27967. doi: 10.1021/acsomega.1c03839
[79] MATOS G D R, KYU D Y, LOEFFLER H H, et al. Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database [J]. Journal of Chemical and Engineering Data, 2017, 62(5): 1559-1569. doi: 10.1021/acs.jced.7b00104
[80] VERMEIRE F H, GREEN W H. Transfer learning for solvation free energies: From quantum chemistry to experiments [J]. Chemical Engineering Journal, 2021, 418: 129307. doi: 10.1016/j.cej.2021.129307
[81] SU A, RAJAN K. A database framework for rapid screening of structure-function relationships in PFAS chemistry [J]. Scientific Data, 2021, 8(1): 1-10. doi: 10.1038/s41597-020-00786-7
[82] MA J S, SHERIDAN R P, LIAW A, et al. Deep neural nets as a method for quantitative structure-activity relationships [J]. Journal of Chemical Information and Modeling, 2015, 55(2): 263-274. doi: 10.1021/ci500747n
[83] MAYR A, KLAMBAUER G, UNTERTHINER T, et al. DeepTox: Toxicity prediction using deep learning [J]. Frontiers in Environmental Science, 2016, 3: 80.
[84] PU L M, NADERI M, LIU T R, et al. eToxPred: A machine learning-based approach to estimate the toxicity of drug candidates [J]. BMC Pharmacology and Toxicology, 2019, 20(1): 2. doi: 10.1186/s40360-018-0282-6
[85] ALVES V, MURATOV E, CAPUZZI S, et al. Alarms about structural alerts [J]. Green Chemistry, 2016, 18(16): 4348-4360. doi: 10.1039/C6GC01492E
[86] WU X D, KUMAR V, QUINLAN J R, et al. Top 10 algorithms in data mining [J]. Knowledge and Information Systems, 2008, 14(1): 1-37. doi: 10.1007/s10115-007-0114-2
[87] JIMÉNEZ-LUNA J, GRISONI F, SCHNEIDER G. Drug discovery with explainable artificial intelligence [J]. Nature Machine Intelligence, 2020, 2(10): 573-584. doi: 10.1038/s42256-020-00236-4
[88] CHEN D, GAO K F, NGUYEN D D, et al. Algebraic graph-assisted bidirectional transformers for molecular property prediction [J]. Nature Communications, 2021, 12(1): 1-9. doi: 10.1038/s41467-020-20314-w
[89] RODRÍGUEZ-PÉREZ R, BAJORATH J. Explainable machine learning for property predictions in compound optimization [J]. Journal of Medicinal Chemistry, 2021, 64(24): 17744-17752. doi: 10.1021/acs.jmedchem.1c01789