TY - JOUR

T1 - Comparison of SMILES and molecular graphs as the representation of the molecular structure for QSAR analysis for mutagenic potential of polyaromatic amines

AU - Toropov, A. A.

AU - Toropova, A. P.

AU - Martyanov, S. E.

AU - Benfenati, E.

AU - Gini, G.

AU - Leszczynska, D.

AU - Leszczynski, J.

PY - 2011/11/15

Y1 - 2011/11/15

N2 - Optimal descriptors calculated from the simplified molecular input line entry system (SMILES), the hydrogen-suppressed molecular graph (HSG), the hydrogen-filled molecular graph (HFG), and the graph of atomic orbitals (GAO) have been studied as a basis for building models of the mutagenicity of polyaromatic amines. The optimal descriptors are calculated with correlation weights of molecular fragments. In the case of the molecular graph, the chemical elements (C, N, O, etc.) or their electronic structure (1s2, 2p3, 3d10, etc.), together with their Morgan vertex degrees, are the basis for calculating the descriptor. In the case of SMILES, the chemical elements (C, O, N, etc.), together with the presence of cycles (1, 2, 3, etc.), cis-/trans-isomerism ('\' and '/'), and other attributes, are the basis for calculating the descriptor. In both cases, the descriptors are a mathematical function of the correlation weights of the above-mentioned molecular features. The correlation weights are calculated by Monte Carlo optimization (the target function is the correlation coefficient between experimental and predicted endpoint values). SMILES-based optimal descriptors showed the better predictive ability. The CORAL software (http://www.insilico.eu/coral/) was used to build models of mutagenic potential as a function of molecular structure. Analysis of three probes of the Monte Carlo optimization with six random splits has shown that there are three kinds of molecular features encoded by SMILES attributes: promoters of an increase in mutagenic potential, promoters of a decrease, and features without a defined role.

KW - Monte Carlo method

KW - Mutagenicity

KW - Optimal descriptor

KW - QSAR

UR - http://www.scopus.com/inward/record.url?scp=84860404903&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860404903&partnerID=8YFLogxK

U2 - 10.1016/j.chemolab.2011.07.008

DO - 10.1016/j.chemolab.2011.07.008

M3 - Article

AN - SCOPUS:84860404903

VL - 109

SP - 94

EP - 100

JO - Chemometrics and Intelligent Laboratory Systems

JF - Chemometrics and Intelligent Laboratory Systems

SN - 0169-7439

IS - 1

ER -