Optimal descriptors calculated from the simplified molecular input line entry system (SMILES), the hydrogen-suppressed molecular graph (HSG), the hydrogen-filled molecular graph (HFG), and the graph of atomic orbitals (GAO) have been studied as a basis for building models of the mutagenicity of polyaromatic amines. The optimal descriptors are calculated from the correlation weights of molecular fragments. For the molecular graphs, the descriptor is based on chemical elements (C, N, O, etc.) or their electronic structure (1s2, 2p3, 3d10, etc.), together with their Morgan vertex degrees. For SMILES, the descriptor is based on chemical elements (C, O, N, etc.) together with ring-closure digits indicating cycles (1, 2, 3, etc.), cis/trans isomerism symbols ('\' and '/'), and other SMILES attributes. In both cases, the descriptor is a mathematical function of the correlation weights of these molecular features. The correlation weights are calculated by Monte Carlo optimization, with the correlation coefficient between experimental and predicted endpoint values as the target function. SMILES-based optimal descriptors showed better predictive ability than the graph-based ones. The CORAL software (http://www.insilico.eu/coral/) was used to build models of mutagenic potential as a function of molecular structure. Analysis of three probes of the Monte Carlo optimization on six random splits showed that the molecular features encoded by SMILES attributes fall into three kinds: promoters of an increase in mutagenic potential, promoters of a decrease, and features without a defined role.
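The SMILES-based descriptor and its Monte Carlo optimization can be illustrated with a minimal Python sketch. This is not the CORAL implementation. The following are all hypothetical choices made for illustration:

- the simplified tokenizer;
- the additive form DCW = sum of CW(Sk) over the SMILES attributes Sk;
- the one-weight-at-a-time acceptance rule;
- the rarity `threshold`;
- the linear model endpoint = C0 + C1 * DCW;
- the function names `smiles_attributes`, `dcw`, `optimize_cw` and `fit_line`.

```python
import random
import re
from collections import Counter

# Hypothetical, simplified SMILES tokenizer: bracket atoms, two-letter halogens,
# two-digit ring closures (%nn); everything else is one attribute per character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def smiles_attributes(smiles):
    """Split SMILES into attributes Sk: atoms, ring-closure digits, bonds, '/', '\\', ..."""
    return TOKEN_RE.findall(smiles)


def dcw(smiles, cw):
    """Descriptor = sum of correlation weights CW(Sk); unweighted attributes add 0."""
    return sum(cw.get(s, 0.0) for s in smiles_attributes(smiles))


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def optimize_cw(train, n_epochs=50, threshold=1, step=0.5, seed=0):
    """Monte Carlo search: perturb one weight at a time and keep the change only
    if the training-set correlation coefficient (the target function) increases."""
    rng = random.Random(seed)
    counts = Counter(s for smi, _ in train for s in smiles_attributes(smi))
    # Assumption: attributes seen fewer than `threshold` times get no weight.
    active = sorted(a for a, c in counts.items() if c >= threshold)
    cw = {a: rng.uniform(-1.0, 1.0) for a in active}
    ys = [y for _, y in train]

    def target():
        return pearson_r([dcw(smi, cw) for smi, _ in train], ys)

    best = target()
    for _ in range(n_epochs):
        for a in active:
            old = cw[a]
            cw[a] = old + rng.uniform(-step, step)
            r = target()
            if r > best:
                best = r       # accept the move
            else:
                cw[a] = old    # reject the move
    return cw, best


def fit_line(xs, ys):
    """Least-squares fit of endpoint = c0 + c1 * DCW; returns (c0, c1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor is constant; cannot fit a line")
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - c1 * mx, c1


# Toy training set: aromatic-amine SMILES with INVENTED endpoint values
# (for exercising the code only; not experimental mutagenicity data).
TOY_TRAIN = [
    ("Nc1ccccc1", -1.0),
    ("Nc1ccc(Cl)cc1", -0.5),
    ("Nc1ccc2ccccc2c1", 0.8),
    ("Nc1ccc(cc1)-c1ccccc1", 1.2),
    ("Nc1ccc2c(c1)Cc1ccccc1-2", 2.0),
]

if __name__ == "__main__":
    cw, r = optimize_cw(TOY_TRAIN, n_epochs=20, seed=1)
    xs = [dcw(smi, cw) for smi, _ in TOY_TRAIN]
    c0, c1 = fit_line(xs, [y for _, y in TOY_TRAIN])
    print(f"training r = {r:.3f}; endpoint = {c0:.3f} + {c1:.3f} * DCW")
```

The sketch accepts a perturbation only when it raises the target. As a result, the training-set correlation never decreases across epochs. This says nothing about predictive ability, which must still be judged on compounds held out of the optimization, as with the random splits described above. In this toy setup, a weight that settles clearly positive or negative loosely mirrors the "promoter of increase/decrease" interpretation, but that reading is an assumption of the sketch.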
Keywords
- Monte Carlo method
- Optimal descriptor
ASJC Scopus subject areas
- Analytical Chemistry
- Computer Science Applications
- Process Chemistry and Technology