Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR)

Andrey A. Toropov, Alla P. Toropova, Dilya V. Mukhamedzhanova, Ivan Gutman

Research output: Contribution to journal › Article

39 Citations (Scopus)

Abstract

Flexible descriptors calculated with correlation weights of fragments in the SMILES notation of molecular systems have been used as a tool for modeling normal boiling points of acyclic carbonyl compounds. Four variants of the Optimization of Correlation Weights of SMILES Fragments (OCWSF) have been examined. The difference between them is in the number of symbols in the SMILES fragments. Thus, fragments involving one-, two-, three-, and four-symbols have been examined. Correlation weights for three calculable features of SMILES are used in the OCWSF scheme: number of oxygen atoms (NO), number of double bonds (NDB), and (NO - NDB + 10). In order to take into account the hydrogen bond interactions, correlation weights of these three features have been included in the OCWSF scheme. The best OCWSF model is based on three-symbol fragments together with the mentioned three features of the SMILES notation. Its statistical characteristics are: n=100, r²=0.9795, s=5.35°C, F=4673 (training set); n=100, r²=0.9764, s=5.38°C, F=4055 (test set).
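The fragment scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a "symbol" is a single SMILES character, that fragments are taken with a sliding window of k characters, and that the descriptor is a plain sum of correlation weights; the abstract does not confirm any of these details. The function names, the weight dictionaries, and the default weight of 1.0 are hypothetical placeholders, not values from the paper.

```python
def smiles_fragments(smiles: str, k: int) -> list[str]:
    """Return every run of k consecutive characters in a SMILES string.

    Assumption: one character = one symbol (so "Cl" would count as two).
    """
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def global_features(smiles: str) -> dict[str, int]:
    """Compute the three features named in the abstract from the raw string.

    Counting "O" and "=" characters is a simplification that is adequate
    for simple acyclic carbonyl compounds written without aromatic or
    bracketed atoms.
    """
    n_o = smiles.count("O")    # number of oxygen atoms (NO)
    n_db = smiles.count("=")   # number of double bonds (NDB)
    return {"NO": n_o, "NDB": n_db, "NO-NDB+10": n_o - n_db + 10}


def descriptor(smiles, k, fragment_weights, feature_weights, default=1.0):
    """Sum hypothetical correlation weights over fragments and features.

    fragment_weights maps a fragment string to a weight;
    feature_weights maps a (feature name, feature value) pair to a weight.
    Unknown keys fall back to `default` (an assumption of this sketch).
    """
    total = sum(fragment_weights.get(f, default)
                for f in smiles_fragments(smiles, k))
    for name, value in global_features(smiles).items():
        total += feature_weights.get((name, value), default)
    return total


if __name__ == "__main__":
    acetone = "CC(=O)C"
    print(smiles_fragments(acetone, 3))
    print(global_features(acetone))
    print(descriptor(acetone, 3, {"(=O": 2.5}, {}))
```

In an optimization scheme of this kind, the weights would be tuned so that the descriptor correlates as strongly as possible with the measured boiling points of the training set; the tuning step itself is omitted here.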

Original language: English
Pages (from-to): 1545-1552
Number of pages: 8
Journal: Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry
Volume: 44
Issue number: 8
Publication status: Published - Aug 2005

Fingerprint

entry
fragments
Oxygen
Carbonyl compounds
Atoms
optimization
Boiling point
oxygen atoms
Hydrogen bonds
coding
boiling
education

ASJC Scopus subject areas

  • Chemistry (all)

Cite this

Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR). / Toropov, Andrey A.; Toropova, Alla P.; Mukhamedzhanova, Dilya V.; Gutman, Ivan.

In: Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry, Vol. 44, No. 8, 08.2005, p. 1545-1552.

@article{a984c98b45f1403eb0b662aacad6dbf9,
title = "Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR)",
abstract = "Flexible descriptors calculated with correlation weights of fragments in the SMILES notation of molecular systems have been used as a tool for modeling normal boiling points of acyclic carbonyl compounds. Four variants of the Optimization of Correlation Weights of SMILES Fragments (OCWSF) have been examined. The difference between them is in the number of symbols in the SMILES fragments. Thus, fragments involving one-, two-, three-, and four-symbols have been examined. Correlation weights for three calculable features of SMILES are used in the OCWSF scheme: number of oxygen atoms (NO), number of double bonds (NDB), and (NO - NDB + 10). In order to take into account the hydrogen bond interactions, correlation weights of these three features have been included in the OCWSF scheme. The best OCWSF model is based on three-symbol fragments together with the mentioned three features of the SMILES notation. Its statistical characteristics are: n=100, r²=0.9795, s=5.35°C, F=4673 (training set); n=100, r²=0.9764, s=5.38°C, F=4055 (test set).",
author = "Toropov, {Andrey A.} and Toropova, {Alla P.} and Mukhamedzhanova, {Dilya V.} and Gutman, Ivan",
year = "2005",
month = "8",
language = "English",
volume = "44",
pages = "1545--1552",
journal = "Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry",
issn = "0376-4710",
publisher = "Scientific Publishers of India",
number = "8",
}

TY - JOUR

T1 - Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR)

AU - Toropov, Andrey A.

AU - Toropova, Alla P.

AU - Mukhamedzhanova, Dilya V.

AU - Gutman, Ivan

PY - 2005/8

Y1 - 2005/8

N2 - Flexible descriptors calculated with correlation weights of fragments in the SMILES notation of molecular systems have been used as a tool for modeling normal boiling points of acyclic carbonyl compounds. Four variants of the Optimization of Correlation Weights of SMILES Fragments (OCWSF) have been examined. The difference between them is in the number of symbols in the SMILES fragments. Thus, fragments involving one-, two-, three-, and four-symbols have been examined. Correlation weights for three calculable features of SMILES are used in the OCWSF scheme: number of oxygen atoms (NO), number of double bonds (NDB), and (NO - NDB + 10). In order to take into account the hydrogen bond interactions, correlation weights of these three features have been included in the OCWSF scheme. The best OCWSF model is based on three-symbol fragments together with the mentioned three features of the SMILES notation. Its statistical characteristics are: n=100, r²=0.9795, s=5.35°C, F=4673 (training set); n=100, r²=0.9764, s=5.38°C, F=4055 (test set).

AB - Flexible descriptors calculated with correlation weights of fragments in the SMILES notation of molecular systems have been used as a tool for modeling normal boiling points of acyclic carbonyl compounds. Four variants of the Optimization of Correlation Weights of SMILES Fragments (OCWSF) have been examined. The difference between them is in the number of symbols in the SMILES fragments. Thus, fragments involving one-, two-, three-, and four-symbols have been examined. Correlation weights for three calculable features of SMILES are used in the OCWSF scheme: number of oxygen atoms (NO), number of double bonds (NDB), and (NO - NDB + 10). In order to take into account the hydrogen bond interactions, correlation weights of these three features have been included in the OCWSF scheme. The best OCWSF model is based on three-symbol fragments together with the mentioned three features of the SMILES notation. Its statistical characteristics are: n=100, r²=0.9795, s=5.35°C, F=4673 (training set); n=100, r²=0.9764, s=5.38°C, F=4055 (test set).

UR - http://www.scopus.com/inward/record.url?scp=28244474463&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=28244474463&partnerID=8YFLogxK

M3 - Article

VL - 44

SP - 1545

EP - 1552

JO - Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry

JF - Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry

SN - 0376-4710

IS - 8

ER -