TY - JOUR

T1 - Simplified molecular input line entry system (SMILES) as an alternative for constructing quantitative structure-property relationships (QSPR)

AU - Toropov, Andrey A.

AU - Toropova, Alla P.

AU - Mukhamedzhanova, Dilya V.

AU - Gutman, Ivan

PY - 2005/8

Y1 - 2005/8

N2 - Flexible descriptors calculated with correlation weights of fragments in the SMILES notation of molecular systems have been used as a tool for modeling normal boiling points of acyclic carbonyl compounds. Four variants of the Optimization of Correlation Weights of SMILES Fragments (OCWSF) have been examined. The variants differ in the number of symbols per SMILES fragment: fragments of one, two, three, and four symbols have been examined. Correlation weights for three calculable features of SMILES are used in the OCWSF scheme: number of oxygen atoms (NO), number of double bonds (NDB), and (NO - NDB + 10). In order to take into account the hydrogen bond interactions, correlation weights of these three features have been included in the OCWSF scheme. The best OCWSF model is based on three-symbol fragments together with the mentioned three features of the SMILES notation. Its statistical characteristics are: n=100, r²=0.9795, s=5.35°C, F=4673 (training set); n=100, r²=0.9764, s=5.38°C, F=4055 (test set).

AB - Flexible descriptors calculated with correlation weights of fragments in the SMILES notation of molecular systems have been used as a tool for modeling normal boiling points of acyclic carbonyl compounds. Four variants of the Optimization of Correlation Weights of SMILES Fragments (OCWSF) have been examined. The variants differ in the number of symbols per SMILES fragment: fragments of one, two, three, and four symbols have been examined. Correlation weights for three calculable features of SMILES are used in the OCWSF scheme: number of oxygen atoms (NO), number of double bonds (NDB), and (NO - NDB + 10). In order to take into account the hydrogen bond interactions, correlation weights of these three features have been included in the OCWSF scheme. The best OCWSF model is based on three-symbol fragments together with the mentioned three features of the SMILES notation. Its statistical characteristics are: n=100, r²=0.9795, s=5.35°C, F=4673 (training set); n=100, r²=0.9764, s=5.38°C, F=4055 (test set).

UR - http://www.scopus.com/inward/record.url?scp=28244474463&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=28244474463&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:28244474463

VL - 44

SP - 1545

EP - 1552

JO - Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry

JF - Indian Journal of Chemistry - Section A Inorganic, Physical, Theoretical and Analytical Chemistry

SN - 0376-4710

IS - 8

ER -