Co-evolutions of correlations for QSAR of toxicity of organometallic and inorganic substances: An unexpected good prediction based on a model that seems untrustworthy

A. P. Toropova, A. A. Toropov, E. Benfenati, G. Gini

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

The simplified molecular input line entry system (SMILES) represents molecular structure as a sequence of special characters indicating chemical elements, double/triple covalent bonds, and other features. We used this representation to establish quantitative structure-activity relationships (QSAR) for toxicity (pLD50, the negative decimal logarithm of the 50% lethal dose) of organometallic and inorganic substances. The balance of correlations was used in the Monte Carlo optimization aimed at building optimal descriptors. It should be noted that few QSAR models in the literature deal with organometallic and inorganic substances. We used CORAL (CORrelations And Logic) freeware, available on the Internet, for the modelling. Ten random splits into sub-training, calibration, and test sets were examined. The statistical characteristics of the model for split 1 are as follows: n=57, r2=0.6005, Q2=0.5721, s=0.448, F=83 (sub-training set); n=55, r2=0.6005, R2 pred=0.5701, s=0.501 (calibration set); n=12, r2=0.8296, R2 pred=0.7695, s=0.233, and Rm2=0.8142 (test set). The statistical quality of the models for the other examined splits is also reasonably good.
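
The sub-training figures quoted in the abstract can be cross-checked with a short calculation. The following is a minimal Python sketch, not the CORAL implementation: it assumes the model is an ordinary least-squares regression on a single descriptor with an intercept (the abstract does not state the number of descriptors), and the function names `pld50` and `f_statistic` are hypothetical helpers introduced here for illustration. The abstract also does not give the dose units behind pLD50, so the conversion helper leaves them to the caller.

```python
import math


def pld50(ld50: float) -> float:
    """Negative decimal logarithm of a 50% lethal dose.

    The dose units are an assumption left to the caller; the abstract
    does not specify them.
    """
    if ld50 <= 0:
        raise ValueError("LD50 must be positive")
    return -math.log10(ld50)


def f_statistic(r2: float, n: int, k: int = 1) -> float:
    """F ratio of a least-squares linear model with an intercept.

    r2: coefficient of determination on the fitting set
    n:  number of compounds in that set
    k:  number of descriptors (assumed to be 1 here)
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Sub-training set values reported in the abstract: n=57, r2=0.6005.
f_sub = f_statistic(0.6005, 57)
print(round(f_sub, 2))  # 82.67, which rounds to the reported F=83
```

Under the one-descriptor assumption, the reported r2 and n for the sub-training set reproduce the reported F value, which suggests those three figures are mutually consistent.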

Original language: English
Pages (from-to): 215-219
Number of pages: 5
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 105
Issue number: 2
DOI: 10.1016/j.chemolab.2010.12.007
Publication status: Published - Feb 15 2011

Fingerprint

Organometallics
Toxicity
Calibration
Covalent bonds
Chemical elements
Molecular structure
Internet

Keywords

  • Balance of correlation
  • Co-evolution of correlation
  • Optimal descriptor
  • QSAR
  • SMILES
  • Toxicity towards rat

ASJC Scopus subject areas

  • Analytical Chemistry
  • Computer Science Applications
  • Software
  • Process Chemistry and Technology
  • Spectroscopy

Cite this

@article{35337448bd444400816d07906aeb5e60,
title = "Co-evolutions of correlations for QSAR of toxicity of organometallic and inorganic substances: An unexpected good prediction based on a model that seems untrustworthy",
abstract = "The simplified molecular input line entry system (SMILES) represents molecular structure as a sequence of special characters indicating chemical elements, double/triple covalent bonds, and other features. We used this representation to establish quantitative structure-activity relationships (QSAR) for toxicity (pLD50, the negative decimal logarithm of the 50{\%} lethal dose) of organometallic and inorganic substances. The balance of correlations was used in the Monte Carlo optimization aimed at building optimal descriptors. It should be noted that few QSAR models in the literature deal with organometallic and inorganic substances. We used CORAL (CORrelations And Logic) freeware, available on the Internet, for the modelling. Ten random splits into sub-training, calibration, and test sets were examined. The statistical characteristics of the model for split 1 are as follows: n=57, r2=0.6005, Q2=0.5721, s=0.448, F=83 (sub-training set); n=55, r2=0.6005, R2 pred=0.5701, s=0.501 (calibration set); n=12, r2=0.8296, R2 pred=0.7695, s=0.233, and Rm2=0.8142 (test set). The statistical quality of the models for the other examined splits is also reasonably good.",
keywords = "Balance of correlation, Co-evolution of correlation, Optimal descriptor, QSAR, SMILES, Toxicity towards rat",
author = "Toropova, {A. P.} and Toropov, {A. A.} and Benfenati, E. and Gini, G.",
year = "2011",
month = "2",
day = "15",
doi = "10.1016/j.chemolab.2010.12.007",
language = "English",
volume = "105",
pages = "215--219",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",
number = "2",

}

TY - JOUR

T1 - Co-evolutions of correlations for QSAR of toxicity of organometallic and inorganic substances

T2 - An unexpected good prediction based on a model that seems untrustworthy

AU - Toropova, A. P.

AU - Toropov, A. A.

AU - Benfenati, E.

AU - Gini, G.

PY - 2011/2/15

Y1 - 2011/2/15

N2 - The simplified molecular input line entry system (SMILES) represents molecular structure as a sequence of special characters indicating chemical elements, double/triple covalent bonds, and other features. We used this representation to establish quantitative structure-activity relationships (QSAR) for toxicity (pLD50, the negative decimal logarithm of the 50% lethal dose) of organometallic and inorganic substances. The balance of correlations was used in the Monte Carlo optimization aimed at building optimal descriptors. It should be noted that few QSAR models in the literature deal with organometallic and inorganic substances. We used CORAL (CORrelations And Logic) freeware, available on the Internet, for the modelling. Ten random splits into sub-training, calibration, and test sets were examined. The statistical characteristics of the model for split 1 are as follows: n=57, r2=0.6005, Q2=0.5721, s=0.448, F=83 (sub-training set); n=55, r2=0.6005, R2 pred=0.5701, s=0.501 (calibration set); n=12, r2=0.8296, R2 pred=0.7695, s=0.233, and Rm2=0.8142 (test set). The statistical quality of the models for the other examined splits is also reasonably good.

AB - The simplified molecular input line entry system (SMILES) represents molecular structure as a sequence of special characters indicating chemical elements, double/triple covalent bonds, and other features. We used this representation to establish quantitative structure-activity relationships (QSAR) for toxicity (pLD50, the negative decimal logarithm of the 50% lethal dose) of organometallic and inorganic substances. The balance of correlations was used in the Monte Carlo optimization aimed at building optimal descriptors. It should be noted that few QSAR models in the literature deal with organometallic and inorganic substances. We used CORAL (CORrelations And Logic) freeware, available on the Internet, for the modelling. Ten random splits into sub-training, calibration, and test sets were examined. The statistical characteristics of the model for split 1 are as follows: n=57, r2=0.6005, Q2=0.5721, s=0.448, F=83 (sub-training set); n=55, r2=0.6005, R2 pred=0.5701, s=0.501 (calibration set); n=12, r2=0.8296, R2 pred=0.7695, s=0.233, and Rm2=0.8142 (test set). The statistical quality of the models for the other examined splits is also reasonably good.

KW - Balance of correlation

KW - Co-evolution of correlation

KW - Optimal descriptor

KW - QSAR

KW - SMILES

KW - Toxicity towards rat

UR - http://www.scopus.com/inward/record.url?scp=79951960171&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79951960171&partnerID=8YFLogxK

U2 - 10.1016/j.chemolab.2010.12.007

DO - 10.1016/j.chemolab.2010.12.007

M3 - Article

AN - SCOPUS:79951960171

VL - 105

SP - 215

EP - 219

JO - Chemometrics and Intelligent Laboratory Systems

JF - Chemometrics and Intelligent Laboratory Systems

SN - 0169-7439

IS - 2

ER -