SMILES-based optimal descriptors: QSAR modeling of carcinogenicity by balance of correlations with ideal slopes

A. A. Toropov, A. P. Toropova, E. Benfenati

Research output: Contribution to journal › Article › peer-review


Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) were used to build quantitative structure-activity relationships (QSAR) for carcinogenicity (log TD50). Three modeling schemes were examined: (1) the traditional "classic" training-test scheme, in which models are built with a training set and validated with an external test set; (2) the correlation balance, in which models are built with a preliminary estimate of their predictive ability on a calibration set (this set serves as a preliminary test set); and (3) the extended correlation balance, which also takes into account the slopes of the regression lines in plots of experimental versus predicted carcinogenicity (ideally, these slopes should be similar). The extended correlation balance with ideal slopes gave the most robust prediction of carcinogenicity for the external test set. The models were built by the Monte Carlo method for three splits into subtraining, calibration, and test sets. The number of N-nitroso groups (i.e., R1-N(R2)-NO) in a molecular system was examined as an additional descriptor.
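
The idea behind the extended correlation balance can be illustrated with a minimal sketch. The abstract does not give the target function used in the Monte Carlo optimization, so everything below is an assumption made for illustration: the function name balance_score, the weights w_r and w_slope, and the exact form of the penalty terms are hypothetical. The sketch only shows the general principle of rewarding high correlation on both the subtraining and calibration sets while penalizing differences between their correlation coefficients and between their regression slopes.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy) for two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


def balance_score(subtraining, calibration, w_r=0.1, w_slope=0.1):
    """Hypothetical balance-of-correlations objective (NOT the paper's formula).

    Each argument is a pair (descriptor_values, experimental_values).
    The score rewards high correlation on both sets and penalizes
    (a) a gap between the two correlation coefficients and
    (b) a gap between the two regression slopes.
    w_r and w_slope are illustrative weights, not values from the paper.
    """
    x_s, y_s = subtraining
    x_c, y_c = calibration
    r_s, r_c = pearson_r(x_s, y_s), pearson_r(x_c, y_c)
    c_s, c_c = slope(x_s, y_s), slope(x_c, y_c)
    return r_s + r_c - w_r * abs(r_s - r_c) - w_slope * abs(c_s - c_c)


# Toy numbers for illustration only; they are not data from the study.
sub = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
cal_same_slope = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
cal_other_slope = ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

score_same = balance_score(sub, cal_same_slope)
score_other = balance_score(sub, cal_other_slope)
```

In this toy case both calibration sets correlate perfectly with the descriptor, but the one whose slope differs from the subtraining slope receives the lower score. In a Monte Carlo optimization of the kind the abstract mentions, a score of this type would be evaluated repeatedly while the descriptor's correlation weights are perturbed, and the external test set would be kept out of the optimization entirely.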

Original language: English
Pages (from-to): 3581-3587
Number of pages: 7
Journal: European Journal of Medicinal Chemistry
Issue number: 9
Publication status: Published - Sep 2010


Keywords

  • Balance of correlations
  • Carcinogenicity
  • Optimal descriptor
  • QSAR

ASJC Scopus subject areas

  • Drug Discovery
  • Organic Chemistry
  • Pharmacology

