Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

Carlo Vittorio Cannistraci, Timothy Ravasi, Franco Maria Montevecchi, Trey Ideker, Massimo Alessio

Research output: Contribution to journalArticle

41 Citations (Scopus)

Abstract

Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures- specifically dimension reduction (DR), coupled with clustering- provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. Methods: 'Minimum Curvilinearity' (MC) is a principle that-for small datasets-suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.

Original languageEnglish
Pages (from-to)i531-i539
JournalBioinformatics
Volume27
Issue number13
DOIs
Publication statusPublished - 2011

Fingerprint

Pain
Neuralgia
Dimension Reduction
Cluster Analysis
Clustering
Tissue
Proteomics
Visualization
Molecular biology
Transcription factors
Systems Biology
Feature Space
Peripheral Nervous System Diseases
Affine transformation
Computational Biology
Learning systems
Molecular Biology
Skin
Propagation
Transcription Factors

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Medicine(all)
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes. / Cannistraci, Carlo Vittorio; Ravasi, Timothy; Montevecchi, Franco Maria; Ideker, Trey; Alessio, Massimo.

In: Bioinformatics, Vol. 27, No. 13, 2011, p. i531-i539.

Research output: Contribution to journalArticle

Cannistraci, Carlo Vittorio ; Ravasi, Timothy ; Montevecchi, Franco Maria ; Ideker, Trey ; Alessio, Massimo. / Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes. In: Bioinformatics. 2011 ; Vol. 27, No. 13. pp. i531-i539.
@article{a3d753b616544b0d8f5f9208b20094c5,
title = "Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes",
abstract = "Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures- specifically dimension reduction (DR), coupled with clustering- provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. Methods: 'Minimum Curvilinearity' (MC) is a principle that-for small datasets-suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.",
author = "Cannistraci, {Carlo Vittorio} and Timothy Ravasi and Montevecchi, {Franco Maria} and Trey Ideker and Massimo Alessio",
year = "2011",
doi = "10.1093/bioinformatics/btq376",
language = "English",
volume = "27",
pages = "i531--i539",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "NLM (Medline)",
number = "13",

}

TY - JOUR

T1 - Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

AU - Cannistraci, Carlo Vittorio

AU - Ravasi, Timothy

AU - Montevecchi, Franco Maria

AU - Ideker, Trey

AU - Alessio, Massimo

PY - 2011

Y1 - 2011

N2 - Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures- specifically dimension reduction (DR), coupled with clustering- provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. Methods: 'Minimum Curvilinearity' (MC) is a principle that-for small datasets-suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.

AB - Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology, and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures- specifically dimension reduction (DR), coupled with clustering- provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space. Methods: 'Minimum Curvilinearity' (MC) is a principle that-for small datasets-suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.

UR - http://www.scopus.com/inward/record.url?scp=84983146272&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84983146272&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btq376

DO - 10.1093/bioinformatics/btq376

M3 - Article

C2 - 20823318

AN - SCOPUS:77956552656

VL - 27

SP - i531-i539

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

ER -