FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data

Limin Fu, Enzo Medico

Research output: Contribution to journalArticle

Abstract

Background: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. Results: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable to those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software with graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under GNU General Public License. Conclusion: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more diverse from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications different from gene expression analysis.

Original languageEnglish
Article number3
JournalBMC Bioinformatics
Volume8
DOIs
Publication statusPublished - Jan 4 2007

Fingerprint

DNA Microarray
Local Approximation
Fuzzy clustering
Fuzzy Clustering
Microarrays
Microarray Data
Oligonucleotide Array Sequence Analysis
Clustering Methods
Cluster Analysis
DNA
Gene expression
Clustering algorithms
Clustering Algorithm
Genes
Studios
Self organizing maps
Gene
Gene Expression Analysis
Microarray Data Analysis
Fuzzy Membership

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Structural Biology
  • Applied Mathematics
  • Medicine(all)

Cite this

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. / Fu, Limin; Medico, Enzo.

In: BMC Bioinformatics, Vol. 8, 3, 04.01.2007.

Research output: Contribution to journalArticle

@article{1965bab8f7024385b10362a712c4cf57,
title = "FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data",
abstract = "Background: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. Results: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with {"}archetypal{"} features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable to those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software with graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under GNU General Public License. Conclusion: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more diverse from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications different from gene expression analysis.",
author = "Limin Fu and Enzo Medico",
year = "2007",
month = "1",
day = "4",
doi = "10.1186/1471-2105-8-3",
language = "English",
volume = "8",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data

AU - Fu, Limin

AU - Medico, Enzo

PY - 2007/1/4

Y1 - 2007/1/4

N2 - Background: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. Results: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable to those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software with graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under GNU General Public License. Conclusion: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more diverse from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications different from gene expression analysis.

AB - Background: Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. Results: The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable to those of other methods and, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software with graphical interface for Linux and Windows, capable of handling very large datasets, named Gene Expression Data Analysis Studio (GEDAS), freely available under GNU General Public License. Conclusion: The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more diverse from each other, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications different from gene expression analysis.

UR - http://www.scopus.com/inward/record.url?scp=39449096984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=39449096984&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-8-3

DO - 10.1186/1471-2105-8-3

M3 - Article

C2 - 17204155

AN - SCOPUS:39449096984

VL - 8

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 3

ER -