Phylo_dCor: Distance correlation as a novel metric for phylogenetic profiling

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Background: Elaborating powerful methods to predict functional and/or physical protein-protein interactions from genome sequences is one of the main tasks of the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at the whole-genome level in both Prokaryotes and Eukaryotes. For this reason, it is considered one of the most promising methods.

Results: Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods that use mutual information or Pearson's correlation as the measure of profile similarity.

Conclusions: In this work, we constructed and assessed robust reference sets and proposed the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
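To make the similarity measure named in the abstract concrete, the following is a minimal, serial sketch of the standard sample distance correlation between two profiles. It is not the authors' Phylo-dCor code (which is parallelized and distributed as R scripts), and it assumes that each phylogenetic profile is an equal-length numeric vector with one entry per reference genome. The function names and the presence/absence values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def centered_distances(profile):
    """Pairwise absolute distances of a profile, double-centered."""
    n = len(profile)
    d = [[abs(profile[i] - profile[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = centered_distances(x)
    b = centered_distances(y)
    dcov2 = mean_product(a, b)  # squared sample distance covariance
    denom = sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0:
        # At least one profile is constant; report no dependence.
        return 0.0
    return sqrt(max(dcov2, 0.0) / denom)


# Hypothetical presence/absence profiles across six reference genomes.
protein_a = [1, 0, 1, 1, 0, 0]
protein_b = [1, 1, 0, 1, 0, 0]
similarity = distance_correlation(protein_a, protein_b)
```

This sketch needs O(n^2) time and memory per pair of profiles, where n is the number of reference genomes. Scoring every pair of proteins in a genome repeats that computation many times, which is presumably why a parallelized implementation matters for large genomic data.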

Original language: English
Article number: 396
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
DOI: 10.1186/s12859-017-1815-5
Publication status: Published - Sep 5 2017

Fingerprint

Phylogenetics
Profiling
Genomics
Proteins
Protein-protein Interaction
Metric
Genome
Genes
Pearson Correlation
Saccharomyces Cerevisiae
Mutual Information
Escherichia Coli
Eukaryota
Yeast
Predict
Prediction
Range of data
Profile

Keywords

  • Distance correlation
  • Phylogenetic profiling
  • Protein-protein interaction

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Phylo_dCor: Distance correlation as a novel metric for phylogenetic profiling. / Sferra, Gabriella; Fratini, Federica; Ponzi, Marta; Pizzi, Elisabetta.

In: BMC Bioinformatics, Vol. 18, No. 1, 396, 05.09.2017.

@article{401f3601f971475b97126ece1acb89ca,
title = "Phylo_dCor: Distance correlation as a novel metric for phylogenetic profiling",
abstract = "Background: Elaborating powerful methods to predict functional and/or physical protein-protein interactions from genome sequences is one of the main tasks of the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at the whole-genome level in both Prokaryotes and Eukaryotes. For this reason, it is considered one of the most promising methods. Results: Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods that use mutual information or Pearson's correlation as the measure of profile similarity. Conclusions: In this work, we constructed and assessed robust reference sets and proposed the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.",
keywords = "Distance correlation, Phylogenetic profiling, Protein-protein interaction",
author = "Gabriella Sferra and Federica Fratini and Marta Ponzi and Elisabetta Pizzi",
year = "2017",
month = "9",
day = "5",
doi = "10.1186/s12859-017-1815-5",
language = "English",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "1",

}

TY - JOUR

T1 - Phylo_dCor

T2 - Distance correlation as a novel metric for phylogenetic profiling

AU - Sferra, Gabriella

AU - Fratini, Federica

AU - Ponzi, Marta

AU - Pizzi, Elisabetta

PY - 2017/9/5

Y1 - 2017/9/5

AB - Background: Elaborating powerful methods to predict functional and/or physical protein-protein interactions from genome sequences is one of the main tasks of the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at the whole-genome level in both Prokaryotes and Eukaryotes. For this reason, it is considered one of the most promising methods. Results: Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms previously described phylogenetic profiling methods that use mutual information or Pearson's correlation as the measure of profile similarity. Conclusions: In this work, we constructed and assessed robust reference sets and proposed the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.

KW - Distance correlation

KW - Phylogenetic profiling

KW - Protein-protein interaction

UR - http://www.scopus.com/inward/record.url?scp=85028705100&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028705100&partnerID=8YFLogxK

U2 - 10.1186/s12859-017-1815-5

DO - 10.1186/s12859-017-1815-5

M3 - Article

C2 - 28870256

AN - SCOPUS:85028705100

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 396

ER -