A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks

Marco Frasca, Giuliano Grossi, Jessica Gliozzo, Marco Mesiti, Marco Notaro, Paolo Perlasca, Alessandro Petrini, Giorgio Valentini

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Several problems in network biology and medicine can be cast into a framework where entities are represented through partially labeled networks, and the aim is to infer the labels (usually binary) of the unlabeled part. Connections represent functional or genetic similarity between entities, while the labelings are often highly unbalanced, that is, one class is largely under-represented. For instance, in automated protein function prediction (AFP) only a few proteins are annotated for most Gene Ontology terms, and in the disease-gene prioritization problem only a few genes are actually known to be involved in the etiology of a given disease. Imbalance-aware approaches that accurately predict node labels in biological networks are therefore required. Furthermore, such methods must be scalable, since input data can be very large, as in the case of multi-species protein networks.

Results: We propose a novel semi-supervised parallel enhancement of COSNet, an imbalance-aware algorithm built on the Hopfield neural model and recently proposed to solve the AFP problem. By adopting an efficient representation of the graph and assuming a sparse network topology, we empirically show that the method can be efficiently applied to networks with millions of nodes. The key strategy for speeding up the computation is to partition the nodes into independent sets, so that the nodes of each set can be processed in parallel on GPU accelerators. This parallel technique ensures convergence to asymptotically stable attractors while preserving the asynchronous dynamics of the original model. Detailed experiments on real data and on large artificial instances of the problem highlight the scalability and efficiency of the proposed method.

Conclusions: By parallelizing COSNet we achieved an average speed-up of 180x in solving the AFP problem for S. cerevisiae, Mus musculus and Homo sapiens, while lowering memory requirements. In addition, to show the potential applicability of the method to huge biomolecular networks, we predicted node labels in artificially generated sparse networks with hundreds of thousands to millions of nodes.
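The parallel strategy summarized in the Results can be illustrated with a small sketch. It assumes a standard discrete Hopfield network with states +1/-1, symmetric weights, no self-connections and given per-node thresholds; COSNet's imbalance-aware learning of its activation values and threshold is not reproduced. The graph is stored in compressed sparse row (CSR) form as one plausible sparse representation, and nodes are partitioned into independent sets by a simple greedy coloring (an assumption; the paper's partitioning procedure may differ). Because no two nodes in a set are adjacent, updating a whole set at once gives the same result as updating its nodes one after another, so the dynamics remain asynchronous in effect and the energy never increases. The function names (to_csr, greedy_coloring, run_dynamics, energy) and the toy network are hypothetical, and the Python loop stands in for a GPU kernel.

```python
# Sketch only: independent-set (color-class) updates of a discrete Hopfield net.
# Assumptions (not taken from the paper): states are +1/-1, weights are symmetric
# with no self-loops, thresholds are given constants, labeled nodes are clamped.
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, float]


def to_csr(n: int, edges: List[Edge]) -> Tuple[List[int], List[int], List[float]]:
    """Store an undirected weighted graph as CSR arrays (row_ptr, col_idx, values)."""
    rows: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        rows[u].append((v, w))
        rows[v].append((u, w))
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    values: List[float] = []
    for neighbors in rows:
        for v, w in sorted(neighbors):
            col_idx.append(v)
            values.append(w)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def greedy_coloring(row_ptr: List[int], col_idx: List[int]) -> List[int]:
    """Give each node the smallest color not used by an already-colored neighbor.

    Nodes sharing a color are pairwise non-adjacent, i.e. they form an independent set.
    """
    color = [-1] * (len(row_ptr) - 1)
    for u in range(len(color)):
        used = {color[col_idx[k]] for k in range(row_ptr[u], row_ptr[u + 1])}
        c = 0
        while c in used:
            c += 1
        color[u] = c
    return color


def local_field(u: int, s: List[int], row_ptr: List[int], col_idx: List[int],
                values: List[float]) -> float:
    """Weighted sum of the states of u's neighbors."""
    return sum(values[k] * s[col_idx[k]] for k in range(row_ptr[u], row_ptr[u + 1]))


def run_dynamics(state: List[int], theta: List[float], clamped: Set[int],
                 row_ptr: List[int], col_idx: List[int], values: List[float],
                 color: List[int], max_sweeps: int = 100) -> Tuple[List[int], int]:
    """Update unclamped nodes one color class at a time until a sweep changes nothing.

    Returns the final state and the number of sweeps performed.
    """
    s = list(state)
    classes: Dict[int, List[int]] = {}
    for u in range(len(s)):
        if u not in clamped:
            classes.setdefault(color[u], []).append(u)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for c in sorted(classes):
            # Nodes in one class are not adjacent, so every new state depends only on
            # nodes outside the class. This loop is what a GPU could run with one
            # thread per node; the result equals a sequential pass over the class.
            new_states = {}
            for u in classes[c]:
                h = local_field(u, s, row_ptr, col_idx, values) - theta[u]
                new_states[u] = 1 if h > 0 else (-1 if h < 0 else s[u])  # tie: keep state
            for u, x in new_states.items():
                if x != s[u]:
                    s[u] = x
                    changed = True
        if not changed:
            return s, sweep
    return s, max_sweeps


def energy(s: List[int], theta: List[float], row_ptr: List[int], col_idx: List[int],
           values: List[float]) -> float:
    """Hopfield energy E(s) = -1/2 * sum_{u,v} w_uv s_u s_v + sum_u theta_u s_u."""
    pair = sum(s[u] * local_field(u, s, row_ptr, col_idx, values) for u in range(len(s)))
    return -0.5 * pair + sum(t * x for t, x in zip(theta, s))


if __name__ == "__main__":
    # Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by a weak edge 2-3.
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 0.5), (2, 3, 0.2),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 0.5)]
    row_ptr, col_idx, values = to_csr(6, edges)
    color = greedy_coloring(row_ptr, col_idx)
    init = [1, -1, -1, -1, -1, -1]  # node 0 labeled positive, node 5 labeled negative
    theta = [0.0] * 6
    final, sweeps = run_dynamics(init, theta, {0, 5}, row_ptr, col_idx, values, color)
    print(color, final, sweeps)  # prints [0, 1, 2, 0, 1, 2] [1, 1, 1, -1, -1, -1] 2
    print(energy(init, theta, row_ptr, col_idx, values) >
          energy(final, theta, row_ptr, col_idx, values))  # prints True
```

On this toy network the positive node 0 pulls nodes 1 and 2 into the positive class, while nodes 3 and 4 stay negative because of their strong ties to the negative node 5; the process stops after two sweeps at a lower energy than it started. In a GPU version, each color class would correspond to one parallel step, so the number of colors bounds the number of sequential steps per sweep.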

Original language: English
Article number: 353
Journal: BMC Bioinformatics
Volume: 19
DOI: 10.1186/s12859-018-2301-4
Publication status: Published - Oct 15 2018

Keywords

  • Biological networks
  • GPU-based Hopfield nets
  • Large-sized networks
  • Node label prediction
  • Protein function prediction

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks. / Frasca, Marco; Grossi, Giuliano; Gliozzo, Jessica; Mesiti, Marco; Notaro, Marco; Perlasca, Paolo; Petrini, Alessandro; Valentini, Giorgio.

In: BMC Bioinformatics, Vol. 19, 353, 15.10.2018.

Frasca, Marco ; Grossi, Giuliano ; Gliozzo, Jessica ; Mesiti, Marco ; Notaro, Marco ; Perlasca, Paolo ; Petrini, Alessandro ; Valentini, Giorgio. / A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks. In: BMC Bioinformatics. 2018 ; Vol. 19.
@article{c580c51096934992bffa6e52ec43a415,
title = "A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks",
keywords = "Biological networks, GPU-based Hopfield nets, Large-sized networks, Node label prediction, Protein function prediction",
author = "Marco Frasca and Giuliano Grossi and Jessica Gliozzo and Marco Mesiti and Marco Notaro and Paolo Perlasca and Alessandro Petrini and Giorgio Valentini",
year = "2018",
month = "10",
day = "15",
doi = "10.1186/s12859-018-2301-4",
language = "English",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks

AU - Frasca, Marco

AU - Grossi, Giuliano

AU - Gliozzo, Jessica

AU - Mesiti, Marco

AU - Notaro, Marco

AU - Perlasca, Paolo

AU - Petrini, Alessandro

AU - Valentini, Giorgio

PY - 2018/10/15

Y1 - 2018/10/15

KW - Biological networks

KW - GPU-based Hopfield nets

KW - Large-sized networks

KW - Node label prediction

KW - Protein function prediction

UR - http://www.scopus.com/inward/record.url?scp=85054862260&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054862260&partnerID=8YFLogxK

U2 - 10.1186/s12859-018-2301-4

DO - 10.1186/s12859-018-2301-4

M3 - Article

AN - SCOPUS:85054862260

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 353

ER -