Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks

Luca Cappelletti, Alessandro Petrini, Jessica Gliozzo, Elena Casiraghi, Max Schubach, Martin Kircher, Giorgio Valentini

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

The annotation and characterization of tissue-specific cis-regulatory elements (CREs) in non-coding DNA remain an open challenge in computational genomics. Several prior works show that machine learning methods, using epigenetic or spectral features directly extracted from DNA sequences, can predict active promoters and enhancers in specific tissues or cell lines. In particular, deep-learning techniques have very recently obtained state-of-the-art results in this challenging computational task. In this study, we provide additional evidence that Feed Forward Neural Networks (FFNN) trained on epigenetic data and one-dimensional convolutional neural networks (CNN) trained on DNA sequence data can successfully predict active regulatory regions in different cell lines. We show that model selection by means of Bayesian optimization, applied to both FFNN and CNN models, can significantly improve deep neural network performance by automatically finding the models that best fit the data. Further, we show that techniques applied to balance active and non-active regulatory regions in the human genome in training and test data may lead to over-optimistic or poor predictions. We recommend evaluating generalization performance on actual imbalanced data that were not used to train the models.

Original language: English
Title of host publication: Bioinformatics and Biomedical Engineering - 8th International Work-Conference, IWBBIO 2020, Proceedings
Editors: Ignacio Rojas, Olga Valenzuela, Fernando Rojas, Luis Javier Herrera, Francisco Ortuño
Publisher: Springer
Pages: 600-612
Number of pages: 13
ISBN (Print): 9783030453848
Publication status: Published - Jan 1 2020
Event: 8th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2020 - Granada, Spain
Duration: May 6 2020 to May 8 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12108 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Work-Conference on Bioinformatics and Biomedical Engineering, IWBBIO 2020
Country: Spain
City: Granada
Period: 5/6/20 to 5/8/20

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
