Linking entries in protein interaction database to structured text: The FEBS Letters experiment

Arnaud Ceol, Andrew Chatr-Aryamontri, Luana Licata, Gianni Cesareni

Research output: Contribution to journal, Article

57 Citations (Scopus)

Abstract

The corpus of the scientific literature has reached such size that a lot of useful data, dispersed throughout millions of different articles, are now hard to recover. For instance, many articles in the biological domain describe relationships between entities (genes, proteins, small molecules, etc.), yet this crucial information cannot be efficiently used because of the difficulties in retrieving it automatically from unstructured text. Databases are striving to capture this valuable information and to organize it in a structured format ready for automatic analysis. However, the current database model, based on manual curation, is not sustainable because the limited support is not compatible with complete and accurate coverage of published information. Several proposals have been put forward to increase the efficiency and accuracy of the curation process. Here we present an experiment, designed by the editorial board of FEBS Letters, aimed at integrating each manuscript with a structured summary precisely reporting, with database identifiers and predefined controlled vocabularies, the protein interactions reported in the manuscript. The authors play an important role in this process as they are requested to provide structured information to be appended, in the form of human-readable paragraphs, at the end of traditional summaries. It is envisaged that the structured text will become an integral part of Medline abstracts. In 6 months' time the experience gained with this experiment will form the basis for a community discussion to propose a widely accepted strategy for information storage and retrieval.

Original language: English
Pages (from-to): 1171-1177
Number of pages: 7
Journal: FEBS Letters
Volume: 582
Issue number: 8
DOI: 10.1016/j.febslet.2008.02.071
Publication status: Published - Apr 9 2008

Fingerprint

Protein Databases
Manuscripts
Databases
Controlled Vocabulary
Literature
Thesauri
Proteins
Information Storage and Retrieval
Experiments
Information retrieval
Data storage equipment
Molecules

Keywords

  • Database
  • Information extraction
  • Network
  • Protein interaction

ASJC Scopus subject areas

  • Biochemistry
  • Biophysics
  • Molecular Biology

Cite this

Linking entries in protein interaction database to structured text: The FEBS Letters experiment. / Ceol, Arnaud; Chatr-Aryamontri, Andrew; Licata, Luana; Cesareni, Gianni.

In: FEBS Letters, Vol. 582, No. 8, 09.04.2008, p. 1171-1177.

Ceol, Arnaud; Chatr-Aryamontri, Andrew; Licata, Luana; Cesareni, Gianni. / Linking entries in protein interaction database to structured text: The FEBS Letters experiment. In: FEBS Letters. 2008; Vol. 582, No. 8. pp. 1171-1177.
@article{b9ff64400895493ab9a41ab0ed05fc80,
title = "Linking entries in protein interaction database to structured text: The FEBS Letters experiment",
abstract = "The corpus of the scientific literature has reached such size that a lot of useful data, dispersed throughout millions of different articles, are now hard to recover. For instance, many articles in the biological domain describe relationships between entities (genes, proteins, small molecules, etc.), yet this crucial information cannot be efficiently used because of the difficulties in retrieving it automatically from unstructured text. Databases are striving to capture this valuable information and to organize it in a structured format ready for automatic analysis. However, the current database model, based on manual curation, is not sustainable because the limited support is not compatible with complete and accurate coverage of published information. Several proposals have been put forward to increase the efficiency and accuracy of the curation process. Here we present an experiment, designed by the editorial board of FEBS Letters, aimed at integrating each manuscript with a structured summary precisely reporting, with database identifiers and predefined controlled vocabularies, the protein interactions reported in the manuscript. The authors play an important role in this process as they are requested to provide structured information to be appended, in the form of human-readable paragraphs, at the end of traditional summaries. It is envisaged that the structured text will become an integral part of Medline abstracts. In 6 months' time the experience gained with this experiment will form the basis for a community discussion to propose a widely accepted strategy for information storage and retrieval.",
keywords = "Database, Information extraction, Network, Protein interaction",
author = "Arnaud Ceol and Andrew Chatr-Aryamontri and Luana Licata and Gianni Cesareni",
year = "2008",
month = "4",
day = "9",
doi = "10.1016/j.febslet.2008.02.071",
language = "English",
volume = "582",
pages = "1171--1177",
journal = "FEBS Letters",
issn = "0014-5793",
publisher = "Elsevier",
number = "8",

}

TY - JOUR

T1 - Linking entries in protein interaction database to structured text

T2 - The FEBS Letters experiment

AU - Ceol, Arnaud

AU - Chatr-Aryamontri, Andrew

AU - Licata, Luana

AU - Cesareni, Gianni

PY - 2008/4/9

Y1 - 2008/4/9

AB - The corpus of the scientific literature has reached such size that a lot of useful data, dispersed throughout millions of different articles, are now hard to recover. For instance, many articles in the biological domain describe relationships between entities (genes, proteins, small molecules, etc.), yet this crucial information cannot be efficiently used because of the difficulties in retrieving it automatically from unstructured text. Databases are striving to capture this valuable information and to organize it in a structured format ready for automatic analysis. However, the current database model, based on manual curation, is not sustainable because the limited support is not compatible with complete and accurate coverage of published information. Several proposals have been put forward to increase the efficiency and accuracy of the curation process. Here we present an experiment, designed by the editorial board of FEBS Letters, aimed at integrating each manuscript with a structured summary precisely reporting, with database identifiers and predefined controlled vocabularies, the protein interactions reported in the manuscript. The authors play an important role in this process as they are requested to provide structured information to be appended, in the form of human-readable paragraphs, at the end of traditional summaries. It is envisaged that the structured text will become an integral part of Medline abstracts. In 6 months' time the experience gained with this experiment will form the basis for a community discussion to propose a widely accepted strategy for information storage and retrieval.

KW - Database

KW - Information extraction

KW - Network

KW - Protein interaction

UR - http://www.scopus.com/inward/record.url?scp=41249090605&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=41249090605&partnerID=8YFLogxK

U2 - 10.1016/j.febslet.2008.02.071

DO - 10.1016/j.febslet.2008.02.071

M3 - Article

C2 - 18328820

AN - SCOPUS:41249090605

VL - 582

SP - 1171

EP - 1177

JO - FEBS Letters

JF - FEBS Letters

SN - 0014-5793

IS - 8

ER -