Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

Andrew Chatr-aryamontri, Andrew Winter, Livia Perfetto, Leonardo Briganti, Luana Licata, Marta Iannuccelli, Luisa Castagnoli, Gianni Cesareni, Mike Tyers

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data.

Results: The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.

Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.

Original language: English
Article number: S8
Journal: BMC Bioinformatics
Volume: 12
Issue number: SUPPL. 8
DOI: https://doi.org/10.1186/1471-2105-12-S8-S8
Publication status: Published - Oct 3 2011

Fingerprint

Benchmarking
Natural Language Processing
Data Mining
Text Mining
Natural language processing systems
Databases
Proteins
Natural Language
Interaction
Protein-protein Interaction
Publications
Test Set
Normalization
Genes
Protein Databases
Gene
Evaluation
Gold
Refining
Annotation

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Chatr-aryamontri, A., Winter, A., Perfetto, L., Briganti, L., Licata, L., Iannuccelli, M., ... Tyers, M. (2011). Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases. BMC Bioinformatics, 12(SUPPL. 8), [S8]. https://doi.org/10.1186/1471-2105-12-S8-S8

@article{81ff6e5ca28541c9a445e8206e2599bb,
title = "Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases",
author = "Andrew Chatr-aryamontri and Andrew Winter and Livia Perfetto and Leonardo Briganti and Luana Licata and Marta Iannuccelli and Luisa Castagnoli and Gianni Cesareni and Mike Tyers",
year = "2011",
month = "10",
day = "3",
doi = "10.1186/1471-2105-12-S8-S8",
language = "English",
volume = "12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "SUPPL. 8",

}

TY - JOUR

T1 - Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

AU - Chatr-aryamontri, Andrew

AU - Winter, Andrew

AU - Perfetto, Livia

AU - Briganti, Leonardo

AU - Licata, Luana

AU - Iannuccelli, Marta

AU - Castagnoli, Luisa

AU - Cesareni, Gianni

AU - Tyers, Mike

PY - 2011/10/3

Y1 - 2011/10/3

UR - http://www.scopus.com/inward/record.url?scp=80053430329&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053430329&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-12-S8-S8

DO - 10.1186/1471-2105-12-S8-S8

M3 - Article

C2 - 22151178

AN - SCOPUS:80053430329

VL - 12

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL. 8

M1 - S8

ER -