GraphSum: Discovering correlations among multiple terms for graph-based summarization

Elena Baralis, Luca Cagliero, Naeem Mahoto, Alessandro Fiori

Research output: Contribution to journal › Article

Abstract

Graph-based summarization entails extracting a worthwhile subset of sentences from a collection of textual documents by using a graph-based model to represent the correlations between pairs of document terms. However, because the high-order correlations among multiple terms are disregarded during graph evaluation, summarization performance may be limited unless ad hoc language-dependent or semantics-based analysis is integrated. This paper presents a novel, general-purpose graph-based summarizer, namely GraphSum (Graph-based Summarizer). It discovers and exploits association rules to represent the correlations among multiple terms that have been neglected by previous approaches. The graph nodes, which represent combinations of two or more terms, are first ranked by means of a PageRank strategy that discriminates between positive and negative term correlations. The resulting node ranking is then used to drive the sentence selection process. Experiments performed on benchmark and real-life documents demonstrate the effectiveness of the proposed approach compared to many state-of-the-art summarizers.
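The pipeline outlined in the abstract (mine term combinations, rank them on a graph, select sentences) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes brute-force itemset mining in place of a real association rule miner, lift as the correlation measure, and edges only for positive correlations (lift above 1), whereas the abstract says the ranking also takes negative correlations into account. It also keeps single-term nodes for simplicity, although the abstract describes nodes as combinations of two or more terms. All function names and parameters are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent term sets; suitable for toy inputs only."""
    n = len(transactions)
    terms = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(terms, size):
            items = frozenset(combo)
            support = sum(items <= t for t in transactions) / n
            if support >= min_support:
                found[items] = support
    return found


def build_graph(itemsets, transactions):
    """Link disjoint itemsets whose lift exceeds 1 (assumed measure)."""
    n = len(transactions)
    edges = {node: {} for node in itemsets}
    for a, b in combinations(itemsets, 2):
        if a & b:
            continue  # overlapping sets are trivially correlated
        joint = sum((a | b) <= t for t in transactions) / n
        lift = joint / (itemsets[a] * itemsets[b])
        if lift > 1:  # positive correlation only in this sketch
            edges[a][b] = lift
            edges[b][a] = lift
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an undirected graph."""
    nodes = list(edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not edges[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            total = sum(edges[v].values())
            for u, weight in edges[v].items():
                new[u] += damping * rank[v] * weight / total
        rank = new
    return rank


def summarize(sentences, k=2, min_support=0.4):
    """Pick the k sentences covering the highest-ranked term sets."""
    transactions = [frozenset(s.lower().split()) for s in sentences]
    itemsets = frequent_itemsets(transactions, min_support)
    rank = pagerank(build_graph(itemsets, transactions))
    scores = [sum(r for items, r in rank.items() if items <= t)
              for t in transactions]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # keep original order


sentences = [
    "graph ranking drives sentence selection",
    "association rules link graph terms",
    "the weather was pleasant",
    "graph ranking uses association rules",
]
summary = summarize(sentences, k=2)
```

In this toy input, the sentence about the weather shares no frequent term set with the others, so it scores zero and is left out of the summary.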

Original language: English
Pages (from-to): 96-109
Number of pages: 14
Journal: Information Sciences
Volume: 249
Publication status: Published - Nov 10 2013

Keywords

  • Association rule mining
  • Graph ranking
  • Multi-document summarization
  • Text mining

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
