MWI-sum: A multilingual summarizer based on frequent weighted itemsets

Elena Baralis, Luca Cagliero, Alessandro Fiori, Paolo Garza

Research output: Contribution to journal › Article › peer-review

Abstract

Multidocument summarization addresses the selection of a compact subset of highly informative sentences, i.e., the summary, from a collection of textual documents. To perform sentence selection, two parallel strategies have been proposed: (a) apply general-purpose techniques relying on data mining or information retrieval techniques, and/or (b) perform advanced linguistic analysis relying on semantics-based models (e.g., ontologies) to capture the actual sentence meaning. Since there is an increasing need for processing documents written in different languages, the attention of the research community has recently focused on summarizers based on strategy (a). This article presents a novel multilingual summarizer, namely MWI-Sum (Multilingual Weighted Itemset-based Summarizer), that exploits an itemset-based model to summarize collections of documents ranging over the same topic. Unlike previous approaches, it extracts frequent weighted itemsets tailored to the analyzed collection and uses them to drive the sentence selection process. Weighted itemsets represent correlations among multiple highly relevant terms that are neglected by previous approaches. The proposed approach makes minimal use of language-dependent analyses. Thus, it is easily applicable to document collections written in different languages. Experiments performed on benchmark and real-life collections, English-written and not, demonstrate that the proposed approach performs better than state-of-the-art multilingual document summarizers.
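The idea of letting weighted itemsets drive sentence selection can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify the weighting function, the mining algorithm, or the selection procedure. The sketch assumes that each sentence is a mapping from terms to relevance weights, that an itemset's weight in a sentence is the minimum of its term weights, that itemsets are enumerated by brute force rather than with a dedicated mining algorithm, and that sentences are chosen greedily by how much uncovered weighted support they cover. All function names, weights, and thresholds are hypothetical.

```python
from itertools import combinations


def covers(sentence, itemset):
    """True if the sentence contains every term of the itemset."""
    return all(term in sentence for term in itemset)


def weighted_support(itemset, sentences):
    """Sum, over covering sentences, of the smallest term weight (assumed aggregation)."""
    return sum(
        min(sentence[term] for term in itemset)
        for sentence in sentences
        if covers(sentence, itemset)
    )


def frequent_weighted_itemsets(sentences, min_support, max_size=2):
    """Brute-force enumeration; a real miner would prune the search space."""
    terms = sorted({term for sentence in sentences for term in sentence})
    frequent = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(terms, size):
            support = weighted_support(itemset, sentences)
            if support >= min_support:
                frequent[itemset] = support
    return frequent


def select_sentences(sentences, itemsets, budget):
    """Greedily pick sentences that cover the most still-uncovered weighted support."""
    uncovered = dict(itemsets)
    chosen = []
    while uncovered and len(chosen) < budget:
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        if not candidates:
            break

        def gain(i):
            return sum(s for it, s in uncovered.items() if covers(sentences[i], it))

        best = max(candidates, key=gain)
        if gain(best) == 0:
            break
        chosen.append(best)
        uncovered = {
            it: s for it, s in uncovered.items() if not covers(sentences[best], it)
        }
    return chosen


# Illustrative term weights per sentence (made-up values, e.g. tf-idf-like scores).
sentences = [
    {"flood": 0.9, "river": 0.8, "city": 0.3},
    {"flood": 0.7, "river": 0.6},
    {"city": 0.5, "mayor": 0.9},
    {"mayor": 0.4, "city": 0.6, "flood": 0.2},
]
itemsets = frequent_weighted_itemsets(sentences, min_support=1.0)
summary = select_sentences(sentences, itemsets, budget=2)
print(sorted(itemsets))
print(summary)
```

Because the sketch only needs tokenized sentences and numeric term weights, nothing in it depends on the language of the documents, which mirrors the abstract's point about minimal language-dependent analysis.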

Original language: English
Article number: 5
Journal: ACM Transactions on Information Systems
Volume: 34
Issue number: 1
Publication status: Published - Sep 1 2015

Keywords

  • Frequent weighted itemset mining
  • Multilingual summarization
  • Text mining

ASJC Scopus subject areas

  • Information Systems
  • Business, Management and Accounting (all)
  • Computer Science Applications
