Multi-document summarization exploiting frequent itemsets

Elena Baralis, Luca Cagliero, Saima Jabeen, Alessandro Fiori

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A summary is a succinct and informative description of a data collection. In multi-document summarization, selecting the most relevant and non-redundant sentences from a collection of textual documents is a challenging task. Frequent itemset mining is a well-established data mining technique for discovering correlations among data. Although it has been widely used in transactional data analysis, to the best of our knowledge its use in document summarization has not yet been investigated. This paper presents a novel multi-document summarizer, ItemSum (Itemset-based Summarizer), built on an itemset-based model, i.e., a model composed of frequent itemsets extracted from the document collection. ItemSum automatically selects the most representative and non-redundant sentences to include in the summary by considering both sentence coverage, measured against a concise and highly informative itemset-based model, and a sentence relevance score based on tf-idf statistics. Experiments on the DUC'04 document collection, evaluated with the ROUGE toolkit, show that the proposed approach outperforms a large set of competitors.
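The abstract does not spell out ItemSum's algorithm, so the following is only a minimal Python sketch of the general idea it describes: mine frequent itemsets from the sentences, then greedily pick sentences that cover the most not-yet-covered itemsets, using a tf-idf score to break ties. Everything here is an assumption made for illustration, not the authors' method: the brute-force miner, the `min_support`, `max_size`, and `max_sentences` parameters, treating each sentence as one transaction, and the function names.

```python
from collections import Counter
from itertools import combinations
import math


def frequent_itemsets(term_sets, min_support, max_size=2):
    """Brute-force miner (illustrative only): return every term set of up to
    max_size terms that occurs in at least min_support sentences."""
    counts = Counter()
    for terms in term_sets:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(terms), size):
                counts[frozenset(combo)] += 1
    return {iset for iset, n in counts.items() if n >= min_support}


def tfidf_relevance(token_lists):
    """Score each sentence as the sum of tf * idf over its distinct terms,
    treating every sentence as its own 'document' (an assumption)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores.append(
            sum((c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items())
        )
    return scores


def summarize(sentences, min_support=2, max_sentences=2):
    """Greedy selection: repeatedly take the sentence covering the most
    still-uncovered itemsets; ties are broken by tf-idf relevance."""
    token_lists = [s.lower().split() for s in sentences]
    term_sets = [set(tokens) for tokens in token_lists]
    uncovered = frequent_itemsets(term_sets, min_support)
    relevance = tfidf_relevance(token_lists)
    chosen = []

    while uncovered and len(chosen) < max_sentences:
        def gain(i):
            covered = sum(1 for iset in uncovered if iset <= term_sets[i])
            return (covered, relevance[i])

        candidates = [i for i in range(len(sentences)) if i not in chosen]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best)[0] == 0:  # nothing new would be covered
            break
        chosen.append(best)
        uncovered = {iset for iset in uncovered if not iset <= term_sets[best]}

    return [sentences[i] for i in sorted(chosen)]


docs = [
    "storm hits coast",
    "storm hits coast hard",
    "rescue teams arrive",
    "rescue teams",
]
summary = summarize(docs, min_support=2, max_sentences=2)
print(summary)
```

In this toy input the first two sentences cover the same frequent itemsets, so once one of them is selected the other adds no coverage and is skipped as redundant. A real system would use a proper miner (for example Apriori or FP-growth), stop-word removal, and stemming, none of which is shown here.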

Original language: English
Title of host publication: Proceedings of the ACM Symposium on Applied Computing
Pages: 782-786
Number of pages: 5
Publication status: Published - 2012
Event: 27th Annual ACM Symposium on Applied Computing, SAC 2012 - Trento, Italy
Duration: Mar 26 2012 to Mar 30 2012

Other

Other: 27th Annual ACM Symposium on Applied Computing, SAC 2012
Country: Italy
City: Trento
Period: 3/26/12 to 3/30/12

Keywords

  • frequent itemset mining
  • multi-document summarization
  • text mining

ASJC Scopus subject areas

  • Software
