Abstract
A summary is a succinct and informative description of a data collection. In multi-document summarization, selecting the most relevant and non-redundant sentences from a collection of textual documents is a challenging task. Frequent itemset mining is a well-established data mining technique for discovering correlations among data. Although it has been widely used in transactional data analysis, to the best of our knowledge its use in document summarization has not previously been investigated. This paper presents a novel multi-document summarizer, ItemSum (Itemset-based Summarizer), which relies on an itemset-based model, i.e., a model composed of frequent itemsets extracted from the document collection. It automatically selects the most representative and non-redundant sentences to include in the summary by considering both sentence coverage, with respect to a concise and highly informative itemset-based model, and a sentence relevance score based on tf-idf statistics. Experiments on the DUC'04 document collection, evaluated with the ROUGE toolkit, show that the proposed approach achieves better performance than a large set of competitors.
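The selection idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how itemsets are mined, how tf-idf is computed, or how coverage and relevance are combined. The sketch below assumes that each sentence is one transaction of terms, that itemsets are enumerated by brute force up to size 2, that tf-idf treats each sentence as a document, and that sentences are chosen greedily by itemset coverage with relevance as a tie-breaker. All function names and parameters (`frequent_itemsets`, `tfidf_relevance`, `summarize`, `min_support`, `max_size`, `max_sentences`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_itemsets(token_sets, min_support, max_size=2):
    """Return the itemsets (as frozensets) found in at least min_support sentences.

    Brute-force enumeration; an assumption made for brevity, not a scalable miner.
    """
    counts = Counter()
    for items in token_sets:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                counts[frozenset(combo)] += 1
    return {iset for iset, c in counts.items() if c >= min_support}


def tfidf_relevance(sentences):
    """Score each sentence as the sum of tf-idf over its distinct terms.

    Assumption: every sentence is treated as its own document.
    """
    n = len(sentences)
    df = Counter(term for s in sentences for term in set(s))
    scores = []
    for s in sentences:
        tf = Counter(s)
        scores.append(
            sum((c / len(s)) * math.log(n / df[t]) for t, c in tf.items())
        )
    return scores


def summarize(sentences, min_support=2, max_size=2, max_sentences=2):
    """Greedily pick sentence indices that cover the itemset model.

    Assumed strategy: at each step take the sentence covering the most
    still-uncovered itemsets, breaking ties by tf-idf relevance.
    """
    token_sets = [set(s) for s in sentences]
    uncovered = frequent_itemsets(token_sets, min_support, max_size)
    relevance = tfidf_relevance(sentences)
    chosen = []

    def gain(i):
        # Number of still-uncovered itemsets fully contained in sentence i.
        return sum(1 for iset in uncovered if iset <= token_sets[i])

    while uncovered and len(chosen) < max_sentences:
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda i: (gain(i), relevance[i]))
        if gain(best) == 0:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        uncovered = {iset for iset in uncovered if not iset <= token_sets[best]}
    return sorted(chosen)
```

Calling `summarize` on a list of tokenized sentences returns the indices of the selected sentences. Because a sentence that only repeats already-covered itemsets contributes no coverage gain, redundant sentences tend to be skipped, which mirrors the non-redundancy goal stated in the abstract.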
Original language | English |
---|---|
Title of host publication | Proceedings of the ACM Symposium on Applied Computing |
Pages | 782-786 |
Number of pages | 5 |
Publication status | Published - 2012 |
Event | 27th Annual ACM Symposium on Applied Computing, SAC 2012 - Trento, Italy; Duration: Mar 26 2012 → Mar 30 2012 |
Other
Other | 27th Annual ACM Symposium on Applied Computing, SAC 2012 |
---|---|
Country/Territory | Italy |
City | Trento |
Period | 3/26/12 → 3/30/12 |
Keywords
- frequent itemset mining
- multi-document summarization
- text mining
ASJC Scopus subject areas
- Software