The rapid growth of the Internet has made available to analysts a large and increasing volume of Web documents (e.g., news articles) and user-generated content (e.g., posts from social networks and online communities), which are worth considering together. On the one hand, the need for novel and more effective approaches to summarizing Web document collections makes the application of data mining techniques established in other research contexts increasingly attractive. On the other hand, to generate appealing summaries, the data mining and knowledge discovery process cannot disregard the main interests of Web users. This chapter presents a novel news document summarization system, named NeDocS, that focuses on generating succinct, non-redundant, yet appealing summaries by means of a data mining and knowledge discovery process driven by messages posted on social networks. NeDocS retrieves news document collections from the Web and summarizes them by exploiting (1) frequent itemsets, i.e., sets of terms that frequently co-occur in the analyzed data, to capture the most significant correlations among terms, and (2) a sentence relevance evaluator that takes into account term significance in a collection of social network posts covering the same news topics. This approach retains sentences whose terms rarely occur in the news collection but are deemed relevant by Web users. To the best of our knowledge, the combined use of frequent itemsets and user-generated content in news document summarization is a promising research direction that has not been investigated so far. Experiments performed on real collections of news articles and driven by on-topic Twitter posts show the effectiveness of the proposed approach.
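The two ingredients named in the abstract, frequent itemsets over news sentences and a relevance score informed by social network posts, can be illustrated with a minimal Python sketch. This is not the NeDocS implementation: the abstract does not specify the mining algorithm or the scoring formula, so the brute-force itemset counting, the post-frequency term weight, the additive score, and all function names (`tokenize`, `frequent_itemsets`, `term_weights`, `score_sentence`, `summarize`) are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase, replace punctuation with spaces, return the set of distinct terms."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return frozenset(cleaned.split())


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of term sets (up to max_size terms) that occur
    in at least min_support transactions. Assumed stand-in for a real miner."""
    counts = Counter()
    for terms in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(terms), size):
                counts[frozenset(combo)] += 1
    return {items: n for items, n in counts.items() if n >= min_support}


def term_weights(posts):
    """Fraction of posts containing each term (assumed proxy for term significance)."""
    doc_freq = Counter()
    for post in posts:
        doc_freq.update(tokenize(post))
    return {term: n / len(posts) for term, n in doc_freq.items()}


def score_sentence(terms, itemsets, weights):
    """Assumed additive score: number of frequent itemsets the sentence covers
    plus the summed social weight of its terms."""
    covered = sum(1 for items in itemsets if items <= terms)
    social = sum(weights.get(term, 0.0) for term in terms)
    return covered + social


def summarize(sentences, posts, min_support=2, k=2):
    """Return the k highest-scoring sentences in their original order."""
    transactions = [tokenize(s) for s in sentences]
    itemsets = frequent_itemsets(transactions, min_support)
    weights = term_weights(posts)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(transactions[i], itemsets, weights),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

In this sketch, a sentence whose terms are rare in the news collection covers no frequent itemsets, yet it still receives a non-zero score when those terms appear in the posts, which mirrors the behavior the abstract describes. A production system would replace the brute-force counting with an Apriori- or FP-growth-style miner and add stop-word removal and redundancy control.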
ASJC Scopus subject areas
- Computer Science(all)
- Social Sciences(all)
- Earth and Planetary Sciences(all)