News document summarization driven by user-generated content

Luca Cagliero, Alessandro Fiori

Research output: Chapter in Book/Report/Conference proceeding (Chapter)


The rapid growth of the Internet has given analysts access to a large and increasing volume of Web documents (e.g., news articles) and user-generated content (e.g., social network posts) that are worth considering together. On the one hand, the need for novel and more effective approaches to summarizing Web document collections makes data mining techniques established in other research contexts increasingly appealing. On the other hand, to generate appealing summaries, the data mining and knowledge discovery process cannot disregard the main interests of Web users. This chapter presents a novel news document summarization system, NeDocS, that focuses on generating succinct, non-redundant, yet appealing summaries by means of a data mining and knowledge discovery process driven by messages posted on social networks. NeDocS retrieves news document collections from the Web and summarizes them by exploiting (1) frequent itemsets, i.e., recurring sets of terms that occur frequently in the analyzed data, to capture the most significant correlations among terms, and (2) a sentence relevance evaluator that takes into account term significance in a collection of social network posts covering the same news topics. This approach avoids disregarding sentences whose terms rarely occur in the news collection but are deemed relevant by Web users. To the best of our knowledge, the combined use of frequent itemsets and user-generated content in news document summarization has not been investigated before. Experiments performed on real collections of news articles and driven by on-topic Twitter posts show the effectiveness of the proposed approach.
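The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the NeDocS implementation: the function names, the brute-force itemset enumeration (a real system would more likely use Apriori or FP-growth), the post-based term weight (fraction of posts containing the term), and the additive score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a text and return its set of terms (no stopword removal)."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def frequent_itemsets(term_sets, min_support=2, max_size=2):
    """Brute-force mining: term sets of up to max_size terms that occur
    in at least min_support sentences."""
    counts = Counter()
    for terms in term_sets:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(terms), size):
                counts[frozenset(itemset)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def post_term_weights(posts):
    """Assumed significance measure: fraction of posts containing a term."""
    doc_freq = Counter()
    for post in posts:
        doc_freq.update(tokenize(post))
    return {t: c / len(posts) for t, c in doc_freq.items()}


def summarize(news_sentences, posts, k=2, min_support=2):
    """Greedily pick k sentences. Each sentence is scored by the number of
    still-uncovered frequent itemsets it contains plus the post-based
    weights of its terms (hypothetical scoring, for illustration only)."""
    term_sets = [tokenize(s) for s in news_sentences]
    uncovered = frequent_itemsets(term_sets, min_support)
    weights = post_term_weights(posts)
    chosen = []
    while len(chosen) < k:
        best, best_score = None, 0.0
        for i, terms in enumerate(term_sets):
            if i in chosen:
                continue
            coverage = sum(1 for s in uncovered if s <= terms)
            relevance = sum(weights.get(t, 0.0) for t in terms)
            if coverage + relevance > best_score:
                best, best_score = i, coverage + relevance
        if best is None:
            break
        chosen.append(best)
        # Itemsets already covered no longer reward later sentences,
        # which discourages redundant picks.
        uncovered -= {s for s in uncovered if s <= term_sets[best]}
    return [news_sentences[i] for i in sorted(chosen)]


news = [
    "The central bank raised interest rates today.",
    "Interest rates were raised by the central bank.",
    "A rare comet will be visible tonight.",
]
posts = ["comet tonight wow", "cannot wait for the comet"]
print(summarize(news, posts, k=2))
```

With this toy input the sketch selects the first and third sentences: the second is skipped because its frequent itemsets are already covered by the first, while the comet sentence is kept because its terms are prominent in the posts even though they are rare in the news collection.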

Original language: English
Title of host publication: Social Media Mining and Social Network Analysis: Emerging Research
Publisher: IGI Global
Number of pages: 22
ISBN (Print): 9781466628069
Publication status: Published - 2013

ASJC Scopus subject areas

  • Computer Science (all)
  • Social Sciences (all)
  • Engineering (all)
  • Earth and Planetary Sciences (all)

