In this chapter, we analyze the Wikipedia collection with the ELiDa framework, with the aim of enriching linked data. ELiDa is based on association rule mining, an exploratory technique for discovering relevant correlations hidden in the analyzed data. A persistent structure is used to store the large volume of extracted knowledge compactly and to retrieve it efficiently for further analysis. The domain expert is in charge of selecting the relevant knowledge by setting filtering parameters, assessing the quality of the extracted knowledge, and enriching it with semantic expressiveness that cannot be inferred automatically. As representative document collections, we consider seven datasets extracted from the Wikipedia collection. Each dataset has been analyzed from two points of view (i.e., with transactions defined by documents and with transactions defined by sentences) to highlight relevant knowledge at different levels of abstraction.
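To make the two transaction granularities concrete, the following is a minimal sketch of association rule mining over term pairs, assuming the standard support and confidence measures. It is not the ELiDa implementation: the function names `to_transactions` and `pair_rules`, the naive tokenization, the thresholds, and the sample texts are all hypothetical and chosen only for illustration.

```python
import re
from itertools import combinations


def to_transactions(docs, by="document"):
    """Turn each document (or each sentence) into a set of lowercase terms.

    Hypothetical helper: sentence splitting and tokenization are deliberately naive.
    """
    units = []
    for doc in docs:
        if by == "sentence":
            # Split on sentence-ending punctuation and drop empty fragments.
            units.extend(s for s in re.split(r"[.!?]+", doc) if s.strip())
        else:
            units.append(doc)
    return [set(re.findall(r"[a-z]+", u.lower())) for u in units]


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) over term pairs."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of transactions containing both terms
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Illustrative sample texts (assumed, not taken from the analyzed datasets).
docs = [
    "Turin is in Italy. Rome is in Italy.",
    "Turin hosts a university.",
]

by_document = pair_rules(to_transactions(docs, "document"), 0.3, 0.9)
by_sentence = pair_rules(to_transactions(docs, "sentence"), 0.3, 0.9)
```

With document-level transactions, terms that merely share a document (such as "rome" and "turin" here) can form a rule, whereas sentence-level transactions keep only co-occurrences within a single sentence. This is one way to read the two levels of abstraction mentioned above.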
ASJC Scopus subject areas
- Computer Science(all)