Twitter mining for fine-grained syndromic surveillance

Paola Velardi, Giovanni Stilo, Alberto E. Tozzi, Francesco Gesualdo

Research output: Contribution to journal › Article › peer-review


Background: Digital traces left on the Internet by web users, if properly aggregated and analyzed, can form a large dataset that informs syndromic surveillance systems in real time with data collected directly from individuals. Since people use everyday language rather than medical jargon (e.g. runny nose vs. respiratory distress), knowledge of patients' terminology is essential for mining health-related conversations on social networks.

Objectives: In this paper we present a methodology for early detection and analysis of epidemics based on mining Twitter messages. To reliably trace messages from patients who actually complain of a disease, we first learn a model of naïve medical language and then adopt a symptom-driven, rather than disease-driven, keyword analysis. This approach represents a major innovation compared to previously published work in the field.

Method: We first developed an algorithm to automatically learn a variety of expressions that people use to describe their health conditions, thus improving our ability to detect health-related "concepts" expressed in non-medical terms and, in the end, producing a larger body of evidence. We then implemented a Twitter monitoring instrument to analyze in detail the presence and combinations of symptoms in tweets.

Results: We first evaluate the algorithm's performance on an available dataset of diverse medical condition synonyms; we then assess its utility in a case study of five common syndromes for surveillance purposes. We show that, by exploiting physicians' knowledge of symptoms positively or negatively related to a given disease, as well as the correspondence between patients' "naïve" terminology and medical jargon, not only can we analyze large volumes of Twitter messages related to that disease, but we can also mine micro-blogs with complex queries, performing fine-grained tweet classification (e.g. tweets reporting influenza-like illness (ILI) symptoms vs. common cold or allergy).

Conclusions: Our approach yields a very high level of correlation with flu trends derived from traditional surveillance systems. Compared with Google Flu, another popular tool based on query search volumes, our method is more flexible and less sensitive to changes in web search behavior.
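The symptom-driven classification described in the abstract can be illustrated with a minimal sketch. It assumes a small hand-written lexicon that maps everyday expressions to symptom concepts, and rule sets of positively and negatively related symptoms per syndrome. The lexicon entries, the rule sets, and the names `NAIVE_TO_SYMPTOM`, `SYNDROMES`, `extract_symptoms` and `classify` are all hypothetical and chosen for illustration; they are not the terminology learned by the authors' algorithm nor the queries used in their case study.

```python
import re

# Hypothetical lexicon: everyday ("naive") expression -> symptom concept.
# Illustrative entries only; the paper learns such mappings automatically.
NAIVE_TO_SYMPTOM = {
    "fever": "fever",
    "high temperature": "fever",
    "runny nose": "rhinorrhea",
    "stuffy nose": "nasal congestion",
    "sore throat": "pharyngitis",
    "cough": "cough",
    "coughing": "cough",
    "itchy eyes": "ocular pruritus",
    "body aches": "myalgia",
}

# Hypothetical syndrome rules: every "required" symptom must be present,
# at least one "any_of" symptom must be present, and no "excluded"
# (negatively related) symptom may be present.
SYNDROMES = {
    "ILI": {
        "required": {"fever"},
        "any_of": {"cough", "pharyngitis"},
        "excluded": {"ocular pruritus"},
    },
    "common cold": {
        "required": set(),
        "any_of": {"rhinorrhea", "nasal congestion"},
        "excluded": {"fever", "ocular pruritus"},
    },
    "allergy": {
        "required": {"ocular pruritus"},
        "any_of": {"rhinorrhea", "nasal congestion"},
        "excluded": {"fever"},
    },
}


def extract_symptoms(tweet):
    """Return the set of symptom concepts whose naive phrases occur in the tweet."""
    text = tweet.lower()
    found = set()
    for phrase, concept in NAIVE_TO_SYMPTOM.items():
        # Whole-word match so that "cough" does not fire inside other words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(concept)
    return found


def classify(tweet):
    """Return the names of all syndromes whose rule the tweet satisfies."""
    symptoms = extract_symptoms(tweet)
    labels = []
    for name, rule in SYNDROMES.items():
        if (rule["required"] <= symptoms
                and rule["any_of"] & symptoms
                and not rule["excluded"] & symptoms):
            labels.append(name)
    return labels
```

With these toy rules, `classify("Ugh, high temperature and coughing all night")` yields `["ILI"]`, while a tweet mentioning a runny nose and itchy eyes is labeled as allergy. The sketch ignores negation ("no fever"), irony and retweets, all of which a real monitoring tool would need to handle.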

Original language: English
Pages (from-to): 153-163
Number of pages: 11
Journal: Artificial Intelligence in Medicine
Issue number: 3
Publication status: Published - 2014


Keywords

  • Micro-blog mining
  • Patient's language learning
  • Syndromic surveillance
  • Terminology clustering
  • Twitter mining

ASJC Scopus subject areas

  • Artificial Intelligence
  • Medicine (miscellaneous)
  • Medicine (all)

