Predicting flu epidemics using Twitter and historical data

Giovanni Stilo, Paola Velardi, Alberto E. Tozzi, Francesco Gesualdo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Growing attention has recently been paid to the use of web and social data to improve traditional prediction models in politics, finance, marketing and health. Although a correlation between observed phenomena and related social data has been demonstrated in many cases, the effectiveness of social data for long-term or even mid-term predictions has not been shown. In epidemiological surveillance, the problem is compounded by the fact that infectious disease models (such as susceptible-infected-recovered-susceptible, SIRS) are very sensitive to current conditions, such that small changes can produce remarkable differences in future outcomes. Unfortunately, current or nearly current conditions keep changing as data are collected and updated by the epidemiological surveillance organizations. In this paper we show that the time series of Twitter messages reporting a combination of symptoms that match the influenza-like illness (ILI) case definition represents a more stable and reliable source of information on "current conditions", to the point that it can replace, rather than simply supplement, official epidemiological data. We estimate the effectiveness of these data at predicting current and past flu seasons (17 seasons overall), in combination with official historical data on past seasons, obtaining an average correlation of 0.85 over a period of 17 weeks covering the flu season.
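The abstract reports an average correlation over 17 seasons but does not say which correlation coefficient was used or how the average was taken. As a minimal sketch, assuming a Pearson coefficient computed per season between a predicted and an observed weekly ILI series and then averaged across seasons, the calculation could look like the following. The function names `pearson` and `average_seasonal_correlation` are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def average_seasonal_correlation(seasons):
    """Average the per-season correlation (assumed aggregation scheme).

    `seasons` is an iterable of (predicted, observed) pairs, one pair per
    flu season, each series holding one value per week (for example 17).
    """
    return mean(pearson(predicted, observed) for predicted, observed in seasons)
```

Averaging per-season coefficients treats every season equally regardless of its size; pooling all weeks into a single series before correlating would be a different design choice and generally gives a different number.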

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 164-177
Number of pages: 14
Volume: 8609 LNAI
ISBN (Print): 9783319098906
DOIs: https://doi.org/10.1007/978-3-319-09891-3_16
Publication status: Published - 2014
Event: 2014 International Conference on Brain Informatics and Health, BIH 2014 - Warsaw, Poland
Duration: Aug 11 2014 to Aug 14 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8609 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2014 International Conference on Brain Informatics and Health, BIH 2014
Country: Poland
City: Warsaw
Period: 8/11/14 to 8/14/14

Keywords

  • epidemiological surveillance
  • predictability of health-related phenomena
  • Twitter mining

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Stilo, G., Velardi, P., Tozzi, A. E., & Gesualdo, F. (2014). Predicting flu epidemics using Twitter and historical data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8609 LNAI, pp. 164-177). Springer Verlag. https://doi.org/10.1007/978-3-319-09891-3_16
