Influenza-like illness surveillance on Twitter through automated learning of naïve language

Research output: Contribution to journal (Article)

33 Citations (Scopus)

Abstract

Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.
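The Boolean case-definition query described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the symptom lexicon (`SYMPTOM_TERMS`), the function names, and the case definition itself (fever AND (cough OR sore throat)) are hypothetical and are not taken from the paper, whose jargon expressions were learned automatically from health-related web pages.

```python
import re

# Hypothetical lexicon: technical term -> everyday ("jargon") expressions.
# These entries are illustrative assumptions, not the paper's learned terms.
SYMPTOM_TERMS = {
    "fever": ["fever", "high temperature", "burning up"],
    "cough": ["cough", "coughing"],
    "sore throat": ["sore throat", "throat hurts"],
    "myalgia": ["myalgia", "body aches", "muscle pain"],
}


def symptoms_in(text):
    """Return the set of technical terms whose expressions occur in text."""
    lowered = text.lower()
    found = set()
    for symptom, expressions in SYMPTOM_TERMS.items():
        for expr in expressions:
            # Whole-word match so that "cough" does not fire inside other words.
            if re.search(r"\b" + re.escape(expr) + r"\b", lowered):
                found.add(symptom)
                break
    return found


def matches_case_definition(text):
    """Assumed case definition: fever AND (cough OR sore throat)."""
    found = symptoms_in(text)
    return "fever" in found and bool(found & {"cough", "sore throat"})
```

Under these assumptions, a message such as "Burning up and my throat hurts so bad" satisfies the query, while a message that mentions only one symptom does not. Keeping the lexicon separate from the Boolean rule means either can be swapped out independently.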

Original language: English
Article number: e82489
Journal: PLoS One
Volume: 8
Issue number: 12
DOI: 10.1371/journal.pone.0082489
Publication status: Published - Dec 4 2013

Fingerprint

influenza
Human Influenza
signs and symptoms (animals and humans)
Language
Learning
monitoring
Information Storage and Retrieval
Websites
Health
Costs and Cost Analysis
Costs

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

@article{391627f21c984ad59342bd6eb5c5cd65,
title = "Influenza-like illness surveillance on {Twitter} through automated learning of na{\"i}ve language",
abstract = "Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using na{\"i}ve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.",
author = "Francesco Gesualdo and Giovanni Stilo and Eleonora Agricola and Gonfiantini, {Michaela V.} and Elisabetta Pandolfi and Paola Velardi and Tozzi, {Alberto E.}",
year = "2013",
month = "12",
day = "4",
doi = "10.1371/journal.pone.0082489",
language = "English",
volume = "8",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "12",

}

TY - JOUR

T1 - Influenza-like illness surveillance on Twitter through automated learning of naïve language

AU - Gesualdo, Francesco

AU - Stilo, Giovanni

AU - Agricola, Eleonora

AU - Gonfiantini, Michaela V.

AU - Pandolfi, Elisabetta

AU - Velardi, Paola

AU - Tozzi, Alberto E.

PY - 2013/12/4

Y1 - 2013/12/4

N2 - Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.

AB - Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems.

UR - http://www.scopus.com/inward/record.url?scp=84891813761&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84891813761&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0082489

DO - 10.1371/journal.pone.0082489

M3 - Article

C2 - 24324799

AN - SCOPUS:84891813761

VL - 8

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 12

M1 - e82489

ER -