The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records

Michela Assale, Linda Greta Dui, Andrea Cina, Andrea Seveso, Federico Cabitza

Research output: Contribution to journal - Article

Abstract

Problem: Clinical practice requires the production of a great amount of notes, which is time- and resource-consuming. These notes contain relevant information, but their secondary use is almost impossible due to their unstructured nature. Researchers are trying to address these problems with both traditional and promising novel techniques. However, application in real hospital settings does not yet seem possible, both because datasets are relatively small and dirty and because language-specific pre-trained models are lacking. Aim: Our aim is to demonstrate the potential of the above techniques, but also to raise awareness of the still-open challenges that the scientific communities of IT and medical practitioners must jointly address to realize the full potential of the unstructured content that is produced and digitized daily in hospital settings, both to improve its data quality and to leverage the insights from data-driven predictive models. Methods: To this end, we present a narrative literature review of the most recent and relevant contributions applying Natural Language Processing techniques to the free-text content of electronic patient records. In particular, we focused on four selected application domains, namely: data quality, information extraction, sentiment analysis and predictive models, and automated patient cohort selection. We then present a few empirical studies that we undertook at a major teaching hospital specializing in musculoskeletal diseases. Results: We provide the reader with some simple and affordable pipelines, which demonstrate the feasibility of reaching literature performance levels with a single-institution, non-English dataset. In this way, we bridged the literature and real-world needs, taking a step further toward the revival of notes fields.
Original language: Undefined/Unknown
Journal: Frontiers in Medicine
Volume: 6
DOI: 10.3389/fmed.2019.00066
Publication status: Published - Apr 17 2019

Cite this

The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records. / Assale, Michela; Dui, Linda Greta; Cina, Andrea; Seveso, Andrea; Cabitza, Federico.

In: Frontiers in Medicine, Vol. 6, 17.04.2019.

Assale, Michela ; Dui, Linda Greta ; Cina, Andrea ; Seveso, Andrea ; Cabitza, Federico. / The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records. In: Frontiers in Medicine. 2019 ; Vol. 6.
@article{2c66bd6a02e246b4b25516811ae5160a,
title = "The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records",
author = "Michela Assale and Dui, {Linda Greta} and Andrea Cina and Andrea Seveso and Federico Cabitza",
year = "2019",
month = "4",
day = "17",
doi = "10.3389/fmed.2019.00066",
language = "Undefined",
volume = "6",
journal = "Frontiers in Medicine",
issn = "2296-858X",
publisher = "Frontiers Media S.A.",

}

TY - JOUR

T1 - The Revival of the Notes Field: Leveraging the Unstructured Content in Electronic Health Records

AU - Assale, Michela

AU - Dui, Linda Greta

AU - Cina, Andrea

AU - Seveso, Andrea

AU - Cabitza, Federico

PY - 2019/4/17

Y1 - 2019/4/17

U2 - 10.3389/fmed.2019.00066

DO - 10.3389/fmed.2019.00066

M3 - Article

VL - 6

JO - Frontiers in Medicine

JF - Frontiers in Medicine

SN - 2296-858X

ER -