BACKGROUND: Enrollment of large cohorts of syncope patients from administrative data is crucial for proper risk stratification but is limited by the enormous amount of time required for manual revision of medical records.
AIM: To develop a Natural Language Processing (NLP) algorithm to automatically identify syncope from Emergency Department (ED) electronic medical records (EMRs).
METHODS: De-identified EMRs of all consecutive patients evaluated at Humanitas Research Hospital ED from 1 December 2013 to 31 March 2014 and from 1 December 2015 to 31 March 2016 were manually annotated to identify syncope. Records were combined in a single dataset and classified. The performance of combined multiple NLP feature selectors and classifiers was tested. Primary Outcomes: NLP algorithms' accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F3 score.
RESULTS: 15,098 and 15,222 records from 2013 and 2015 datasets were analyzed. Syncope was present in 571 records. Normalized Gini Index feature selector combined with Support Vector Machines classifier obtained the best F3 value (84.0%), with 92.2% sensitivity and 47.4% positive predictive value. A 96% analysis time reduction was computed, compared with EMRs manual review.
CONCLUSIONS: This artificial intelligence algorithm enabled the automatic identification of a large population of syncope patients using EMRs.