Data-Driven Information Extraction from Chinese Electronic Medical Records

OBJECTIVE: This study proposes a data-driven framework that takes unstructured free-text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event.
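As a rough illustration of the output format described above, a time-event-description triple could be modelled as a small record type. This is only a sketch: the class name, field names, and sample values are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Triple:
    """One structured record extracted from a free-text EMR narrative.

    All field names are illustrative assumptions, not the article's schema.
    """
    time: str          # temporal descriptor linked to the event
    event: str         # the medical event itself
    description: str   # text attached to the event
    kind: Literal["elaboration", "outcome"]  # role of the description


# Invented sample values, used only to show the shape of a triple.
example = Triple(
    time="day 2 after admission",
    event="chest X-ray",
    description="no abnormality found",
    kind="outcome",
)
```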

MATERIALS AND METHODS: Our framework uses a hybrid approach. It combines cross-domain core medical lexica, an unsupervised, iterative algorithm that accrues more accurate terms into the lexica, rules that address Chinese writing conventions and temporal descriptors, and a Support Vector Machine (SVM) algorithm that innovatively utilizes Normalized Google Distance (NGD) to estimate the correlation between medical events and their descriptions.
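For readers unfamiliar with NGD, the sketch below computes it from occurrence counts using the standard definition by Cilibrasi and Vitányi. The abstract does not state where the counts come from or how the resulting value is passed to the SVM, so the function name, its parameters, and the idea of counting within a corpus are assumptions made for illustration.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance between two terms x and y.

    fx, fy -- number of documents (or pages) containing x, respectively y
    fxy    -- number of documents containing both x and y
    n      -- total number of documents indexed (assumed > min(fx, fy))

    Smaller values mean the two terms co-occur more strongly.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("fx and fy must be positive")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of fx and fy")
    if fxy <= 0:
        # Terms that never co-occur are treated as infinitely distant.
        return math.inf
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))
```

Under these assumptions, a value such as `ngd(fx, fy, fxy, n)` for an event term and a candidate description term could serve as one numeric feature for a classifier, alongside other features.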

RESULTS: The effectiveness of the framework was demonstrated on a dataset of 24,817 de-identified Chinese EMRs. The cross-domain medical lexica recognized terms with an F1-score of 0.896. Of the recorded medical events, 98.5% were linked to temporal descriptors. The NGD SVM description-event matching achieved an F1-score of 0.874, and the end-to-end time-event-description extraction achieved an F1-score of 0.846.

DISCUSSION: In terms of named entity recognition, the proposed framework outperforms state-of-the-art supervised learning algorithms (F1-score: 0.896 vs. 0.886). In event-description association, the NGD SVM is superior to SVM using only local context and semantic features (F1-score: 0.874 vs. 0.838).

CONCLUSIONS: The framework is data-driven, weakly supervised, and robust against the variation and noise that tend to occur in a large corpus. It addresses Chinese medical writing conventions and variations in writing style through patterns for discovering new terms and rules for updating the lexica.

Media type:

E-article

Year of publication:

2015

Published:

2015

Contained in:

PloS one - 10(2015), 8, dated 16., page e0136270

Language:

English

Contributors:

Xu, Dong [Author]
Zhang, Meizhuo [Author]
Zhao, Tianwan [Author]
Ge, Chen [Author]
Gao, Weiguo [Author]
Wei, Jia [Author]
Zhu, Kenny Q [Author]

Subjects:

Journal Article

Notes:

Date Completed 13.05.2016

Date Revised 13.11.2018

published: Electronic-eCollection

Citation Status MEDLINE

doi:

10.1371/journal.pone.0136270

PPN (catalog ID):

NLM25204763X