Publication details - De-identifying Spanish medical texts - named entity recognition applied to radiology reports

De-identifying Spanish medical texts - named entity recognition applied to radiology reports

BACKGROUND: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages.

RESULTS: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%.
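The substitution step described above, replacing each detected entity with a new one from the same category, can be illustrated with a minimal sketch. It is not the authors' algorithm: the surrogate pools, the label names (`NAME`, `HOSPITAL`, `DATE`), and the helpers `find_span` and `randomize_entities` are all hypothetical, and the entity spans are supplied by hand where a trained named entity recognition model would normally produce them.

```python
import random

# Hypothetical surrogate pools; a real system would draw on much larger lists.
SURROGATES = {
    "NAME": ["Lucía Ortega", "Carlos Vidal", "Marta Rubio"],
    "HOSPITAL": ["Hospital San Rafael", "Clínica Los Olivos"],
    "DATE": ["03/02/2019", "17/11/2020"],
}


def find_span(text, surface, label):
    """Stand-in for NER output: locate a known string, return (start, end, label)."""
    start = text.index(surface)
    return (start, start + len(surface), label)


def randomize_entities(text, entities, rng=None):
    """Replace every (start, end, label) span with a random surrogate of the same label."""
    rng = rng or random.Random()
    out = text
    # Process spans from last to first so that earlier offsets remain valid.
    for start, end, label in sorted(entities, reverse=True):
        # Exclude the original string so an entity is never replaced by itself.
        pool = [s for s in SURROGATES[label] if s != text[start:end]]
        out = out[:start] + rng.choice(pool) + out[end:]
    return out


report = "Paciente Juan Pérez, visto en Hospital La Fe el 12/05/2018."
entities = [
    find_span(report, "Juan Pérez", "NAME"),
    find_span(report, "Hospital La Fe", "HOSPITAL"),
    find_span(report, "12/05/2018", "DATE"),
]
print(randomize_entities(report, entities, random.Random(0)))
```

Replacing from the end of the text backwards is a design choice that avoids recomputing offsets after each substitution; the sketch assumes the spans do not overlap.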

CONCLUSIONS: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records.

Media type:	E-article

Year of publication:	2021
Published:	2021

Contained in:	Go to parent record - volume:12
Contained in:	Journal of biomedical semantics - 12(2021), 1, 29 March, page 6

Language:	English

Contributors:	Pérez-Díez, Irene [author]; Pérez-Moraga, Raúl [author]; López-Cerdán, Adolfo [author]; Salinas-Serrano, Jose-Maria [author]; la Iglesia-Vayá, María de [author]

Links:	Full text

Subjects:	Journal Article; Medical texts; Named entity recognition; Natural language processing; Radiology reports; Research Support, Non-U.S. Gov't; Spanish

Notes:	Date Completed 28.10.2021; Date Revised 31.03.2024; published: Electronic; Citation Status MEDLINE

doi:	10.1186/s13326-021-00236-2

funding:
Funding institution / Project title:

PPN (catalog ID):	NLM323371485

Internal format


LEADER	01000caa a22002652 4500
001	NLM323371485
003	DE-627
005	20240331233138.0
007	cr uuu---uuuuu
008	231225s2021 xx |||||o 00| ||eng c
024	7		|a 10.1186/s13326-021-00236-2 |2 doi
028	5	2	|a pubmed24n1358.xml
035			|a (DE-627)NLM323371485
035			|a (NLM)33781334
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Pérez-Díez, Irene |e verfasserin |4 aut
245	1	0	|a De-identifying Spanish medical texts - named entity recognition applied to radiology reports
264		1	|c 2021
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Date Completed 28.10.2021
500			|a Date Revised 31.03.2024
500			|a published: Electronic
500			|a Citation Status MEDLINE
520			|a BACKGROUND: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages
520			|a RESULTS: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%
520			|a CONCLUSIONS: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records
650		4	|a Journal Article
650		4	|a Research Support, Non-U.S. Gov't
650		4	|a Medical texts
650		4	|a Named entity recognition
650		4	|a Natural language processing
650		4	|a Radiology reports
650		4	|a Spanish
700	1		|a Pérez-Moraga, Raúl |e verfasserin |4 aut
700	1		|a López-Cerdán, Adolfo |e verfasserin |4 aut
700	1		|a Salinas-Serrano, Jose-Maria |e verfasserin |4 aut
700	1		|a la Iglesia-Vayá, María de |e verfasserin |4 aut
773	0	8	|i Enthalten in |t Journal of biomedical semantics |d 2010 |g 12(2021), 1 vom: 29. März, Seite 6 |w (DE-627)NLM199466343 |x 2041-1480 |7 nnns
773	1	8	|g volume:12 |g year:2021 |g number:1 |g day:29 |g month:03 |g pages:6
856	4	0	|u http://dx.doi.org/10.1186/s13326-021-00236-2 |3 Volltext
912			|a GBV_USEFLAG_A
912			|a GBV_NLM
951			|a AR
952			|d 12 |j 2021 |e 1 |b 29 |c 03 |h 6
