Evaluation of the Automatic Full Form Retrieval Method from Abbreviation Using Word2vec for Terminology Expansion

PURPOSES: The purposes of this study were to automatically extract the full forms of abbreviations using Word2vec for terminology expansion and to determine the optimal parameters that yield the highest accuracy.

METHODS: Approximately 300,000 English abstracts on "image diagnosis" were collected from PubMed for the period January 1994 to December 2018. As preprocessing, all uppercase letters in the collected data were converted to lowercase, and symbols were deleted. In addition, compound words were recognized using RadLex, published by the Radiological Society of North America, and the abbreviation collection published by the Japanese Society of Radiological Technology. Next, distributed representations were generated with two algorithms, continuous bag-of-words (CBOW) and Skip-gram, while varying the number of iterations (3-85) and the dimensionality of the word vectors (50-1000). Each abbreviation was input to the generated distributed representations, and the term with the highest cosine similarity to the abbreviation was taken as its predicted full form. Finally, the correct answer rate was calculated by comparing the predicted full forms with 214 gold standards extracted from the abbreviation collection.
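
The method above can be sketched as a minimal, dependency-free Python example. The sketch rests on several assumptions that the abstract does not state:
- Symbols are replaced by spaces.
- Recognized compound terms are joined with underscores into single tokens.
- The nearest neighbour is searched over the whole vocabulary.

`COMPOUNDS`, `TOY_VECTORS` and `GOLD` are hypothetical toy data. They stand in for RadLex, the JSRT abbreviation collection, the trained embeddings and the 214 gold standards, and this is an illustration, not the authors' implementation.

```python
import math
import re

# Hypothetical compound-term list standing in for RadLex and the JSRT
# abbreviation collection; the real lexicons are much larger.
COMPOUNDS = ["magnetic resonance imaging", "computed tomography"]

# Hand-made 3-dimensional toy vectors for illustration only; the study trained
# 50- to 1000-dimensional Word2vec vectors on about 300,000 abstracts.
TOY_VECTORS = {
    "mri": [0.9, 0.1, 0.0],
    "magnetic_resonance_imaging": [0.8, 0.2, 0.1],
    "ct": [0.1, 0.9, 0.0],
    "computed_tomography": [0.2, 0.8, 0.1],
    "and": [0.0, 0.1, 0.9],
}

# Hypothetical gold standard: abbreviation -> expected full-form token.
GOLD = {"mri": "magnetic_resonance_imaging", "ct": "computed_tomography"}


def preprocess(text: str) -> str:
    """Lowercase the text and replace every symbol with a space (assumed handling)."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def join_compounds(text: str, compounds: list[str]) -> list[str]:
    """Tokenize, merging each listed multi-word term into one underscore-joined token."""
    for term in sorted(compounds, key=len, reverse=True):  # longest terms first
        text = re.sub(rf"\b{re.escape(term)}\b", term.replace(" ", "_"), text)
    return text.split()


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_full_form(abbr: str, vectors: dict[str, list[float]]) -> str:
    """Return the vocabulary word, other than the abbreviation itself, most cosine-similar to it."""
    query = vectors[abbr]
    candidates = (w for w in vectors if w != abbr)
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


def correct_answer_rate(gold: dict[str, str], vectors: dict[str, list[float]]) -> float:
    """Fraction of abbreviations whose top-1 neighbour equals the gold full form."""
    hits = sum(predict_full_form(abbr, vectors) == full for abbr, full in gold.items())
    return hits / len(gold)


if __name__ == "__main__":
    print(join_compounds(preprocess("MRI (Magnetic Resonance Imaging) and CT."), COMPOUNDS))
    print(correct_answer_rate(GOLD, TOY_VECTORS))
```

The abstract does not name a Word2vec toolkit. If gensim 4.x were used, `Word2Vec(sentences, vector_size=200, epochs=10, sg=1)` would train Skip-gram vectors for the best reported setting. Here `sg=0` would select CBOW, and the abstract's "iterations" are assumed to be training epochs. `model.wv.most_similar(abbr, topn=1)` would then replace `predict_full_form`, returning the top neighbour as a (word, similarity) pair.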

RESULTS: The highest correct answer rate, 74.3%, was obtained with Skip-gram using 200 dimensions and 10 iterations. Skip-gram achieved a higher correct answer rate than CBOW under all tested conditions.
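
The abstract reports only the rate, not the number of correct answers. The short check below recovers the count that the rate implies, assuming the rate is the number of correct predictions divided by the 214 gold standards, rounded to one decimal place. `counts_matching` is a hypothetical helper written for this illustration.

```python
GOLD_STANDARD_SIZE = 214  # gold-standard pairs, as stated in the abstract
REPORTED_RATE = 74.3      # best correct answer rate (%), Skip-gram, 200 dims, 10 iterations


def counts_matching(rate_percent: float, n: int) -> list[int]:
    """Return every count k in 0..n whose rate 100*k/n rounds to rate_percent (1 decimal)."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == rate_percent]


print(counts_matching(REPORTED_RATE, GOLD_STANDARD_SIZE))  # [159]
```

Under that assumption, only 159 correct answers out of 214 give 74.3%. A count of 158 would give 73.8%, and 160 would give 74.8%.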

CONCLUSION: Word2vec extracted the full forms of abbreviations with an accuracy of 74.3%; this result contributes to terminology consistency and to more efficient terminology expansion.

Media type:

E-article

Year of publication:

2020

Published:

2020

Contained in:

Parent record - volume:76

Contained in:

Nihon Hoshasen Gijutsu Gakkai zasshi - 76(2020), issue 11, dated the 26th, pages 1118-1124

Language:

Japanese

Contributors:

Yagahara, Ayako [Author]
Sato, Tetta [Author]

Subjects:

Abbreviation
Full form
Journal Article
Terminology
Word2vec

Notes:

Date Completed 25.11.2020

Date Revised 25.11.2020

published: Print

Citation Status MEDLINE

doi:

10.6009/jjrt.2020_JSRT_76.11.1118

PPN (catalog ID):

NLM317967592