Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech

Copyright © 2020. Published by Elsevier B.V.

BACKGROUND AND OBJECTIVE: Magnetic resonance (MR) imaging is increasingly used in studies of speech as it enables non-invasive visualisation of the vocal tract and articulators, thus providing information about their shape, size, motion and position. Extraction of this information for quantitative analysis is achieved using segmentation. Methods have been developed to segment the vocal tract; however, none of these also fully segments any of the articulators. The objective of this work was to develop a method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech, thus overcoming the limitations of existing methods.

METHODS: Five speech MR image sets (392 MR images in total), each of a different healthy adult volunteer, were used in this work. A fully convolutional network with an architecture similar to the original U-Net was developed to segment the following six regions in the image sets: the head, soft palate, jaw, tongue, vocal tract and tooth space. A five-fold cross-validation was performed to investigate the segmentation accuracy and generalisability of the network. The segmentation accuracy was assessed using standard overlap-based metrics (Dice coefficient and general Hausdorff distance) and a novel clinically relevant metric based on velopharyngeal closure.
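The two standard overlap-based metrics named above can be sketched in a few lines. This is a minimal illustration with NumPy and SciPy, not the authors' implementation; the toy masks and the helper names (`dice_coefficient`, `hausdorff_distance`) are hypothetical, and the general (symmetric) Hausdorff distance is taken as the maximum of the two directed distances between the masks' foreground pixels.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks: 2*|A & B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total

def hausdorff_distance(pred, truth):
    """Symmetric (general) Hausdorff distance between two binary masks, in pixels."""
    p = np.argwhere(pred)   # coordinates of foreground pixels
    t = np.argwhere(truth)
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])

# Toy 4x4 masks (hypothetical, for illustration only)
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])

print(round(dice_coefficient(pred, truth), 3))  # → 0.857 (i.e. 6/7)
print(hausdorff_distance(pred, truth))          # → 1.0
```

In practice these per-image values would be computed for each of the six segmented regions and summarised across the cross-validation folds, e.g. as the medians reported in the RESULTS.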

RESULTS: The segmentations created by the method had a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. The method segmented the head most accurately (median Dice coefficient of 0.99), and the soft palate and tooth space least accurately (median Dice coefficients of 0.92 and 0.93, respectively). The segmentations created by the method correctly showed 90% (27 out of 30) of the velopharyngeal closures in the MR image sets.

CONCLUSIONS: An automatic method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech was successfully developed. The method is intended for use in clinical and non-clinical speech studies which involve quantitative analysis of the shape, size, motion and position of the vocal tract and articulators. In addition, a novel clinically relevant metric for assessing the accuracy of vocal tract and articulator segmentation methods was developed.

Media type:

Electronic article

Year of publication:

2021

Published:

2021

Contained in:

Computer methods and programs in biomedicine - 198 (2021), 18 Jan., p. 105814

Language:

English

Contributors:

Ruthven, Matthieu [Author]
Miquel, Marc E [Author]
King, Andrew P [Author]

Links:

Full text

Subjects:

Articulators
Convolutional neural networks
Dynamic magnetic resonance imaging
Journal Article
Segmentation
Speech
Vocal tract

Notes:

Date Completed 14.05.2021

Date Revised 31.03.2024

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1016/j.cmpb.2020.105814

PPN (catalogue ID):

NLM31765117X