Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings

Abstract

Introduction: Detecting voice disorders from voice recordings could allow for frequent, remote, and low-cost screening before costly clinical visits and a more invasive laryngoscopy examination. Our goals were to detect unilateral vocal fold paralysis (UVFP) from voice recordings using machine learning, to identify which acoustic variables were important for prediction in order to increase trust, and to determine model performance relative to clinician performance.

Methods: Patients with UVFP confirmed through endoscopic examination (N=77) and controls with normal voices matched for age and sex (N=77) were included. Voice samples were elicited by reading the Rainbow Passage and by sustained phonation of the vowel "a". Four machine learning models of differing complexity were used. SHapley Additive exPlanations (SHAP) was used to identify important features.

Results: The highest median bootstrapped ROC AUC score was 0.87, exceeding clinicians' performance on the same recordings (range: 0.74–0.81). Recording durations differed between UVFP and control recordings because of how the data were originally processed and stored, and we show that duration alone can separate the two groups. Counterintuitively, many UVFP recordings had higher intensity than controls, even though UVFP patients tend to have weaker voices, revealing a dataset-specific bias that we mitigate in an additional analysis.

Conclusion: We demonstrate that recording biases in audio duration and intensity created dataset-specific differences between patients and controls, which the models exploited to improve classification. Clinicians' ratings provide further evidence that patients were over-projecting their voices and were recorded at a higher signal amplitude than controls. Notably, after matching audio durations and removing variables associated with intensity to mitigate these biases, the models still achieved similarly high performance. We provide a set of recommendations for avoiding bias when building and evaluating machine learning models for screening in laryngology.
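The duration confound described in the Results can be illustrated with a minimal sketch (not the authors' code; the durations, group shift, and variable names below are hypothetical): a classifier given only recording duration, evaluated with a bootstrapped ROC AUC as in the abstract, can separate the groups purely through a dataset-specific bias.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Hypothetical durations (seconds): a storage-related shift between
    # groups stands in for the bias described in the Results.
    dur_uvfp = rng.normal(6.0, 1.0, 77)
    dur_ctrl = rng.normal(4.5, 1.0, 77)
    X = np.concatenate([dur_uvfp, dur_ctrl]).reshape(-1, 1)
    y = np.concatenate([np.ones(77), np.zeros(77)])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = LogisticRegression().fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]

    # Bootstrapped ROC AUC over the held-out set, mirroring the abstract's
    # "median bootstrapped ROC AUC" evaluation.
    aucs = []
    for _ in range(1000):
        idx = rng.integers(0, len(y_te), len(y_te))
        if len(np.unique(y_te[idx])) < 2:
            continue  # a resample needs both classes for AUC
        aucs.append(roc_auc_score(y_te[idx], scores[idx]))
    print(f"median bootstrapped ROC AUC from duration alone: {np.median(aucs):.2f}")

Matching durations across groups, as the authors do in their mitigation analysis, removes this shortcut from the feature set.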
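Similarly, here is a sketch of how SHAP can surface such confounds (an assumed setup, not the paper's model or feature set; the feature names and the simulated bias are hypothetical): if duration- or intensity-related features receive the largest mean absolute SHAP values, that flags a recording bias rather than a clinical signal.

    import numpy as np
    import shap
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(1)
    n = 154  # 77 patients + 77 controls, as in the study
    feature_names = ["duration_s", "mean_intensity_db", "jitter", "shimmer"]
    X = rng.normal(size=(n, len(feature_names)))
    # Simulated bias: the label is driven mostly by duration, not by the
    # clinically meaningful perturbation measures.
    logits = 2.0 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 0.5, n)
    y = (logits > 0).astype(int)

    model = GradientBoostingClassifier(random_state=0).fit(X, y)
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Mean absolute SHAP value per feature; a dominant duration_s or
    # mean_intensity_db would point to a dataset-specific recording bias.
    importance = np.abs(shap_values).mean(axis=0)
    for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")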

Media type:

Preprint

Year of publication:

2024

Published:

2024

Contained in:

bioRxiv.org (2024), 22 March

Language:

English

Contributors:

Low, Daniel M. [Author]
Rao, Vishwanatha [Author]
Randolph, Gregory [Author]
Song, Phillip C. [Author]
Ghosh, Satrajit S. [Author]

Links:

Full text [free access]

Subjects:

570
Biology

DOI:

10.1101/2020.11.23.20235945

Funding institution / project title:

PPN (catalog ID):

XBI019409958