Publication details - Imputation of missing values for cochlear implant candidate audiometric data and potential applications

Imputation of missing values for cochlear implant candidate audiometric data and potential applications

Copyright: © 2023 Pavelchek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

OBJECTIVE: Assess the real-world performance of popular imputation algorithms on cochlear implant (CI) candidate audiometric data.

METHODS: 7,451 audiograms from patients undergoing CI candidacy evaluation were pooled from 32 institutions; complete case analysis yielded 1,304 audiograms. Imputation model performance was assessed with nested cross-validation on randomly generated sparse datasets with various amounts of missing data, distributions of sparsity, and dataset sizes. A threshold for safe imputation was defined as root mean square error (RMSE) <10 dB. Models included univariate imputation, interpolation, multiple imputation by chained equations (MICE), k-nearest neighbors, gradient boosted trees, and neural networks.
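The evaluation idea in the methods (hide known values, impute them, score with RMSE against the 10 dB threshold) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the audiograms are synthetic, the function names are hypothetical, only the simplest listed model (univariate mean imputation) is shown, and the nested cross-validation is omitted.

```python
import math
import random

N_FREQ = 11           # the abstract refers to an 11-frequency audiogram
SAFE_RMSE_DB = 10.0   # "safe imputation" threshold stated in the abstract


def make_audiograms(n, rng):
    """Synthetic sloping thresholds in dB HL (assumed shape, illustration only)."""
    rows = []
    for _ in range(n):
        base = rng.uniform(40, 80)
        slope = rng.uniform(1, 4)
        rows.append([min(120.0, base + slope * k + rng.gauss(0, 3))
                     for k in range(N_FREQ)])
    return rows


def mask_random(data, n_missing, rng):
    """Hide n_missing uniformly chosen frequencies per audiogram."""
    masked = [row[:] for row in data]
    hidden = []  # (row index, column index, true value)
    for i, row in enumerate(masked):
        for j in rng.sample(range(N_FREQ), n_missing):
            hidden.append((i, j, row[j]))
            row[j] = None
    return masked, hidden


def impute_column_mean(masked):
    """Univariate imputation: fill each gap with the mean of that frequency."""
    filled = [row[:] for row in masked]
    for j in range(N_FREQ):
        observed = [row[j] for row in masked if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean_j
    return filled


def rmse(filled, hidden):
    """Root mean square error over the hidden entries only."""
    squared = [(filled[i][j] - true) ** 2 for i, j, true in hidden]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
truth = make_audiograms(200, rng)
masked, hidden = mask_random(truth, n_missing=3, rng=rng)
filled = impute_column_mean(masked)
score = rmse(filled, hidden)
print(f"RMSE = {score:.2f} dB, below threshold: {score < SAFE_RMSE_DB}")
```

Uniform random masking is only one of the sparsity distributions the abstract mentions. A real-world distribution would hide inter-octave frequencies more often than octave frequencies.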

RESULTS: Greater quantities of missing data were associated with worse performance. Sparsity in audiometric data is not uniformly distributed, as inter-octave frequencies are less commonly tested. With 3-8 missing features per instance, a real-world sparsity distribution was associated with significantly better performance compared to other sparsity distributions (Δ RMSE 0.3-5.8 dB, non-overlapping 99% confidence intervals). With a real-world sparsity distribution, models were able to safely impute up to 6 missing datapoints in an 11-frequency audiogram. MICE consistently outperformed other models across all metrics and sparsity distributions (p < 0.01, Wilcoxon rank sum test). With sparsity capped at 6 missing features per audiogram but otherwise equivalent to the raw dataset, MICE imputed with RMSE of 7.83 dB [95% CI 7.81-7.86]. Imputing up to 6 missing features captures 99.3% of the audiograms in our dataset, allowing for a 5.7-fold increase in dataset size (1,304 to 7,399 audiograms) as compared with complete case analysis.
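The dataset-size figures above follow directly from the counts given in the abstract. A short check, using hypothetical variable names:

```python
total_audiograms = 7451         # pooled from 32 institutions
complete_cases = 1304           # audiograms with no missing values
usable_after_imputation = 7399  # audiograms with at most 6 missing features

fold_increase = usable_after_imputation / complete_cases
coverage_pct = 100 * usable_after_imputation / total_audiograms
complete_pct = 100 * complete_cases / total_audiograms

print(f"fold increase: {fold_increase:.1f}")           # 5.7
print(f"coverage after imputation: {coverage_pct:.1f}%")  # 99.3%
print(f"complete cases only: {complete_pct:.1f}%")        # 17.5%
```

So complete case analysis alone would keep only about one audiogram in six.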

CONCLUSION: Precision medicine will inevitably play an integral role in the future of hearing healthcare. These methods are data dependent, and rigorously validated imputation models are a key tool for maximizing datasets. Using the largest CI audiogram dataset to date, we demonstrate that in a real-world scenario MICE can safely impute missing data for the vast majority (>99%) of audiograms with RMSE well below a clinically significant threshold of 10 dB. Evaluation across a range of dataset sizes and sparsity distributions suggests a high degree of generalizability to future applications.

Media type:	E-article

Year of publication:	2023
Published:	2023

Contained in:	To the parent record - volume:18
Contained in:	PloS one - 18(2023), 2 of: 28., page e0281337

Language:	English

Contributors:	Pavelchek, Cole [author] Michelson, Andrew P [author] Walia, Amit [author] Ortmann, Amanda [author] Herzog, Jacques [author] Buchman, Craig A [author] Shew, Matthew A [author]

Links:	Full text

Subjects:	Journal Article

Notes:	Date Completed 08.02.2023; Date Revised 06.04.2023; published: Electronic-eCollection; Citation Status MEDLINE

doi:	10.1371/journal.pone.0281337

funding:
Funding institution / project title:

PPN (catalog ID):	NLM352556463

Internal format


LEADER	01000naa a22002652 4500
001	NLM352556463
003	DE-627
005	20231226054010.0
007	cr uuu---uuuuu
008	231226s2023 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1371/journal.pone.0281337 \|2 doi
028	5	2	\|a pubmed24n1175.xml
035			\|a (DE-627)NLM352556463
035			\|a (NLM)36745652
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Pavelchek, Cole \|e verfasserin \|4 aut
245	1	0	\|a Imputation of missing values for cochlear implant candidate audiometric data and potential applications
264		1	\|c 2023
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 08.02.2023
500			\|a Date Revised 06.04.2023
500			\|a published: Electronic-eCollection
500			\|a Citation Status MEDLINE
520			\|a Copyright: © 2023 Pavelchek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
520			\|a OBJECTIVE: Assess the real-world performance of popular imputation algorithms on cochlear implant (CI) candidate audiometric data
520			\|a METHODS: 7,451 audiograms from patients undergoing CI candidacy evaluation were pooled from 32 institutions with complete case analysis yielding 1,304 audiograms. Imputation model performance was assessed with nested cross-validation on randomly generated sparse datasets with various amounts of missing data, distributions of sparsity, and dataset sizes. A threshold for safe imputation was defined as root mean square error (RMSE) <10dB. Models included univariate imputation, interpolation, multiple imputation by chained equations (MICE), k-nearest neighbors, gradient boosted trees, and neural networks
520			\|a RESULTS: Greater quantities of missing data were associated with worse performance. Sparsity in audiometric data is not uniformly distributed, as inter-octave frequencies are less commonly tested. With 3-8 missing features per instance, a real-world sparsity distribution was associated with significantly better performance compared to other sparsity distributions (Δ RMSE 0.3 dB- 5.8 dB, non-overlapping 99% confidence intervals). With a real-world sparsity distribution, models were able to safely impute up to 6 missing datapoints in an 11-frequency audiogram. MICE consistently outperformed other models across all metrics and sparsity distributions (p < 0.01, Wilcoxon rank sum test). With sparsity capped at 6 missing features per audiogram but otherwise equivalent to the raw dataset, MICE imputed with RMSE of 7.83 dB [95% CI 7.81-7.86]. Imputing up to 6 missing features captures 99.3% of the audiograms in our dataset, allowing for a 5.7-fold increase in dataset size (1,304 to 7,399 audiograms) as compared with complete case analysis
520			\|a CONCLUSION: Precision medicine will inevitably play an integral role in the future of hearing healthcare. These methods are data dependent, and rigorously validated imputation models are a key tool for maximizing datasets. Using the largest CI audiogram dataset to-date, we demonstrate that in a real-world scenario MICE can safely impute missing data for the vast majority (>99%) of audiograms with RMSE well below a clinically significant threshold of 10dB. Evaluation across a range of dataset sizes and sparsity distributions suggests a high degree of generalizability to future applications
650		4	\|a Journal Article
700	1		\|a Michelson, Andrew P \|e verfasserin \|4 aut
700	1		\|a Walia, Amit \|e verfasserin \|4 aut
700	1		\|a Ortmann, Amanda \|e verfasserin \|4 aut
700	1		\|a Herzog, Jacques \|e verfasserin \|4 aut
700	1		\|a Buchman, Craig A \|e verfasserin \|4 aut
700	1		\|a Shew, Matthew A \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t PloS one \|d 2006 \|g 18(2023), 2 vom: 28., Seite e0281337 \|w (DE-627)NLM167327399 \|x 1932-6203 \|7 nnns
773	1	8	\|g volume:18 \|g year:2023 \|g number:2 \|g day:28 \|g pages:e0281337
856	4	0	\|u http://dx.doi.org/10.1371/journal.pone.0281337 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 18 \|j 2023 \|e 2 \|b 28 \|h e0281337
