Analyses using multiple imputation need to consider missing data in auxiliary variables

Abstract:

Auxiliary variables are used in multiple imputation (MI) to reduce bias and increase efficiency. These variables are often themselves incomplete. We explored how missing data in auxiliary variables influenced estimates obtained from MI. We implemented a simulation study with three different missing data mechanisms for the outcome. We then examined the impact of increasing proportions of missing data and different missingness mechanisms for the auxiliary variable on the bias of an unadjusted linear regression coefficient and on the fraction of missing information. We illustrate our findings with an applied example in the Avon Longitudinal Study of Parents and Children. We found that where complete records analyses were biased, increasing proportions of missing data in auxiliary variables, under any missing data mechanism, reduced the ability of MI including the auxiliary variable to mitigate this bias. Where there was no bias in the complete records analysis, inclusion of a missing not at random auxiliary variable in MI introduced bias of potentially important magnitude (up to 17% of the effect size in our simulation). The quantity and nature of missing data in auxiliary variables need careful consideration when selecting them for use in MI models.
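The kind of simulation the abstract describes can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' design: the data-generating model, the true slope of 0.5, the sample size, and the function names (`fit_ols`, `slope`, `simulate`) are all hypothetical. The outcome is made missing depending on the auxiliary variable, the auxiliary variable is made missing completely at random, and imputation is a simplified stochastic regression imputation (no parameter draws, no chained equations) that falls back to the exposure alone when the auxiliary variable is missing.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients and residual SD; X includes an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))
    return beta, sigma

def slope(x, y):
    """Coefficient from an unadjusted linear regression of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate(p_aux_missing, n=20000, m=20, seed=1):
    """Return (complete-records slope, imputation-based slope). All settings are assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # exposure, fully observed
    y = 0.5 * x + rng.normal(size=n)            # outcome, assumed true slope 0.5
    a = y + rng.normal(scale=0.5, size=n)       # auxiliary variable related to y
    # Outcome is more likely to be missing when the auxiliary value is high.
    y_obs = rng.random(n) >= 1.0 / (1.0 + np.exp(-a))
    # Auxiliary variable is missing completely at random with the given proportion.
    a_obs = rng.random(n) >= p_aux_missing

    cra = slope(x[y_obs], y[y_obs])             # complete records analysis

    X_full = np.column_stack([np.ones(n), x, a])
    X_red = np.column_stack([np.ones(n), x])
    both = y_obs & a_obs
    b_full, s_full = fit_ols(X_full[both], y[both])    # y ~ x + a
    b_red, s_red = fit_ols(X_red[y_obs], y[y_obs])     # y ~ x (fallback model)

    with_a = ~y_obs & a_obs
    without_a = ~y_obs & ~a_obs
    estimates = []
    for _ in range(m):
        y_imp = np.where(y_obs, y, np.nan)
        # Impute from x and a where a is observed, from x alone otherwise.
        y_imp[with_a] = X_full[with_a] @ b_full + rng.normal(scale=s_full, size=with_a.sum())
        y_imp[without_a] = X_red[without_a] @ b_red + rng.normal(scale=s_red, size=without_a.sum())
        estimates.append(slope(x, y_imp))
    return cra, float(np.mean(estimates))

if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5):
        cra, imputed = simulate(p)
        print(f"aux missing {p:.2f}: complete records {cra:.3f}, imputed {imputed:.3f}")
```

Because the fallback model ignores the auxiliary variable, this sketch only illustrates why information is lost as the auxiliary variable becomes less complete. A faithful replication would use a proper MI procedure and the mechanisms specified in the preprint.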

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org - (2023), 14 Dec. - year:2023

Language:

English

Contributors:

Madley-Dowd, Paul [Author]
Curnow, Elinor [Author]
Hughes, Rachael A. [Author]
Cornish, Rosie [Author]
Tilling, Kate [Author]
Heron, Jon [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2023.12.11.23299810

PPN (catalog ID):

XBI041848993