Self-report inaccuracy in the UK Biobank: Impact on inference and interplay with selective participation

While the use of short self-report measures is common practice in biobank initiatives, this phenotyping strategy is inherently prone to reporting errors. In this work, we aimed to explore challenges related to self-report errors for biobank-scale research. We derived a reporting error score (RESUM) for n = 73,129 UK Biobank (UKBB) participants, capturing inconsistent self-reporting in time-invariant phenotypes across multiple measurement occasions. We then performed genome-wide association scans on RESUM, applied downstream analyses (LD Score Regression and Mendelian Randomization, MR), and compared its properties to a previously studied participation behaviour (UKBB participation propensity). The results were then used in extended analyses (simulations, inverse probability and variance weighting) to explore patterns and propose possible corrections for biases induced by reporting error and/or selective participation. Finally, to assess the impact of reporting error on SNP effects and trait heritability, we improved phenotype resolution for 15 self-report measures and inspected the changes in genomic findings. Reporting error was present in the UKBB across all 33 assessed time-invariant measures, with repeatability levels as low as 11% (e.g., inconsistent recall of childhood sunburns). We found that reporting error was not independent of UKBB participation, as evidenced by their negative genetic correlation (rg = -0.90), their shared causes (e.g., education, income, intelligence; assessed in MR) and the loss in self-report accuracy following participation bias correction.
Depending on where reporting error occurred in the analytical pipeline, its impact ranged from reduced power (e.g., for gene discovery) to biased effect estimates (e.g., if present in the exposure variable) and attenuation of genome-wide quantities (e.g., 20% relative h2 attenuation for self-reported childhood height). Our findings highlight that self-report inaccuracy and selective participation are competing biases and sources of poor reproducibility for biobank-scale research. Implementing approaches that enhance phenotype resolution while ensuring sample representativeness is therefore essential when working with biobank data.

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org - (2023) dated: 11 Oct. To the complete record - year:2023

Language:

English

Contributors:

Schoeler, Tabea [Author]
Pingault, Jean-Baptiste [Author]
Kutalik, Zoltán [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2023.10.06.23296652

funding:

Funding institution / Project title:

PPN (catalog ID):

XBI041101960