The systematic assessment of completeness of public metadata accompanying omics studies

Abstract Recent advances in high-throughput sequencing technologies have made it possible to collect and share massive amounts of omics data, along with their associated metadata. Enhancing metadata availability is critical to ensure data reusability and reproducibility and to facilitate novel biomedical discoveries through effective data reuse. Yet incomplete metadata accompanying public omics data limits the reproducibility and reusability of millions of omics samples. In this study, we performed a comprehensive assessment of the completeness of metadata shared in scientific publications and public repositories by analyzing over 253 studies encompassing over 164 thousand samples. We observed that studies often omit over a quarter of important phenotypes, with an average of only 74.8% of them shared either in the text of the publication or in the corresponding repository. Notably, public repositories alone contained 62% of the metadata, surpassing the textual content of publications by 3.5%. Only 11.5% of studies shared all phenotypes, while 37.9% shared fewer than 40% of the phenotypes. Studies involving non-human samples were more likely to share metadata than studies involving human samples. We observed similar results on an extended dataset spanning 2.1 million samples across over 61,000 studies from the Gene Expression Omnibus repository. The limited availability of metadata reported in our study emphasizes the necessity for improved metadata sharing practices and standardized reporting. Finally, we discuss the numerous benefits of improving the availability and quality of metadata to the scientific community and beyond, supporting data-driven decision-making and policy development in the field of biomedical research.

Media type:

Preprint

Year of publication:

2023

Published:

2023

Contained in:

bioRxiv.org - (2023), 30 Dec. - year:2023

Language:

English

Contributors:

Huang, Yu-Ning [Author]
Jaiswal, Pooja Vinod [Author]
Rajesh, Anushka [Author]
Yadav, Anushka [Author]
Yu, Dottie [Author]
Liu, Fangyun [Author]
Scheg, Grace [Author]
Boldirev, Grigore [Author]
Nakashidze, Irina [Author]
Sarkar, Aditya [Author]
Mehta, Jay Himanshu [Author]
Wang, Ke [Author]
Patel, Khooshbu Kantibhai [Author]
Mirza, Mustafa Ali Baig [Author]
Hapani, Kunali Chetan [Author]
Peng, Qiushi [Author]
Ayyala, Ram [Author]
Guo, Ruiwei [Author]
Kapur, Shaunak [Author]
Ramesh, Tejasvene [Author]
Abedalthagafi, Malak S. [Author]
Mangul, Serghei [Author]

Links:

Full text [free of charge]

Subjects:

570
Biology

doi:

10.1101/2021.11.22.469640

Funding:

Funding institution / Project title:

PPN (catalog ID):

XBI033072876