A self-supervised deep learning method for data-efficient training in genomics

© 2023. Springer Nature Limited.

Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
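The abstract names two ingredients: reverse-complement sequences and prediction targets of different lengths. The sketch below illustrates only those two building blocks on a toy DNA string. It is not taken from the Self-GenomeNet code, and this record does not describe how the method combines them. The function names `reverse_complement` and `split_context_target` and the example sequence are assumptions made for illustration.

```python
from typing import Tuple

# Illustrative sketch only: hypothetical helpers, not the authors' implementation.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string.

    Symbols outside A/C/G/T (for example N) are left unchanged.
    """
    return seq.translate(COMPLEMENT)[::-1]


def split_context_target(seq: str, target_len: int) -> Tuple[str, str]:
    """Split a sequence into a context prefix and a target suffix of target_len bases."""
    if not 0 < target_len < len(seq):
        raise ValueError("target_len must be between 1 and len(seq) - 1")
    return seq[:-target_len], seq[-target_len:]


if __name__ == "__main__":
    seq = "ATGCGTAC"  # toy example sequence (assumption)
    print(reverse_complement(seq))
    # Targets of different lengths taken from the same sequence.
    for target_len in (2, 4):
        context, target = split_context_target(seq, target_len)
        print(target_len, context, target)
```

Applying `reverse_complement` twice returns the original string, which is a quick sanity check for any such helper.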

Media type:

E-article

Year of publication:

2023

Published:

2023

Contained in:

Communications biology - 6(2023), 1, 11 Sept., page 928

Language:

English

Contributors:

Gündüz, Hüseyin Anil [Author]
Binder, Martin [Author]
To, Xiao-Yin [Author]
Mreches, René [Author]
Bischl, Bernd [Author]
McHardy, Alice C [Author]
Münch, Philipp C [Author]
Rezaei, Mina [Author]

Subjects:

Journal Article
Research Support, Non-U.S. Gov't

Notes:

Date Completed 13.09.2023

Date Revised 19.11.2023

published: Electronic

Citation Status MEDLINE

doi:

10.1038/s42003-023-05310-2

PPN (catalog ID):

NLM361943652