A self-supervised deep learning method for data-efficient training in genomics
© 2023. Springer Nature Limited.
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
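The reverse-complement transformation that the abstract says Self-GenomeNet leverages can be illustrated with a minimal helper (an illustrative sketch, not code from the paper; the function name `reverse_complement` is our own):

```python
# Illustration only: the reverse complement of a DNA sequence, i.e. the
# sequence of the opposite strand read in the opposite direction.
# Base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # -> GCAT
```

Because the operation is an involution, applying it twice returns the original sequence, which is what makes it usable as a label-free training signal.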
Media type: E-Article
Year of publication: 2023
Published: 2023
Contained in: Complete record - volume:6
Contained in: Communications biology - 6(2023), 1, 11 Sept., page 928
Language: English
Contributors: Gündüz, Hüseyin Anil [author]
Notes: Date Completed 13.09.2023; Date Revised 19.11.2023; published: Electronic; Citation Status MEDLINE
DOI: 10.1038/s42003-023-05310-2
PPN (catalog ID): NLM361943652
LEADER 01000naa a22002652 4500
001    NLM361943652
003    DE-627
005    20231226090218.0
007    cr uuu---uuuuu
008    231226s2023 xx |||||o 00| ||eng c
024 7  |a 10.1038/s42003-023-05310-2 |2 doi
028 52 |a pubmed24n1206.xml
035    |a (DE-627)NLM361943652
035    |a (NLM)37696966
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Gündüz, Hüseyin Anil |e verfasserin |4 aut
245 12 |a A self-supervised deep learning method for data-efficient training in genomics
264  1 |c 2023
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Completed 13.09.2023
500    |a Date Revised 19.11.2023
500    |a published: Electronic
500    |a Citation Status MEDLINE
520    |a © 2023. Springer Nature Limited.
520    |a Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
650  4 |a Journal Article
650  4 |a Research Support, Non-U.S. Gov't
700 1  |a Binder, Martin |e verfasserin |4 aut
700 1  |a To, Xiao-Yin |e verfasserin |4 aut
700 1  |a Mreches, René |e verfasserin |4 aut
700 1  |a Bischl, Bernd |e verfasserin |4 aut
700 1  |a McHardy, Alice C |e verfasserin |4 aut
700 1  |a Münch, Philipp C |e verfasserin |4 aut
700 1  |a Rezaei, Mina |e verfasserin |4 aut
773 08 |i Enthalten in |t Communications biology |d 2018 |g 6(2023), 1 vom: 11. Sept., Seite 928 |w (DE-627)NLM284287245 |x 2399-3642 |7 nnns
773 18 |g volume:6 |g year:2023 |g number:1 |g day:11 |g month:09 |g pages:928
856 40 |u http://dx.doi.org/10.1038/s42003-023-05310-2 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |d 6 |j 2023 |e 1 |b 11 |c 09 |h 928