A self-supervised deep learning method for data-efficient training in genomics
© 2023. Springer Nature Limited.
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
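The reverse-complement transformation that the abstract says Self-GenomeNet leverages can be illustrated with a minimal helper (an illustrative sketch, not code from the paper; the function name `reverse_complement` is our own):

```python
# Illustration only: the reverse complement of a DNA sequence, i.e. the
# sequence of the opposite strand read in the opposite direction.
# Base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # -> GCAT
```

Because the operation is an involution, applying it twice returns the original sequence, which is what makes it usable as a label-free training signal.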
Media type: E-Article
Year of publication: 2023
Published: 2023
Contained in: Complete record - volume:6
Contained in: Communications biology - 6(2023), 1, 11 Sept., page 928
Language: English
Contributors: Gündüz, Hüseyin Anil [author]
Notes: Date Completed 13.09.2023; Date Revised 19.11.2023; published: Electronic; Citation Status MEDLINE
DOI: 10.1038/s42003-023-05310-2
PPN (catalog ID): NLM361943652
LEADER 01000naa a22002652 4500
001    NLM361943652
003    DE-627
005    20231226090218.0
007    cr uuu---uuuuu
008    231226s2023 xx |||||o 00| ||eng c
024 7  |a 10.1038/s42003-023-05310-2 |2 doi
028 52 |a pubmed24n1206.xml
035    |a (DE-627)NLM361943652
035    |a (NLM)37696966
040    |a DE-627 |b ger |c DE-627 |e rakwb
041    |a eng
100 1  |a Gündüz, Hüseyin Anil |e verfasserin |4 aut
245 12 |a A self-supervised deep learning method for data-efficient training in genomics
264  1 |c 2023
336    |a Text |b txt |2 rdacontent
337    |a Computermedien |b c |2 rdamedia
338    |a Online-Ressource |b cr |2 rdacarrier
500    |a Date Completed 13.09.2023
500    |a Date Revised 19.11.2023
500    |a published: Electronic
500    |a Citation Status MEDLINE
520    |a © 2023. Springer Nature Limited.
520    |a Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
650  4 |a Journal Article
650  4 |a Research Support, Non-U.S. Gov't
700 1  |a Binder, Martin |e verfasserin |4 aut
700 1  |a To, Xiao-Yin |e verfasserin |4 aut
700 1  |a Mreches, René |e verfasserin |4 aut
700 1  |a Bischl, Bernd |e verfasserin |4 aut
700 1  |a McHardy, Alice C |e verfasserin |4 aut
700 1  |a Münch, Philipp C |e verfasserin |4 aut
700 1  |a Rezaei, Mina |e verfasserin |4 aut
773 08 |i Enthalten in |t Communications biology |d 2018 |g 6(2023), 1 vom: 11. Sept., Seite 928 |w (DE-627)NLM284287245 |x 2399-3642 |7 nnns
773 18 |g volume:6 |g year:2023 |g number:1 |g day:11 |g month:09 |g pages:928
856 40 |u http://dx.doi.org/10.1038/s42003-023-05310-2 |3 Volltext
912    |a GBV_USEFLAG_A
912    |a GBV_NLM
951    |a AR
952    |d 6 |j 2023 |e 1 |b 11 |c 09 |h 928