Archetypal landscapes for deep neural networks
The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.
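To make the abstract's central claim concrete, the toy experiment below runs plain gradient descent from many random initializations of a small tanh network and records the spread of final loss values; a landscape of the kind described, with many minima of similar loss, shows up as tightly clustered final losses. This is a minimal NumPy sketch for illustration only: the network, data, and hyperparameters are invented here, and the paper itself maps minima and transition states with energy-landscape methods rather than this kind of repeated-restart sampling.

```python
import numpy as np

# Toy regression task: a handful of points from y = sin(x), standing in
# for the low-data regime discussed in the abstract.
X = np.linspace(-2.0, 2.0, 10).reshape(-1, 1)
y = np.sin(X)

def loss_and_grads(params, X, y):
    """Mean-squared-error loss and gradients for a 1-hidden-layer tanh net."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)              # hidden activations, shape (N, H)
    err = h @ W2 + b2 - y                 # residuals, shape (N, 1)
    N = X.shape[0]
    L = 0.5 * np.mean(err ** 2)
    dpred = err / N                       # dL/dprediction
    dW2 = h.T @ dpred
    db2 = dpred.sum(axis=0)
    dh = (dpred @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    return L, (dW1, db1, dW2, db2)

H = 8                                     # hidden width (arbitrary choice)
final_losses = []
for seed in range(20):                    # many independent random starts
    r = np.random.default_rng(seed)
    params = [r.normal(0.0, 1.0, (1, H)), np.zeros(H),
              r.normal(0.0, 1.0, (H, 1)), np.zeros(1)]
    for _ in range(3000):                 # plain full-batch gradient descent
        _, grads = loss_and_grads(params, X, y)
        params = [p - 0.1 * g for p, g in zip(params, grads)]
    final_losses.append(loss_and_grads(params, X, y)[0])

# A narrow spread across restarts illustrates "many minima with similar
# loss values"; it says nothing about the barriers between them.
print(f"final loss across restarts: min={min(final_losses):.5f}, "
      f"max={max(final_losses):.5f}")
```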
Media type: E-article
Year of publication: 2020
Published: 2020
Contained in: Overall record - volume:117
Contained in: Proceedings of the National Academy of Sciences of the United States of America - 117(2020), no. 36, 08 Sept., pages 21857-21864
Language: English
Contributors: Verpoort, Philipp C [author]
Topics: Deep learning
Notes: Date Completed 15.10.2020; Date Revised 29.03.2024; published: Print-Electronic; Citation Status PubMed-not-MEDLINE
DOI: 10.1073/pnas.1919995117
PPN (catalog ID): NLM314167005
LEADER | 01000caa a22002652 4500
001 | NLM314167005
003 | DE-627
005 | 20240329234540.0
007 | cr uuu---uuuuu
008 | 231225s2020 xx |||||o 00| ||eng c
024 | 7 | |a 10.1073/pnas.1919995117 |2 doi
028 | 5 | 2 | |a pubmed24n1354.xml
035 | |a (DE-627)NLM314167005
035 | |a (NLM)32843349
040 | |a DE-627 |b ger |c DE-627 |e rakwb
041 | |a eng
100 | 1 | |a Verpoort, Philipp C |e verfasserin |4 aut
245 | 1 | 0 | |a Archetypal landscapes for deep neural networks
264 | 1 | |c 2020
336 | |a Text |b txt |2 rdacontent
337 | |a Computermedien |b c |2 rdamedia
338 | |a Online-Ressource |b cr |2 rdacarrier
500 | |a Date Completed 15.10.2020
500 | |a Date Revised 29.03.2024
500 | |a published: Print-Electronic
500 | |a Citation Status PubMed-not-MEDLINE
520 | |a The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions
650 | 4 | |a Journal Article
650 | 4 | |a Research Support, Non-U.S. Gov't
650 | 4 | |a deep learning
650 | 4 | |a energy landscapes
650 | 4 | |a neural networks
650 | 4 | |a optimization
650 | 4 | |a statistical mechanics
700 | 1 | |a Lee, Alpha A |e verfasserin |4 aut
700 | 1 | |a Wales, David J |e verfasserin |4 aut
773 | 0 | 8 | |i Enthalten in |t Proceedings of the National Academy of Sciences of the United States of America |d 1915 |g 117(2020), 36 vom: 08. Sept., Seite 21857-21864 |w (DE-627)NLM000008982 |x 1091-6490 |7 nnns
773 | 1 | 8 | |g volume:117 |g year:2020 |g number:36 |g day:08 |g month:09 |g pages:21857-21864
856 | 4 | 0 | |u http://dx.doi.org/10.1073/pnas.1919995117 |3 Volltext
912 | |a GBV_USEFLAG_A
912 | |a GBV_NLM
951 | |a AR
952 | |d 117 |j 2020 |e 36 |b 08 |c 09 |h 21857-21864
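For completeness, the sketch below shows one way to pull the key bibliographic data out of the pipe-delimited MARC display above. It is an ad hoc parser written for this particular display layout (tag, indicators, then subfields marked |a, |2, and so on), not a general MARC-21 reader; for real MARC data a library such as pymarc would be the usual choice. The embedded record string is a hand-copied subset of the fields above.

```python
import re

# Hand-copied subset of the pipe-delimited MARC display above.
record = """\
024 | 7 | |a 10.1073/pnas.1919995117 |2 doi
100 | 1 | |a Verpoort, Philipp C |e verfasserin |4 aut
245 | 1 | 0 | |a Archetypal landscapes for deep neural networks
700 | 1 | |a Lee, Alpha A |e verfasserin |4 aut
700 | 1 | |a Wales, David J |e verfasserin |4 aut
"""

fields = {}  # tag -> list of {subfield code: value}
for line in record.splitlines():
    tag = line[:3]  # the MARC tag is always the first three characters
    # Subfields look like "|a value"; indicator columns ("| 7 |") do not
    # match because the code must follow the pipe with no space between.
    subs = {code: value.strip()
            for code, value in re.findall(r"\|(\w) ([^|]*)", line)}
    fields.setdefault(tag, []).append(subs)

print("DOI:    ", fields["024"][0]["a"])
print("Title:  ", fields["245"][0]["a"])
print("Authors:", "; ".join(f["a"] for f in fields["100"] + fields["700"]))
```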