Archetypal landscapes for deep neural networks
The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.
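To make the abstract's central claim concrete, the toy experiment below runs plain gradient descent from many random initializations of a small tanh network and records the spread of final loss values; a landscape of the kind described, with many minima of similar loss, shows up as tightly clustered final losses. This is a minimal NumPy sketch for illustration only: the network, data, and hyperparameters are invented here, and the paper itself maps minima and transition states with energy-landscape methods rather than this kind of repeated-restart sampling.

```python
import numpy as np

# Toy regression task: a handful of points from y = sin(x), standing in
# for the low-data regime discussed in the abstract.
X = np.linspace(-2.0, 2.0, 10).reshape(-1, 1)
y = np.sin(X)

def loss_and_grads(params, X, y):
    """Mean-squared-error loss and gradients for a 1-hidden-layer tanh net."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)              # hidden activations, shape (N, H)
    err = h @ W2 + b2 - y                 # residuals, shape (N, 1)
    N = X.shape[0]
    L = 0.5 * np.mean(err ** 2)
    dpred = err / N                       # dL/dprediction
    dW2 = h.T @ dpred
    db2 = dpred.sum(axis=0)
    dh = (dpred @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    return L, (dW1, db1, dW2, db2)

H = 8                                     # hidden width (arbitrary choice)
final_losses = []
for seed in range(20):                    # many independent random starts
    r = np.random.default_rng(seed)
    params = [r.normal(0.0, 1.0, (1, H)), np.zeros(H),
              r.normal(0.0, 1.0, (H, 1)), np.zeros(1)]
    for _ in range(3000):                 # plain full-batch gradient descent
        _, grads = loss_and_grads(params, X, y)
        params = [p - 0.1 * g for p, g in zip(params, grads)]
    final_losses.append(loss_and_grads(params, X, y)[0])

# A narrow spread across restarts illustrates "many minima with similar
# loss values"; it says nothing about the barriers between them.
print(f"final loss across restarts: min={min(final_losses):.5f}, "
      f"max={max(final_losses):.5f}")
```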
Media type: E-article
Year of publication: 2020
Published: 2020
Contained in: Overall record - volume:117
Contained in: Proceedings of the National Academy of Sciences of the United States of America - 117(2020), no. 36, 08 Sept., pages 21857-21864
Language: English
Contributors: Verpoort, Philipp C [author]
Topics: Deep learning
Notes: Date Completed 15.10.2020; Date Revised 29.03.2024; published: Print-Electronic; Citation Status PubMed-not-MEDLINE
DOI: 10.1073/pnas.1919995117
PPN (catalog ID): NLM314167005
LEADER | 01000caa a22002652 4500
001 | NLM314167005
003 | DE-627
005 | 20240329234540.0
007 | cr uuu---uuuuu
008 | 231225s2020 xx |||||o 00| ||eng c
024 | 7 | |a 10.1073/pnas.1919995117 |2 doi
028 | 5 | 2 | |a pubmed24n1354.xml
035 | |a (DE-627)NLM314167005
035 | |a (NLM)32843349
040 | |a DE-627 |b ger |c DE-627 |e rakwb
041 | |a eng
100 | 1 | |a Verpoort, Philipp C |e verfasserin |4 aut
245 | 1 | 0 | |a Archetypal landscapes for deep neural networks
264 | 1 | |c 2020
336 | |a Text |b txt |2 rdacontent
337 | |a Computermedien |b c |2 rdamedia
338 | |a Online-Ressource |b cr |2 rdacarrier
500 | |a Date Completed 15.10.2020
500 | |a Date Revised 29.03.2024
500 | |a published: Print-Electronic
500 | |a Citation Status PubMed-not-MEDLINE
520 | |a The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions
650 | 4 | |a Journal Article
650 | 4 | |a Research Support, Non-U.S. Gov't
650 | 4 | |a deep learning
650 | 4 | |a energy landscapes
650 | 4 | |a neural networks
650 | 4 | |a optimization
650 | 4 | |a statistical mechanics
700 | 1 | |a Lee, Alpha A |e verfasserin |4 aut
700 | 1 | |a Wales, David J |e verfasserin |4 aut
773 | 0 | 8 | |i Enthalten in |t Proceedings of the National Academy of Sciences of the United States of America |d 1915 |g 117(2020), 36 vom: 08. Sept., Seite 21857-21864 |w (DE-627)NLM000008982 |x 1091-6490 |7 nnns
773 | 1 | 8 | |g volume:117 |g year:2020 |g number:36 |g day:08 |g month:09 |g pages:21857-21864
856 | 4 | 0 | |u http://dx.doi.org/10.1073/pnas.1919995117 |3 Volltext
912 | |a GBV_USEFLAG_A
912 | |a GBV_NLM
951 | |a AR
952 | |d 117 |j 2020 |e 36 |b 08 |c 09 |h 21857-21864
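For completeness, the sketch below shows one way to pull the key bibliographic data out of the pipe-delimited MARC display above. It is an ad hoc parser written for this particular display layout (tag, indicators, then subfields marked |a, |2, and so on), not a general MARC-21 reader; for real MARC data a library such as pymarc would be the usual choice. The embedded record string is a hand-copied subset of the fields above.

```python
import re

# Hand-copied subset of the pipe-delimited MARC display above.
record = """\
024 | 7 | |a 10.1073/pnas.1919995117 |2 doi
100 | 1 | |a Verpoort, Philipp C |e verfasserin |4 aut
245 | 1 | 0 | |a Archetypal landscapes for deep neural networks
700 | 1 | |a Lee, Alpha A |e verfasserin |4 aut
700 | 1 | |a Wales, David J |e verfasserin |4 aut
"""

fields = {}  # tag -> list of {subfield code: value}
for line in record.splitlines():
    tag = line[:3]  # the MARC tag is always the first three characters
    # Subfields look like "|a value"; indicator columns ("| 7 |") do not
    # match because the code must follow the pipe with no space between.
    subs = {code: value.strip()
            for code, value in re.findall(r"\|(\w) ([^|]*)", line)}
    fields.setdefault(tag, []).append(subs)

print("DOI:    ", fields["024"][0]["a"])
print("Title:  ", fields["245"][0]["a"])
print("Authors:", "; ".join(f["a"] for f in fields["100"] + fields["700"]))
```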