Learning the Information Divergence
Information divergence, which measures the difference between two nonnegative matrices or tensors, is used in a variety of machine learning problems, such as Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on a suitable divergence. A large variety of divergences have been suggested and analyzed, but few results are available for an objective choice of the optimal divergence for a given task. Here we present a framework that facilitates automatic selection of the best divergence within a given family, based on standard maximum likelihood estimation. We first propose an approximated Tweedie distribution for the β-divergence family; selecting the best β then becomes a machine learning problem solved by maximum likelihood. Next, we reformulate α-divergence in terms of β-divergence, which enables automatic selection of α by maximum likelihood, reusing the learning principle for β-divergence. Furthermore, we show connections between γ- and β-divergences, as well as between Rényi and α-divergences, so that our automatic selection framework extends to non-separable divergences. Experiments on both synthetic and real-world data demonstrate that our method accurately selects the information divergence across different learning problems and divergence families.
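To make the β-selection step concrete, here is a minimal Python/NumPy sketch. It is not the authors' implementation: it assumes the standard saddle-point form of the Tweedie density with power p = 2 − β as the "approximated Tweedie distribution", holds the model reconstruction `mu` fixed while scoring candidate β values on a grid, and profiles out the dispersion φ in closed form. In a real factorization task one would refit the model for each candidate β.

```python
import numpy as np

def beta_divergence(x, mu, beta, eps=1e-12):
    """Total beta-divergence d_beta(x, mu) summed over all entries.

    beta = 1 gives generalized KL, beta = 0 gives Itakura-Saito,
    otherwise the generic three-term form is used. Inputs are clipped
    to eps to keep logs and negative powers finite.
    """
    x = np.maximum(np.asarray(x, dtype=float), eps)
    mu = np.maximum(np.asarray(mu, dtype=float), eps)
    if np.isclose(beta, 1.0):
        d = x * np.log(x / mu) - x + mu
    elif np.isclose(beta, 0.0):
        d = x / mu - np.log(x / mu) - 1.0
    else:
        d = (x ** beta / (beta * (beta - 1.0))
             - x * mu ** (beta - 1.0) / (beta - 1.0)
             + mu ** beta / beta)
    return float(d.sum())

def approx_tweedie_loglik(x, mu, beta, eps=1e-12):
    """Saddle-point approximation of the Tweedie log-likelihood.

    Uses the relation p = 2 - beta between the Tweedie power p and beta;
    the dispersion phi is profiled out at its closed-form maximizer
    phi_hat = 2 * D / n, where D is the total beta-divergence.
    """
    x = np.maximum(np.asarray(x, dtype=float), eps)
    n = x.size
    D = beta_divergence(x, mu, beta)
    phi = 2.0 * D / n
    # log f(x) ~= -0.5 * log(2*pi*phi*x^(2-beta)) - d_beta(x, mu) / phi,
    # summed over entries; note D / phi reduces to the constant n / 2.
    return (-0.5 * n * np.log(2.0 * np.pi * phi)
            - 0.5 * (2.0 - beta) * np.log(x).sum()
            - D / phi)

def select_beta(x, mu, grid=None):
    """Pick the beta in the grid that maximizes the approximate likelihood."""
    if grid is None:
        grid = np.linspace(-1.0, 2.0, 61)
    scores = [approx_tweedie_loglik(x, mu, b) for b in grid]
    return float(grid[int(np.argmax(scores))])

if __name__ == "__main__":
    # Poisson-distributed data: the matched divergence is beta = 1 (KL).
    rng = np.random.default_rng(0)
    mu = rng.gamma(2.0, 5.0, size=(100, 100))
    x = rng.poisson(mu).astype(float)
    print(select_beta(x, mu))  # should land near 1.0
```

Because φ is profiled out in closed form, the grid search over β is the only search loop needed; this mirrors the abstract's claim that selecting β reduces to a standard maximum likelihood problem.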
Media type: E-article
Year of publication: 2015
Published: 2015
Contained in: Link to the complete record - volume:37
Contained in: IEEE transactions on pattern analysis and machine intelligence - 37(2015), issue 7, 14 July, pages 1442-54
Language: English
Contributors: Dikmen, Onur [author]; Yang, Zhirong [author]; Oja, Erkki [author]
Notes: Date Completed 24.11.2015; Date Revised 10.09.2015; published: Print; Citation Status PubMed-not-MEDLINE
DOI: 10.1109/TPAMI.2014.2366144
PPN (catalogue ID): NLM252583493
LEADER 01000naa a22002652 4500
001 NLM252583493
003 DE-627
005 20231224164423.0
007 cr uuu---uuuuu
008 231224s2015 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2014.2366144 |2 doi
028 5 2 |a pubmed24n0842.xml
035 |a (DE-627)NLM252583493
035 |a (NLM)26352451
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
100 1 |a Dikmen, Onur |e verfasserin |4 aut
245 1 0 |a Learning the Information Divergence
264 1 |c 2015
336 |a Text |b txt |2 rdacontent
337 |a Computermedien |b c |2 rdamedia
338 |a Online-Ressource |b cr |2 rdacarrier
500 |a Date Completed 24.11.2015
500 |a Date Revised 10.09.2015
500 |a published: Print
500 |a Citation Status PubMed-not-MEDLINE
520 |a Information divergence that measures the difference between two nonnegative matrices or tensors has found its use in a variety of machine learning problems. Examples are Nonnegative Matrix/Tensor Factorization, Stochastic Neighbor Embedding, topic models, and Bayesian network optimization. The success of such a learning task depends heavily on a suitable divergence. A large variety of divergences have been suggested and analyzed, but very few results are available for an objective choice of the optimal divergence for a given task. Here we present a framework that facilitates automatic selection of the best divergence among a given family, based on standard maximum likelihood estimation. We first propose an approximated Tweedie distribution for the β-divergence family. Selecting the best β then becomes a machine learning problem solved by maximum likelihood. Next, we reformulate α-divergence in terms of β-divergence, which enables automatic selection of α by maximum likelihood with reuse of the learning principle for β-divergence. Furthermore, we show the connections between γ- and β-divergences as well as Renyi- and α-divergences, such that our automatic selection framework is extended to non-separable divergences. Experiments on both synthetic and real-world data demonstrate that our method can quite accurately select information divergence across different learning problems and various divergence families
650 4 |a Journal Article
650 4 |a Research Support, Non-U.S. Gov't
700 1 |a Yang, Zhirong |e verfasserin |4 aut
700 1 |a Oja, Erkki |e verfasserin |4 aut
773 0 8 |i Enthalten in |t IEEE transactions on pattern analysis and machine intelligence |d 1979 |g 37(2015), 7 vom: 14. Juli, Seite 1442-54 |w (DE-627)NLM098212257 |x 1939-3539 |7 nnns
773 1 8 |g volume:37 |g year:2015 |g number:7 |g day:14 |g month:07 |g pages:1442-54
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2014.2366144 |3 Volltext
912 |a GBV_USEFLAG_A
912 |a GBV_NLM
951 |a AR
952 |d 37 |j 2015 |e 7 |b 14 |c 07 |h 1442-54