High dimensional model representation of log-likelihood ratio : binary classification with expression data
BACKGROUND: Binary classification rules based on a small sample of high-dimensional data (for instance, gene expression data) are ubiquitous in modern bioinformatics. Constructing such classifiers is challenging due to (a) the complex nature of underlying biological traits, such as gene interactions, and (b) the need for highly interpretable glass-box models. We use the theory of high dimensional model representation (HDMR) to build interpretable low-dimensional approximations of the log-likelihood ratio that account for the effect of each individual gene as well as gene-gene interactions. We propose two algorithms approximating the second-order HDMR expansion, and a hypothesis test based on the HDMR formulation to identify significantly dysregulated pairwise interactions. The theory is flexible and requires only a mild set of assumptions.
RESULTS: We apply our approach to gene expression data from both synthetic and real (breast and lung cancer) datasets, and compare it against several popular state-of-the-art methods. The analyses suggest that the proposed algorithms can be used to obtain interpretable prediction rules with high prediction accuracy and to extract significantly dysregulated gene-gene interactions from the data. They also compare favorably against their competitors across multiple synthetic data scenarios.
CONCLUSION: The proposed HDMR-based approach appears to produce a reliable classifier that additionally allows one to describe how individual genes or gene-gene interactions affect classification decisions. Both real and synthetic data analyses suggest that our methods can be used to identify gene networks with dysregulated pairwise interactions, and are therefore appropriate for differential network analysis.
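For orientation, the second-order expansion mentioned in the abstract can be written in the generic HDMR form shown below. This is only a sketch: the notation is assumed here for illustration ($x_i$ for the expression level of gene $i$, $d$ for the number of genes, $p_0$ and $p_1$ for the class-conditional densities, and $f_0$, $f_i$, $f_{ij}$ for the component functions) and is not taken from this record, so it may differ from the notation used in the article itself.

```latex
\[
\log \frac{p_1(x)}{p_0(x)}
\;\approx\;
f_0 \;+\; \sum_{i=1}^{d} f_i(x_i) \;+\; \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j)
\]
```

Read this way, the $f_i$ terms would carry the single-gene effects and the $f_{ij}$ terms the pairwise gene-gene interactions, and a sample would be assigned to class 1 when the approximated log-likelihood ratio exceeds a chosen threshold (zero under equal class priors).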
Media type: | E-article |
---|---|
Year of publication: | 2020 |
Published: | 2020 |
Contained in: | BMC bioinformatics - 21(2020), 1, 25 Apr., page 156 |
Parent record: | volume:21 |
Language: | English |
Contributors: | Foroughi Pour, Ali [author] |
Subjects: | Classification |
Notes: | Date Completed 16.06.2020; Date Revised 16.06.2020; published: Electronic; Citation Status MEDLINE |
doi: | 10.1186/s12859-020-3486-x |
PPN (catalog ID): | NLM309186153 |

LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM309186153 | ||
003 | DE-627 | ||
005 | 20231225133344.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2020 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1186/s12859-020-3486-x |2 doi | |
028 | 5 | 2 | |a pubmed24n1030.xml |
035 | |a (DE-627)NLM309186153 | ||
035 | |a (NLM)32334509 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Foroughi Pour, Ali |e verfasserin |4 aut | |
245 | 1 | 0 | |a High dimensional model representation of log-likelihood ratio |b binary classification with expression data |
264 | 1 | |c 2020 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 16.06.2020 | ||
500 | |a Date Revised 16.06.2020 | ||
500 | |a published: Electronic | ||
500 | |a Citation Status MEDLINE | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Classification | |
650 | 4 | |a Disease prediction | |
650 | 4 | |a Expression analysis | |
650 | 4 | |a High dimensional model representation | |
650 | 4 | |a Log-likelihood ratio | |
700 | 1 | |a Pietrzak, Maciej |e verfasserin |4 aut | |
700 | 1 | |a Dalton, Lori A |e verfasserin |4 aut | |
700 | 1 | |a Rempała, Grzegorz A |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t BMC bioinformatics |d 2000 |g 21(2020), 1 vom: 25. Apr., Seite 156 |w (DE-627)NLM109215982 |x 1471-2105 |7 nnns |
773 | 1 | 8 | |g volume:21 |g year:2020 |g number:1 |g day:25 |g month:04 |g pages:156 |
856 | 4 | 0 | |u http://dx.doi.org/10.1186/s12859-020-3486-x |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 21 |j 2020 |e 1 |b 25 |c 04 |h 156 |