Deep advantage learning for optimal dynamic treatment regime
Recently, deep learning has achieved state-of-the-art performance on many difficult tasks. Deep neural networks outperform many popular existing methods in the field of reinforcement learning, and they can also identify important covariates automatically. The parameter sharing of convolutional neural networks (CNNs) greatly reduces the number of parameters in the network, which allows for high scalability. However, little research has been done on deep advantage learning (A-learning). In this paper, we present a deep A-learning approach to estimating the optimal dynamic treatment regime. A-learning models the advantage function, which is directly relevant to the goal. We use an inverse probability weighting (IPW) method to estimate the difference between potential outcomes, which does not require any model assumption on the baseline mean function. We implemented different architectures of deep CNNs and convexified convolutional neural networks (CCNNs). The proposed deep A-learning methods are applied to data from the STAR*D trial and are shown to perform better than the penalized least squares estimator with a linear decision rule.
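The IPW idea mentioned in the abstract can be sketched in a few lines: each treatment arm's outcomes are reweighted by the inverse of the treatment-assignment probability, so the contrast between potential outcomes is estimated without modeling the baseline mean function. This is a minimal illustration on simulated data with a known propensity score; the variable names and the simulation are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated data: one covariate, confounded treatment.
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-x))      # P(A=1 | X), known here
a = rng.binomial(1, propensity)            # observed treatment
y = 2.0 * a + x + rng.normal(size=n)       # true effect E[Y(1) - Y(0)] = 2

# IPW contrast: reweight each arm by the inverse of its assignment
# probability; no model for the baseline mean E[Y | X] is needed.
ipw_effect = (np.mean(a * y / propensity)
              - np.mean((1 - a) * y / (1 - propensity)))
print(ipw_effect)
```

With the propensity correctly specified, the estimate concentrates around the true effect of 2; in the paper's setting the same contrast feeds the advantage-function estimation rather than a single average effect.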
Media type: E-Article
Year of publication: 2018
Published: 2018
Contained in: Complete record - volume:2
Contained in: Statistical theory and related fields - 2(2018), issue 1, pages 80-88
Language: English
Contributors: Liang, Shuhan [author]
Topics: Advantage Learning
Notes: Date Revised 03.04.2024; published: Print-Electronic; Citation Status PubMed-not-MEDLINE
DOI: 10.1080/24754269.2018.1466096
PPN (catalog ID): NLM290535425
LEADER 01000caa a22002652 4500
001 NLM290535425
003 DE-627
005 20240403232010.0
007 cr uuu---uuuuu
008 231225s2018 xx |||||o 00| ||eng c
024 7 |a 10.1080/24754269.2018.1466096 |2 doi
028 5 2 |a pubmed24n1362.xml
035 |a (DE-627)NLM290535425
035 |a (NLM)30420972
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
100 1 |a Liang, Shuhan |e verfasserin |4 aut
245 1 0 |a Deep advantage learning for optimal dynamic treatment regime
264 1 |c 2018
336 |a Text |b txt |2 rdacontent
337 |a Computermedien |b c |2 rdamedia
338 |a Online-Ressource |b cr |2 rdacarrier
500 |a Date Revised 03.04.2024
500 |a published: Print-Electronic
500 |a Citation Status PubMed-not-MEDLINE
520 |a Recently, deep learning has achieved state-of-the-art performance on many difficult tasks. Deep neural networks outperform many popular existing methods in the field of reinforcement learning, and they can also identify important covariates automatically. The parameter sharing of convolutional neural networks (CNNs) greatly reduces the number of parameters in the network, which allows for high scalability. However, little research has been done on deep advantage learning (A-learning). In this paper, we present a deep A-learning approach to estimating the optimal dynamic treatment regime. A-learning models the advantage function, which is directly relevant to the goal. We use an inverse probability weighting (IPW) method to estimate the difference between potential outcomes, which does not require any model assumption on the baseline mean function. We implemented different architectures of deep CNNs and convexified convolutional neural networks (CCNNs). The proposed deep A-learning methods are applied to data from the STAR*D trial and are shown to perform better than the penalized least squares estimator with a linear decision rule
650 4 |a Journal Article
650 4 |a Advantage Learning
650 4 |a Convexified Convolutional Neural Networks
650 4 |a Convolutional Neural Networks
650 4 |a Dynamic Treatment Regime
650 4 |a Inverse Probability Weighting
700 1 |a Lu, Wenbin |e verfasserin |4 aut
700 1 |a Song, Rui |e verfasserin |4 aut
773 0 8 |i Enthalten in |t Statistical theory and related fields |d 2017 |g 2(2018), 1 vom: 12., Seite 80-88 |w (DE-627)NLM277501288 |x 2475-4277 |7 nnns
773 1 8 |g volume:2 |g year:2018 |g number:1 |g day:12 |g pages:80-88
856 4 0 |u http://dx.doi.org/10.1080/24754269.2018.1466096 |3 Volltext
912 |a GBV_USEFLAG_A
912 |a GBV_NLM
951 |a AR
952 |d 2 |j 2018 |e 1 |b 12 |h 80-88