Data-Driven Dynamic Multiobjective Optimal Control: An Aspiration-Satisfying Reinforcement Learning Approach
This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then derived whose satisfaction guarantees that the objectives' aspirations are met. Relaxed Hamilton-Jacobi-Bellman (HJB) equations, in the form of HJB inequalities, are then solved in a dynamic constrained MO framework to find Pareto optimal solutions. A relation to the satisficing (good-enough) decision-making framework is established. A sum-of-squares (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the requirement of complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed to solve the SOS optimization problem in real time, using only system trajectory data measured over a time interval. Finally, two simulation examples are used to verify the analytical results of the proposed algorithm.
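The aspiration-satisficing idea the abstract describes can be illustrated on a toy, finite case: among candidate policies with vector-valued costs, keep those that meet every objective's aspiration level, then retain the Pareto optimal ones among them. This is a minimal sketch of that selection step only, not the paper's SOS/HJB algorithm; the policy names, cost vectors, and aspiration levels are hypothetical.

```python
def satisfies_aspirations(costs, aspirations):
    """A policy is 'good enough' if every objective cost meets its aspiration."""
    return all(c <= a for c, a in zip(costs, aspirations))


def is_pareto_dominated(costs, others):
    """costs is dominated if another vector is <= in all objectives and < in one."""
    return any(
        all(o <= c for o, c in zip(other, costs))
        and any(o < c for o, c in zip(other, costs))
        for other in others
    )


def satisficing_pareto_set(policies, aspirations):
    """Return aspiration-satisfying policies that are Pareto optimal among them."""
    feasible = {name: costs for name, costs in policies.items()
                if satisfies_aspirations(costs, aspirations)}
    return sorted(
        name for name, costs in feasible.items()
        if not is_pareto_dominated(
            costs, [c for n, c in feasible.items() if n != name])
    )


# Hypothetical candidate policies with (objective-1 cost, objective-2 cost).
policies = {
    "pi1": (1.0, 4.0),
    "pi2": (2.0, 2.0),
    "pi3": (4.0, 1.0),
    "pi4": (3.0, 3.0),  # feasible but dominated by pi2
}
aspirations = (3.5, 3.5)  # aspiration level per objective

print(satisficing_pareto_set(policies, aspirations))  # prints ['pi2']
```

In the paper's continuous-time setting the same filtering is done implicitly, with Hamiltonian/HJB inequalities playing the role of the aspiration constraints over an infinite policy space.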
Media type: E-article
Year of publication: 2022
Published: 2022
Contained in: Complete record - volume:33
Contained in: IEEE transactions on neural networks and learning systems - 33(2022), 11, from: 29 Nov., pages 6183-6193
Language: English
Contributors: Mazouchi, Majid [author]; Yang, Yongliang [author]; Modares, Hamidreza [author]
Notes: Date Revised 28.10.2022; published: Print-Electronic; Citation Status PubMed-not-MEDLINE
DOI: 10.1109/TNNLS.2021.3072571
PPN (catalogue ID): NLM324405359
LEADER 01000naa a22002652 4500
001 NLM324405359
003 DE-627
005 20231225190243.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TNNLS.2021.3072571 |2 doi
028 5 2 |a pubmed24n1081.xml
035 |a (DE-627)NLM324405359
035 |a (NLM)33886483
040 |a DE-627 |b ger |c DE-627 |e rakwb
041 |a eng
100 1 |a Mazouchi, Majid |e verfasserin |4 aut
245 1 0 |a Data-Driven Dynamic Multiobjective Optimal Control |b An Aspiration-Satisfying Reinforcement Learning Approach
264 1 |c 2022
336 |a Text |b txt |2 rdacontent
337 |a Computermedien |b c |2 rdamedia
338 |a Online-Ressource |b cr |2 rdacarrier
500 |a Date Revised 28.10.2022
500 |a published: Print-Electronic
500 |a Citation Status PubMed-not-MEDLINE
520 |a This article presents an iterative data-driven algorithm for solving dynamic multiobjective (MO) optimal control problems arising in control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then used for which their satisfaction guarantees satisfying the objectives' aspirations. Relaxed Hamilton-Jacobi-Bellman (HJB) equations in terms of HJB inequalities are then solved in a dynamic constrained MO framework to find Pareto optimal solutions. Relation to satisficing (good enough) decision-making framework is shown. A sum-of-square (SOS)-based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the requirement of complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed to solve the SOS optimization problem in real time using only the information of the system trajectories measured during a time interval without having full knowledge of the system dynamics. Finally, two simulation examples are utilized to verify the analytical results of the proposed algorithm
650 4 |a Journal Article
700 1 |a Yang, Yongliang |e verfasserin |4 aut
700 1 |a Modares, Hamidreza |e verfasserin |4 aut
773 0 8 |i Enthalten in |t IEEE transactions on neural networks and learning systems |d 2012 |g 33(2022), 11 vom: 29. Nov., Seite 6183-6193 |w (DE-627)NLM23236897X |x 2162-2388 |7 nnns
773 1 8 |g volume:33 |g year:2022 |g number:11 |g day:29 |g month:11 |g pages:6183-6193
856 4 0 |u http://dx.doi.org/10.1109/TNNLS.2021.3072571 |3 Volltext
912 |a GBV_USEFLAG_A
912 |a GBV_NLM
951 |a AR
952 |d 33 |j 2022 |e 11 |b 29 |c 11 |h 6183-6193