Publication details - Brain signals of a Surprise-Actor-Critic model

Brain signals of a Surprise-Actor-Critic model : Evidence for multiple learning modules in human decision making

Copyright © 2021. Published by Elsevier Inc.

Learning how to reach a reward over long series of actions is a remarkable capability of humans, and potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals of different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations and (i) introduce a complex sequential decision making task with surprising events that allows us to (ii) dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI); and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach for learning the world-model, is extracted in our algorithm from a state prediction error. Surprise is then used to modulate the learning rate of a model-free actor, which itself learns via the reward prediction error from model-free value estimation by the critic. We find that action choices are well explained by pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning.
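The mechanism the abstract describes (a critic that computes a reward prediction error, and an actor whose learning rate is modulated by surprise) can be illustrated with a toy tabular sketch. This is not the authors' algorithm. The class name `SurpriseActorCritic`, all parameter names, the definition of surprise as one minus the estimated probability of the observed transition, the delta-rule world model, and the choice to let surprise increase the actor's learning rate are assumptions made only for illustration.

```python
import math
import random


class SurpriseActorCritic:
    """Toy tabular actor-critic (illustrative assumption, not the paper's model)."""

    def __init__(self, n_states, n_actions, alpha_critic=0.1, alpha_actor=0.1,
                 gamma=0.9, surprise_gain=1.0, model_rate=0.2):
        self.n_states = n_states
        self.n_actions = n_actions
        self.alpha_critic = alpha_critic    # critic learning rate
        self.alpha_actor = alpha_actor      # baseline actor learning rate
        self.gamma = gamma                  # discount factor
        self.surprise_gain = surprise_gain  # hypothetical modulation strength
        self.model_rate = model_rate        # world-model learning rate
        self.values = [0.0] * n_states
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]
        # Transition estimates start uniform: trans[s][a][s'].
        self.trans = [[[1.0 / n_states] * n_states for _ in range(n_actions)]
                      for _ in range(n_states)]

    def policy(self, state):
        """Softmax over action preferences in the given state."""
        prefs = self.prefs[state]
        top = max(prefs)
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state, rng=random):
        """Sample an action from the softmax policy."""
        return rng.choices(range(self.n_actions), weights=self.policy(state))[0]

    def update(self, state, action, reward, next_state, terminal=False):
        """Apply one learning step and return (rpe, surprise)."""
        # Surprise from the state prediction error (assumed form: 1 - p).
        probs = self.trans[state][action]
        surprise = 1.0 - probs[next_state]
        # Delta-rule world-model update toward the observed transition.
        for s in range(self.n_states):
            target = 1.0 if s == next_state else 0.0
            probs[s] += self.model_rate * (target - probs[s])
        # Critic: model-free reward prediction error (TD error).
        bootstrap = 0.0 if terminal else self.gamma * self.values[next_state]
        rpe = reward + bootstrap - self.values[state]
        self.values[state] += self.alpha_critic * rpe
        # Actor: policy-gradient step with a surprise-modulated learning rate.
        pi = self.policy(state)
        rate = self.alpha_actor * (1.0 + self.surprise_gain * surprise)
        for a in range(self.n_actions):
            grad = (1.0 if a == action else 0.0) - pi[a]
            self.prefs[state][a] += rate * rpe * grad
        return rpe, surprise
```

Calling `update` repeatedly with the same transition makes the returned surprise shrink as the transition estimate sharpens, so the actor's effective learning rate falls back toward its baseline.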

Media type:	E-article

Year of publication:	2022
Published:	2022

Contained in:	To the parent record - volume:246
Contained in:	NeuroImage - 246(2022), 01 Feb., page 118780

Language:	English

Contributors:	Liakoni, Vasiliki [author]; Lehmann, Marco P [author]; Modirshanechi, Alireza [author]; Brea, Johanni [author]; Lutti, Antoine [author]; Gerstner, Wulfram [author]; Preuschoff, Kerstin [author]

Links:	Full text

Subjects:	Behavior; fMRI; Human learning; Journal Article; Reinforcement learning; Research Support, Non-U.S. Gov't; Sequential decision making; Surprise

Notes:	Date Completed 21.02.2022; Date Revised 21.02.2022; published: Print-Electronic; Citation Status MEDLINE

doi:	10.1016/j.neuroimage.2021.118780

Funding:
Funding institution / Project title:

PPN (catalog ID):	NLM334120195

Internal format


LEADER	01000naa a22002652 4500
001	NLM334120195
003	DE-627
005	20231225223037.0
007	cr uuu---uuuuu
008	231225s2022 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1016/j.neuroimage.2021.118780 \|2 doi
028	5	2	\|a pubmed24n1113.xml
035			\|a (DE-627)NLM334120195
035			\|a (NLM)34875383
035			\|a (PII)S1053-8119(21)01052-1
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Liakoni, Vasiliki \|e verfasserin \|4 aut
245	1	0	\|a Brain signals of a Surprise-Actor-Critic model \|b Evidence for multiple learning modules in human decision making
264		1	\|c 2022
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 21.02.2022
500			\|a Date Revised 21.02.2022
500			\|a published: Print-Electronic
500			\|a Citation Status MEDLINE
520			\|a Copyright © 2021. Published by Elsevier Inc.
520			\|a Learning how to reach a reward over long series of actions is a remarkable capability of humans, and potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals of different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations and (i) introduce a complex sequential decision making task with surprising events that allows us to (ii) dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI); and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach for learning the world-model, is extracted in our algorithm from a state prediction error. Surprise is then used to modulate the learning rate of a model-free actor, which itself learns via the reward prediction error from model-free value estimation by the critic. We find that action choices are well explained by pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
650		4	\|a Behavior
650		4	\|a Human learning
650		4	\|a Reinforcement learning
650		4	\|a Sequential decision making
650		4	\|a Surprise
650		4	\|a fMRI
700	1		\|a Lehmann, Marco P \|e verfasserin \|4 aut
700	1		\|a Modirshanechi, Alireza \|e verfasserin \|4 aut
700	1		\|a Brea, Johanni \|e verfasserin \|4 aut
700	1		\|a Lutti, Antoine \|e verfasserin \|4 aut
700	1		\|a Gerstner, Wulfram \|e verfasserin \|4 aut
700	1		\|a Preuschoff, Kerstin \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t NeuroImage \|d 1992 \|g 246(2022) vom: 01. Feb., Seite 118780 \|w (DE-627)NLM09001443X \|x 1095-9572 \|7 nnns
773	1	8	\|g volume:246 \|g year:2022 \|g day:01 \|g month:02 \|g pages:118780
856	4	0	\|u http://dx.doi.org/10.1016/j.neuroimage.2021.118780 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 246 \|j 2022 \|b 01 \|c 02 \|h 118780
