Multi-Armed Exponential Bandit

Exponential bandits are widely adopted in economics and marketing due to their tractability. This paper analyzes the one-agent, multi-armed version of exponential bandits, in which the agent dynamically selects arms to maximize total payoff. We motivate our base model with examples in which all arms are of the same type, and we generalize the results to cases where arms are either independent or dependent. The contribution is fourfold. First, we characterize the optimal policy for the agent to choose arms. Under the optimal policy, the agent selects one arm at a time, and each arm is used at most once. Second, we show that the agent may not regard information acquisition as a last-ditch effort before quitting, which contradicts the existing literature. Third, with a discount factor, an arm may be used more than once. Fourth, in the case of negatively correlated bandits, the agent may use more than one arm simultaneously. The paper is of both theoretical and practical significance, since the model fits a variety of situations, including project selection, product promotion, and drug development. Implications for these applications are discussed.

Media type:

E-Book

Year of publication:

[2021]

Published:

S.l.: SSRN ; 2021

Language:

English

Contributors:

Chen, Kanglin [author]
Chen, Ying-Ju [author]
Gallego, Guillermo [author]
Gao, Pin [author]
Liu, Haoyu [author]

Links:

ssrn.com [free access]
doi.org [free access]

Subjects:

Multi-armed bandit

Notes:

According to SSRN, the original version of the document was created on November 3, 2020

Extent:

1 online resource (36 p.)

DOI:

10.2139/ssrn.3724377

PPN (catalog ID):

1806671220