
ASYMPTOTIC BAYES ANALYSIS FOR THE FINITE-HORIZON ONE-ARMED-BANDIT PROBLEM

Published online by Cambridge University Press:  07 January 2003

Apostolos N. Burnetas
Affiliation:
Department of Operations, Weatherhead School of Management, Case Western Reserve University, Cleveland, OH 44106-7235, E-mail: [email protected]
Michael N. Katehakis
Affiliation:
Department of Management Science and Information Systems, Rutgers Business School, Rutgers—The State University of New Jersey, Newark, NJ 07102, E-mail: [email protected]

Abstract

The multiarmed-bandit problem is often taken as a basic model for the trade-off between exploration and exploitation required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies converge to simple limits as the length of the horizon grows large.
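To make the setting concrete, the sketch below works out the finite-horizon Bayes one-armed bandit in the simplest exponential-family case: a Bernoulli unknown arm with a Beta(a, b) prior, compared against a known arm paying lam per pull. The function names, the Bernoulli/Beta choice, and the memoized backward induction are illustrative assumptions for this page, not the paper's construction; the code relies only on the standard fact that once the known arm is preferred it remains preferred, so retiring to it is worth lam per remaining period.

```python
# A minimal sketch, assuming a Bernoulli unknown arm with a Beta(a, b) prior
# (a one-parameter exponential family member) and a known arm paying lam per
# pull. Backward induction over (successes, failures, pulls remaining).
from functools import lru_cache

def bayes_value(horizon: int, lam: float, a: float = 1.0, b: float = 1.0):
    """Bayes-optimal value and first action for the one-armed bandit."""

    @lru_cache(maxsize=None)
    def V(s: int, f: int, n: int) -> float:
        # s, f: successes/failures seen on the unknown arm; n: pulls left.
        if n == 0:
            return 0.0
        p = (a + s) / (a + b + s + f)   # posterior mean of the unknown arm
        # Once the known arm is preferred it stays preferred, so retiring
        # to it is worth lam for each of the n remaining periods.
        retire = n * lam
        explore = p * (1.0 + V(s + 1, f, n - 1)) + (1.0 - p) * V(s, f, n - 1)
        return max(retire, explore)

    v = V(0, 0, horizon)
    first_action = "unknown" if v > horizon * lam else "known"
    return v, first_action

if __name__ == "__main__":
    # With a uniform prior (mean 1/2) and a known arm slightly better than
    # 1/2, a longer horizon makes sampling the unknown arm worthwhile first:
    # the information gained early can be exploited over many later pulls.
    for n in (1, 10, 100):
        print(n, bayes_value(n, lam=0.55))
```

At horizon 1 the sketch pulls the known arm (higher immediate mean), while at longer horizons the exploration value of the unknown arm dominates, illustrating the horizon dependence that the paper's asymptotic analysis characterizes.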

Type
Research Article
Copyright
© 2003 Cambridge University Press
