Hostname: page-component-745bb68f8f-s22k5 Total loading time: 0 Render date: 2025-01-11T10:48:35.581Z Has data issue: false hasContentIssue false

Optimal stopping and dynamic allocation

Published online by Cambridge University Press:  01 July 2016

Fu Chang*
Affiliation:
AT & T Bell Laboratories
Tze Leung Lai*
Affiliation:
Columbia University
*
Postal address: AT & T Bell Laboratories, Crawfords Corner Road, Holmdel, NJ 07733, USA.
∗∗Postal address: Dept. of Statistics, Box 10 Mathematics, Columbia University, New York, NY 10027, USA.

Abstract

A class of optimal stopping problems for the Wiener process is studied herein, and asymptotic expansions for the optimal stopping boundaries are derived. These results lead to a simple index-type class of asymptotically optimal solutions to the classical discounted multi-armed bandit problem: given a discount factor 0<β <1 and k populations with densities from an exponential family, how should x1, x2,… be sampled sequentially from these populations to maximize the expected value of Ʃ1 βi−1xi, in ignorance of the parameters of the densities?

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1987 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Research supported by the National Science Foundation and the Army Research Office.

References

Bather, J. A. (1970) Optimal stopping of Brownian motion. Adv. Appl. Prob. 2, 259286.CrossRefGoogle Scholar
Bather, J. A. (1983) Optimal stopping of Brownian motion: a comparison technique. In Recent Advances in Statistics, ed. Rustagi, J. et al., Academic Press, New York, 1950.CrossRefGoogle Scholar
Bellman, R. (1956) A problem in the sequential design of experiments. Sankhya A 16, 221229.Google Scholar
Chernoff, H. (1965) Sequential tests for the mean of a normal distribution III (small t). Ann. Math. Statist. 36, 2854.CrossRefGoogle Scholar
Chernoff, H. (1968) Optimal stochastic control. Sankhya A 30, 221252.Google Scholar
Chernoff, H. and Ray, S. N. (1965) A Bayes sequential sampling inspection plan. Ann Math. Statist. 36, 13871407.CrossRefGoogle Scholar
Ferguson, T. (1967) Mathematical Statistics—A Decision Theoretic Approach. Academic Press, New York.Google Scholar
Gittins, J. C. (1979) Bandit processes and dynamic allocation indices. J. R. Statist Soc. B 41, 148177.Google Scholar
Gittins, J. C. (1983) Dynamic allocation indices for Bayesian bandits. In Mathematical Learning Models—Theory and Algorithms, ed. Herkenrath, U. et al., Springer-Verlag, Berlin, 5067.CrossRefGoogle Scholar
Gittins, J. C. and Jones, D. M. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J. et al., North Holland, Amsterdam, 241266.Google Scholar
Gittins, J. C. and Jones, D. M. (1979) A dynamic allocation index for the discounted multi-armed bandit problem. Biometrika 66, 561565.CrossRefGoogle Scholar
Kelly, F. P. (1981) Multi-armed bandits with discount factor near one: the Bernoulli case. Ann. Statist. 9, 9871001.CrossRefGoogle Scholar
Lai, T. L. (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15.CrossRefGoogle Scholar
Van Moerbeke, P. (1976) On optimal stopping and free boundary problems. Arch. Rat. Mech. Anal. 60, 101148.CrossRefGoogle Scholar
Whittle, P. (1980) Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42, 143149.Google Scholar