
Worth of perfect information in Bernoulli bandits

Published online by Cambridge University Press:  01 July 2016

Donald A. Berry*
Affiliation:
University of Minnesota
Robert P. Kertz**
Affiliation:
Georgia Institute of Technology
* Postal address: School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA. Supported in part by NSF Grants DMS 85-05023 and DMS 88-03087.
** Postal address: School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Abstract

For k-armed Bernoulli bandits with discounting, sharp comparisons are given between average optimal rewards for a gambler and for a ‘perfectly informed’ gambler, over natural collections of prior distributions. Some of these comparisons are proved under general discounting, and others under non-increasing discount sequences. Connections are made between these comparisons and the concept of ‘regret’ in the minimax approach to bandit processes. Identification of extremal cases in the sharp comparisons is emphasized.
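As a rough, hypothetical illustration of the two quantities being compared (not taken from the paper), the sketch below treats a two-armed Bernoulli bandit with a small finite horizon, uniform (undiscounted) rewards, and independent Beta priors. It computes the Bayes-optimal bandit reward by backward induction and estimates the expected reward of a ‘perfectly informed’ gambler who knows both success probabilities and always plays the better arm; the difference is one reading of the worth of perfect information. The horizon, priors, and discount sequence are illustrative choices only; the paper's sharp comparisons are taken over collections of priors, k arms, and general or non-increasing discount sequences.

```python
import random
from functools import lru_cache

# A minimal sketch, assuming a two-armed Bernoulli bandit with a finite
# horizon N, undiscounted rewards, and independent Beta priors.
# These choices are illustrative only; the paper treats k arms and general
# (or non-increasing) discount sequences, and optimizes over prior families.

N = 8  # horizon: number of pulls (kept small so the DP state space stays tiny)

@lru_cache(maxsize=None)
def bandit_value(a1, b1, a2, b2, t):
    """Bayes-optimal expected total reward with t pulls remaining,
    when arm i currently has posterior Beta(ai, bi)."""
    if t == 0:
        return 0.0
    p1 = a1 / (a1 + b1)  # posterior mean of arm 1's success probability
    p2 = a2 / (a2 + b2)  # posterior mean of arm 2's success probability
    # Pull arm 1: reward 1 w.p. p1 (posterior -> Beta(a1+1, b1)), else 0 (-> Beta(a1, b1+1)).
    v1 = p1 * (1 + bandit_value(a1 + 1, b1, a2, b2, t - 1)) \
         + (1 - p1) * bandit_value(a1, b1 + 1, a2, b2, t - 1)
    # Pull arm 2: symmetric.
    v2 = p2 * (1 + bandit_value(a1, b1, a2 + 1, b2, t - 1)) \
         + (1 - p2) * bandit_value(a1, b1, a2, b2 + 1, t - 1)
    return max(v1, v2)

def informed_value(a1, b1, a2, b2, n, samples=200_000):
    """Monte Carlo estimate of E[n * max(p1, p2)]: a perfectly informed
    gambler knows both success probabilities and always pulls the better arm."""
    total = 0.0
    for _ in range(samples):
        total += n * max(random.betavariate(a1, b1), random.betavariate(a2, b2))
    return total / samples

if __name__ == "__main__":
    v_bandit = bandit_value(1, 1, 1, 1, N)     # uniform Beta(1, 1) priors on both arms
    v_informed = informed_value(1, 1, 1, 1, N)
    print(f"Bayes-optimal bandit reward over {N} pulls: {v_bandit:.3f}")
    print(f"Perfectly informed reward over {N} pulls:   {v_informed:.3f}")
    print(f"Worth of perfect information (difference):  {v_informed - v_bandit:.3f}")
```

For the uniform priors above, the informed gambler's expected reward exceeds the Bayes-optimal bandit reward; the paper's results concern how large such gaps can be, and which priors and discount sequences attain the extremes.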

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1991 


Footnotes

Supported in part by NSF Grants DMS 86-01153 and DMS 88-01818.
