Two-armed bandits with a goal, I. One arm known

Published online by Cambridge University Press: 01 July 2016

Donald A. Berry
Affiliation:
University of Minnesota
Bert Fristedt
Affiliation:
University of Minnesota

Abstract

One of two random variables, X and Y, can be selected at each of a possibly infinite number of stages. Depending on the outcome, one's fortune is either increased or decreased by 1. The probability of increase may not be known for either X or Y. The objective is to increase one's fortune to G before it decreases to g, for some integral g and G; either may be infinite.
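For orientation, the simplest special case can be written down in closed form. If both g and G are finite and a single arm with known success probability is played at every stage, the chance of reaching G before g is the classical gambler's-ruin probability. The symbols f (current fortune), p (known probability of an increase) and q = 1 - p are notation assumed here for illustration; they do not appear in the abstract.

```latex
% Assumes g < f < G, all finite, and 0 < p < 1 with q = 1 - p.
\[
  P_f(\text{reach } G \text{ before } g) =
  \begin{cases}
    \dfrac{(q/p)^{\,f-g} - 1}{(q/p)^{\,G-g} - 1}, & p \neq \tfrac12, \\[2ex]
    \dfrac{f-g}{G-g}, & p = \tfrac12 .
  \end{cases}
\]
```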

In the current part of the paper, the distribution of X is unknown and that of Y is known. We characterize the situations in which optimal strategies exist and, for certain kinds of information concerning X and Y, we characterize optimal sequential strategies for choosing to observe X and Y.
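As a rough illustration of the sequential choice described above, the sketch below computes a lower bound on the best achievable probability of reaching G before g. It rests on several assumptions that are not taken from the paper: the unknown success probability of X is given a Beta(a, b) prior, the decision-maker may switch arms freely for `depth` stages and then plays the known arm Y forever, and both g and G are finite. The function names `known_arm_value` and `truncated_value` are hypothetical.

```python
from functools import lru_cache


def known_arm_value(p, fortune, g, G):
    """Probability of reaching G before g when only the known arm Y
    (success probability p) is played; the classical gambler's-ruin formula."""
    if fortune <= g:
        return 0.0
    if fortune >= G:
        return 1.0
    if p == 0.0:
        return 0.0
    if p == 0.5:
        return (fortune - g) / (G - g)
    r = (1.0 - p) / p
    return (1.0 - r ** (fortune - g)) / (1.0 - r ** (G - g))


def truncated_value(p, fortune, g, G, a, b, depth):
    """Best success probability over strategies that choose freely between
    X and Y for `depth` stages and then play Y forever.

    Assumption (illustrative only): X succeeds with an unknown probability
    that has a Beta(a, b) prior, so its posterior mean after s successes and
    t failures in total pseudo-counts is s / (s + t).
    """

    @lru_cache(maxsize=None)
    def v(f, s, t, d):
        if f <= g:
            return 0.0
        if f >= G:
            return 1.0
        if d == 0:
            # Out of free choices: fall back to the known arm from here on.
            return known_arm_value(p, f, g, G)
        # Playing Y leaves the information about X unchanged.
        play_y = p * v(f + 1, s, t, d - 1) + (1.0 - p) * v(f - 1, s, t, d - 1)
        # Playing X updates the Beta posterior with the observed outcome.
        m = s / (s + t)
        play_x = m * v(f + 1, s + 1, t, d - 1) + (1.0 - m) * v(f - 1, s, t + 1, d - 1)
        return max(play_y, play_x)

    return v(fortune, a, b, depth)
```

Because every truncated strategy is itself an admissible strategy, the returned value can only underestimate the optimum under the assumed prior, and it is nondecreasing in `depth`. For example, `truncated_value(0.3, 1, 0, 3, 100, 1, 6)` exceeds `known_arm_value(0.3, 1, 0, 3)`, reflecting that an X believed to be very favorable is worth observing.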

In Part II (Berry and Fristedt (1980)), one of X and Y has probability α of increasing the current fortune by 1 and the other has probability β of doing so, where α and β are known but it is not known which probability belongs to X.

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1980 

References

Berry, D. A. (1972) A Bernoulli two-armed bandit. Ann. Math. Statist. 43, 871–897.
Berry, D. A. and Fristedt, B. (1979) Bernoulli one-armed bandits—arbitrary discount sequences. Ann. Statist. 7, 1086–1105.
Berry, D. A. and Fristedt, B. (1980) Two-armed bandits with a goal, II: Dependent arms. Adv. Appl. Prob. 12 (4).
Berry, D. A., Heath, D. C. and Sudderth, W. D. (1974) Red-and-black with unknown win probability. Ann. Statist. 2, 602–608.
Dubins, L. E. and Savage, L. J. (1976) How to Gamble if You Must: Inequalities for Stochastic Processes. Dover, New York.
Dubins, L. E. and Sudderth, W. D. (1977) Countably additive gambling and optimal stopping. Z. Wahrscheinlichkeitsth. 41, 59–72.
Sudderth, W. (1972) On the Dubins and Savage characterization of optimal strategies. Ann. Math. Statist. 43, 498–507.