Hostname: page-component-586b7cd67f-2brh9 Total loading time: 0 Render date: 2024-11-27T22:09:55.691Z Has data issue: false hasContentIssue false

On transforming an index for generalised bandit problems

Published online by Cambridge University Press:  14 July 2016

K. D. Glazebrook*
Affiliation:
University of Newcastle upon Tyne
S. Greatrix*
Affiliation:
University of Newcastle upon Tyne
*
Postal address: Department of Mathematics and Statistics, University of Newcastle upon Tyne, NE1 7RU, UK.
Postal address: Department of Mathematics and Statistics, University of Newcastle upon Tyne, NE1 7RU, UK.

Abstract

Nash (1980) demonstrated that index policies are optimal for a class of generalised bandit problem. A transform of the index concerned has many of the attributes of the Gittins index. The transformed index is positive-valued, with maximal values yielding optimal actions. It may be characterised as the value of a restart problem and is hence computable via dynamic programming methodologies. The transformed index can also be used in procedures for policy evaluation.

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1995 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Fay, N. A. and Glazebrook, K. D. (1987) On the scheduling of alternative stochastic jobs on a single machine. Adv. Appl. Prob. 19, 955973.Google Scholar
Fay, N. A. and Glazebrook, K. D. (1989) A general model for the scheduling of alternative tasks on a single machine. Proc. Eng. Inf. Sci. 3, 199221.CrossRefGoogle Scholar
Fay, N. A. and Walrand, J. C. (1991) On approximately optimal index strategies for generalised arm problems. J. Appl. Prob. 28, 602612.CrossRefGoogle Scholar
Gittins, J. C. (1979) Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B41, 148177.Google Scholar
Gittins, J. C. (1989) Multi-armed Bandit Allocation Indices. Wiley, Chichester.Google Scholar
Gittins, J. C. and Jones, D. M. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J. and Vince, I., 1, pp. 241266. North-Holland, Amsterdam.Google Scholar
Glazebrook, K. D. (1982) On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Prob. 19, 716722.CrossRefGoogle Scholar
Glazebrook, K. D. (1983) Optimal strategies for families of alternative bandit processes. IEEE Trans. Autom Control 28, 858861.CrossRefGoogle Scholar
Glazebrook, K. D. (1990) Procedures for the evaluation of strategies for resource allocation in a stochastic environment. J. Appl. Prob. 27, 215220.CrossRefGoogle Scholar
Glazebrook, K. D. (1993) Indices for families of competing Markov decision processes with influence. Ann. Appl. Prob. 3, 10131032.CrossRefGoogle Scholar
Glazebrook, K. D. and Fay, N. A. (1988) Evaluating strategies for generalised bandit problems. Int. J. Syst. Sci. 19, 16051613.CrossRefGoogle Scholar
Glazebrook, K. D. and Greatrix, S. (1993) On scheduling influential stochastic tasks on a single machine. Eur. J. Operat. Res. 70, 405424.CrossRefGoogle Scholar
Glazebrook, K. D. and Owen, R. W. (1991) New results for generalised bandit processes. Int. J. Syst. Sci. 22, 479494.CrossRefGoogle Scholar
Katehakis, M. N. and Veinott, A. F. (1987) The multi-armed bandit problem: decomposition and computation. Math. Operat. Res. 12, 262268.CrossRefGoogle Scholar
Klimov, G. P. (1974) Time-sharing service systems I. Theory Prob. Appl. 19, 535551.Google Scholar
Nash, P. (1980) A generalised bandit problem. J. R. Statist. Soc. B42, 165169.Google Scholar
Robinson, D. R. (1982) Algorithms for evaluating the dynamic allocation index. Operat. Res. Lett. 1, 7274.CrossRefGoogle Scholar
Veinott, A. F. (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40, 16351660.CrossRefGoogle Scholar