Hostname: page-component-78c5997874-t5tsf Total loading time: 0 Render date: 2024-11-08T18:29:52.820Z Has data issue: false hasContentIssue false

The performance of index-based policies for bandit problems with stochastic machine availability

Published online by Cambridge University Press:  01 July 2016

R. T. Dunn*
Affiliation:
University of Newcastle upon Tyne
K. D. Glazebrook*
Affiliation:
University of Newcastle upon Tyne
*
Postal address: Department of Statistics, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK.
∗∗ Email address: [email protected]

Abstract

We consider generalisations of two classical stochastic scheduling models, namely the discounted branching bandit and the discounted multi-armed bandit, to the case where the collection of machines available for processing is itself a stochastic process. Under rather mild conditions on the machine availability process we obtain performance guarantees for a range of controls based on Gittins indices. Various forms of asymptotic optimality are established for index-based limit policies as the discount rate approaches 0.

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2001 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Bertsimas, D. and Niño-Mora, J. (1996). Conservation laws, extended polymatroids and multi-armed bandit problems: a polyhedral approach to indexable systems. Math. Operat. Res. 21, 257306.CrossRefGoogle Scholar
Birge, J., Frenk, J. B. G., Mittenthal, J. and Rinnooy Kan, A. H. G. (1990). Single machine scheduling subject to stochastic breakdowns. Naval Res. Logist. 37, 660677.3.0.CO;2-3>CrossRefGoogle Scholar
Blackwell, D. (1962). Discrete dynamic programming. Ann. Math. Statist. 33, 719726.CrossRefGoogle Scholar
Dacre, M., Glazebrook, K. D. and Niño-Mora, J. (1999). The achievable region approach to the optimal control of stochastic systems (with discussion). J. R. Statist. Soc. B 61, 747791.CrossRefGoogle Scholar
Denardo, E. V. and Miller, B. L. (1968). An optimality criterion for discrete dynamic programming with no discounting. Ann. Math. Statist. 39, 12201227.CrossRefGoogle Scholar
Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics (European Meeting of Statisticians, Budapest, 1972), eds Gani, J., Sarkadi, K. and Vince, I. North-Holland, Amsterdam, pp. 241266.Google Scholar
Glazebrook, K. D. (1976). Stochastic scheduling with order constraints. Int. J. Systems Sci. 7, 657666.CrossRefGoogle Scholar
Glazebrook, K. D. (1984). Scheduling stochastic jobs on a single machine subject to breakdowns. Naval Res. Logist. Quart. 31, 251264.CrossRefGoogle Scholar
Glazebrook, K. D. (1987). Evaluating the effects of machine breakdowns in stochastic scheduling problems. Naval Res. Logist. 34, 319335.3.0.CO;2-5>CrossRefGoogle Scholar
Glazebrook, K. D. and Garbe, R. (1999). Almost optimal policies for stochastic systems which almost satisfy conservation laws. Ann. Operat. Res. 92, 1943.CrossRefGoogle Scholar
Glazebrook, K. D. and Niño-Mora, J. (2001). Scheduling multiclass queueing networks on parallel servers: approximate and heavy-traffic optimality of Klimov's rule. To appear in Operat. Res. Google Scholar
Glazebrook, K. D. and Wilkinson, D. J. (2000). Index-based policies for discounted multi-armed bandits on parallel machines. Ann. Appl. Prob. 10, 877896.CrossRefGoogle Scholar
Katehakis, M. N. and Veinott, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Math. Operat. Res. 12, 262268.CrossRefGoogle Scholar
Pinedo, M. and Rammouz, E. (1988). A note on stochastic scheduling on a single machine subject to breakdown and repair. Prob. Eng. Inf. Sci. 2, 4149.CrossRefGoogle Scholar
Puterman, M. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.CrossRefGoogle Scholar
Veinott, A. F. Jr. (1966). On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37, 12841294.CrossRefGoogle Scholar
Weber, R. R. (1982). Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime. J. Appl. Prob. 19, 167182.CrossRefGoogle Scholar
Weber, R. R., Varaiya, P. and Walrand, J. (1986). Scheduling jobs with stochastically ordered processing times on parallel machines to minimize expected flowtime. J. Appl. Prob. 23, 841847.CrossRefGoogle Scholar
Weiss, G. (1988). Branching bandit processes. Prob. Eng. Inf. Sci. 2, 269278.CrossRefGoogle Scholar
Weiss, G. (1990). Approximation results in parallel machines stochastic scheduling. Ann. Operat. Res. 26, 195242.CrossRefGoogle Scholar
Weiss, G. (1992). Turnpike optimality of Smith's rule in parallel machines stochastic scheduling. Math. Operat. Res. 17, 255270.CrossRefGoogle Scholar
Weiss, G. (1995). On almost optimal priority rules for preemptive scheduling of stochastic jobs on parallel machines. Adv. Appl. Prob. 27, 821839.CrossRefGoogle Scholar
Whittle, P. (1981). Arm acquiring bandits. Ann. Prob. 9, 284292.CrossRefGoogle Scholar