Hostname: page-component-586b7cd67f-gb8f7 Total loading time: 0 Render date: 2024-11-27T19:37:01.819Z Has data issue: false hasContentIssue false

Open Bandit Processes with Uncountable States and Time-Backward Effects

Published online by Cambridge University Press:  30 January 2018

Xianyi Wu*
Affiliation:
East China Normal University and Macquarie University
Xian Zhou*
Affiliation:
Macquarie University
*
Postal address: Department of Statistics and Actuarial Science, East China Normal University, Shanghai, China.
∗∗ Postal address: Department of Applied Finance and Actuarial Studies, Macquarie University, Sydney, Australia. Email address: [email protected]
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Bandit processes and the Gittins index have provided powerful and elegant theory and tools for the optimization of allocating limited resources to competitive demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable states and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in such problems as dynamic scheduling with continuous random processing times. We also allow negative time durations for discounting a reward to account for the present value of the reward that was received before the present time, which we refer to as time-backward effects. This could model the situation of offering bonus rewards for completing jobs above expectation. Moreover, we discover that a common belief on the optimality of the Gittins index in the generalized bandit problem is not always true without additional conditions, and provide a counterexample. We further apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index in the generalized bandit problem under a sufficient condition.

Type
Research Article
Copyright
© Applied Probability Trust 

Footnotes

This research was partially supported by the Natural Science Foundation of China, under grant number 71071056, and the Australian Research Council Discovery Project, under grant number DP1094153.

References

Bertsimas, D. and Niño-Mora, J. (1996). Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Operat. Res. 21, 257306.Google Scholar
Crosbie, J. H. and Glazebrook, K. D. (2000). Index policies and a novel performance space structure for a class of generalised branching bandit problems. Math. Operat. Res. 25, 281297.Google Scholar
Denardo, E. V., Park, H. and Rothblum, U. G. (2007). Risk-sensitive and risk-neutral multiarmed bandits. Math. Operat. Res. 32, 374394.Google Scholar
Gittins, J. and Jones, D. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J., North-Holland, Amsterdam, pp. 241266.Google Scholar
Gittins, J., Glazebrook, K. and Weber, R. (2011). Multi-Armed Bandit Allocation Indices. John Wiley, Chichester.Google Scholar
Glazebrook, K. D. and Owen, R. W. (1991). New results for generalised bandit processes. Internat. J. Systems Sci. 22, 479494.CrossRefGoogle Scholar
Ishikida, T. and Varaiya, P. (1994). Multi-armed bandit problem revisited. J. Optimization Theory Appl. 83, 113154.Google Scholar
Lai, T. L. and Ying, Z. (1988). Open bandit processes and optimal scheduling of queueing networks. Adv. Appl. Prob. 20, 447472.Google Scholar
Nash, P. (1973). Optimal allocation of resources between research projects. Doctoral Thesis, Cambridge University.Google Scholar
Nash, P. (1980). A generalized bandit problem. J. R. Statist. Soc. B 42, 165169.Google Scholar
Sonin, I. M. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statist. Prob. Lett. 78, 15261533.Google Scholar
Tsitsiklis, J. N. (1994). A short proof of the Gittins index theorem. Ann. Appl. Prob. 4, 194199.Google Scholar
Varaiya, P. P., Walrand, J. C. and Buyukkoc, C. (1985). Extensions of the multiarmed bandit problem: the discounted case. IEEE Trans. Automatic Control 30, 426439.Google Scholar
Weber, R. (1992). On the Gittins index for multiarmed bandits. Ann. Appl. Prob. 2, 10241033.Google Scholar
Weiss, G. (1988). Branching bandit processes. Prob. Eng. Inf. Sci. 2, 269278.CrossRefGoogle Scholar
Whittle, P. (1981). Arm-acquiring bandits. Ann. Prob. 9, 284292.Google Scholar