Published online by Cambridge University Press: 30 January 2018
Bandit processes and the Gittins index have provided powerful and elegant theory and tools for optimally allocating limited resources to competing demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable state spaces and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in problems such as dynamic scheduling with continuous random processing times. We also allow negative time durations in discounting, so that a reward received before the present time is valued at its compounded present value; we refer to this as a time-backward effect. It can model, for example, the offering of bonus rewards for completing jobs ahead of expectation. Moreover, we show that a common belief in the optimality of the Gittins index for the generalized bandit problem does not always hold without additional conditions, and we provide a counterexample. Finally, we apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index for the generalized bandit problem under a sufficient condition.
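As a minimal illustration of the classical (countable-state, forward-time) setting that the paper generalizes, consider arms with deterministic reward sequences under a discount factor β in (0, 1). In that special case the Gittins index of an arm reduces to the best ratio of discounted reward to discounted time over all stopping horizons, and the index policy plays the arm with the largest index. The function names and the deterministic-reward restriction below are our own illustrative assumptions, not the paper's construction:

```python
def gittins_index_deterministic(rewards, beta):
    """Gittins index of an arm with deterministic reward sequence
    r_0, r_1, ... under discount factor beta in (0, 1):
    max over horizons tau >= 1 of
    sum_{k < tau} beta^k * r_k / sum_{k < tau} beta^k."""
    best = float("-inf")
    num = den = 0.0
    for k, r in enumerate(rewards):
        num += beta**k * r   # discounted reward up to horizon k + 1
        den += beta**k       # discounted time up to horizon k + 1
        best = max(best, num / den)
    return best

def index_policy(arms, beta):
    """Index policy: play the arm with the largest Gittins index."""
    return max(range(len(arms)),
               key=lambda i: gittins_index_deterministic(arms[i], beta))

# An arm paying 0 then 10 has index 9/1.9 at beta = 0.9, so it is
# preferred over an arm paying a constant 1 (index 1).
print(index_policy([[1.0, 1.0], [0.0, 10.0]], beta=0.9))
```

Note that the index of the second arm exceeds its first-step reward: the stopping-time formulation lets a large future reward raise the index, which is the mechanism the branching (open) bandit extension preserves in far greater generality.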
This research was partially supported by the Natural Science Foundation of China under grant number 71071056 and by the Australian Research Council under Discovery Project DP1094153.