Monotone Policies and Indexability for Bidirectional Restless Bandits

Published online by Cambridge University Press:  04 January 2016

K. D. Glazebrook*
Affiliation:
Lancaster University
D. J. Hodge∗∗
Affiliation:
The University of Nottingham
C. Kirkbride*
Affiliation:
Lancaster University
* Postal address: Department of Management Science, Lancaster University, Lancaster, LA1 4YX, UK.
∗∗ Postal address: School of Mathematical Sciences, The University of Nottingham, Nottingham, NG7 2RD, UK.

Abstract

Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.
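As a rough illustration of how project indices might be turned into an activation rule under a resource constraint, the sketch below activates projects in decreasing index order for as long as the remaining resource allows. It is a minimal sketch only: the function name, the index values, and the resource requirements are hypothetical placeholders, and the derivation of the indices from the Lagrangian relaxation is not reproduced here.

```python
from typing import List, Sequence


def greedy_index_activation(
    indices: Sequence[float],
    requirements: Sequence[float],
    capacity: float,
) -> List[int]:
    """Choose projects to activate at a single decision epoch.

    indices[i]      -- assumed index of project i in its current state
    requirements[i] -- assumed resource needed to activate project i
                       in its current state (state dependent in general)
    capacity        -- resource available at this epoch
    """
    # Visit projects from highest to lowest index.
    order = sorted(range(len(indices)), key=lambda i: -indices[i])
    chosen: List[int] = []
    remaining = capacity
    for i in order:
        # Activate project i only if its requirement still fits.
        if requirements[i] <= remaining:
            chosen.append(i)
            remaining -= requirements[i]
    return chosen


# Three hypothetical projects sharing 3 units of resource.
active = greedy_index_activation([3.0, 1.0, 2.0], [2.0, 1.0, 2.0], 3.0)
print(active)
```

In this toy instance the project with the second-highest index does not fit in the resource left after the first activation, so the rule moves on to the next project that does fit. Whether to skip such projects or stop at the first one that does not fit is a design choice made here for illustration only.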

Type
General Applied Probability
Copyright
© Applied Probability Trust 
