
INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS

Published online by Cambridge University Press:  16 October 2015

Sofía S. Villar*
Affiliation:
BCAM, Basque Center for Applied Mathematics and MRC Biostatistics Unit, IPH, University Forvie Site, Robinson Way, Cambridge CB2 0SR, UK, and Lancaster University. E-mail: [email protected]

Abstract

Motivated by a class of Partially Observable Markov Decision Processes arising in surveillance systems, in which a set of imperfectly observed state processes must be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an observation policy that maximizes the total expected net reward over an infinite time horizon, subject to resource availability. An index policy can be derived from the Lagrangian relaxation of the original problem, provided the Whittle index exists. We demonstrate that this class of reinitializing bandits, in which a project's state deteriorates while active and resets to its initial state when passive until its completion, possesses the structural property of indexability, and we further show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems does not achieve optimality. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and that it is recovered by a simple tractable rule referred to as the 1-limited Round Robin rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies which illustrate, for more general instances, the performance advantages of the Whittle index rule over other simple heuristics.
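As a reading aid, the following is a minimal sketch of the standard Whittle relaxation and index definition invoked above, stated under the expected total criterion; the notation (arms n, per-arm state x, active/passive rewards r^1, r^0, passive subsidy λ, value functions V) is generic and is not taken from the paper itself. Relaxing the requirement that a fixed number of arms be observed at every slot to a constraint on average activity, and dualizing it with a multiplier λ, decouples the problem into single-arm subproblems in which passivity earns the subsidy λ:

\[
  V_n^{\lambda}(x) \;=\; \max\Big\{\, r_n^{1}(x) + \mathbb{E}\big[V_n^{\lambda}(X_n') \mid x,\ a=1\big],\ \ \lambda + r_n^{0}(x) + \mathbb{E}\big[V_n^{\lambda}(X_n') \mid x,\ a=0\big] \Big\}.
\]

Arm n is indexable if the set of states in which passivity is optimal grows monotonically from the empty set to the whole state space as λ increases, in which case the Whittle index is the break-even subsidy

\[
  W_n(x) \;=\; \inf\{\lambda \in \mathbb{R} : \text{passivity is optimal in state } x \text{ of arm } n\},
\]

and the index rule activates, at each decision epoch, the arms with the largest current indices, up to the resource limit.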

Type
Research Article
Copyright
Copyright © Cambridge University Press 2015 

