
Average optimality for Markov decision processes in Borel spaces: a new condition and approach

Published online by Cambridge University Press: 14 July 2016

Xianping Guo*
Affiliation:
Zhongshan University
Quanxin Zhu**
Affiliation:
South China Normal University
* Postal address: School of Mathematics and Computational Science, Zhongshan University, Guangzhou, 510275, PR China. Email address: [email protected]
** Postal address: Department of Mathematics, South China Normal University, Guangzhou, 510631, PR China. Email address: [email protected]

Abstract


In this paper we study discrete-time Markov decision processes with Borel state and action spaces. The criterion is to minimize the average expected cost, where the costs may have neither upper nor lower bounds. We first provide two average optimality inequalities of opposing directions and give conditions under which solutions to them exist. Then, using the two inequalities, we establish the existence of an average optimal (deterministic) stationary policy under additional continuity-compactness assumptions. Our conditions are slightly weaker than those in the previous literature, and we also give new sufficient conditions, stated in terms of the primitive data of the model, for the existence of an average optimal stationary policy. Moreover, our approach differs slightly from the well-known ‘optimality inequality approach’ widely used in Markov decision processes. Finally, we illustrate our results with two examples.
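For orientation, the average cost criterion and the pair of optimality inequalities mentioned above typically take the following form in the MDP literature. The notation below (state space X, admissible action sets A(x), one-stage cost c, transition kernel Q) is a standard sketch supplied here for illustration, not quoted from the paper itself:

% Long-run average expected cost of a policy \pi from initial state x,
% and the optimal average value function:
J(x,\pi) = \limsup_{n\to\infty} \frac{1}{n}\,
  \mathbb{E}_x^{\pi}\Bigl[\,\sum_{t=0}^{n-1} c(x_t, a_t)\Bigr],
\qquad J^*(x) = \inf_{\pi} J(x,\pi).

% Two average optimality inequalities of opposing directions: one seeks a
% constant g and measurable functions h and \hat{h} on X such that
g + h(x) \;\ge\; \inf_{a \in A(x)} \Bigl\{ c(x,a) + \int_X h(y)\, Q(dy \mid x, a) \Bigr\},
g + \hat{h}(x) \;\le\; \inf_{a \in A(x)} \Bigl\{ c(x,a) + \int_X \hat{h}(y)\, Q(dy \mid x, a) \Bigr\}.

Loosely speaking, a measurable selector attaining the infimum in the first inequality yields a stationary policy whose average cost is at most g, while the second inequality bounds the optimal value from below by g; together they identify g as the optimal average cost and the selector as an average optimal stationary policy.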

Type
Research Papers
Copyright
© Applied Probability Trust 2006 

Footnotes

Partially supported by the NSFC, the NCET, and the RFDP.
