
Average Optimality for Continuous-Time Markov Decision Processes Under Weak Continuity Conditions

Published online by Cambridge University Press: 30 January 2018

Yi Zhang*
Affiliation: University of Liverpool
* Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Email address: [email protected]

Abstract


This paper considers average optimality for a continuous-time Markov decision process with Borel state and action spaces and an arbitrarily unbounded, nonnegative cost rate. The existence of a deterministic stationary optimal policy is proved under conditions that allow the following: the controlled process may be explosive, the transition rates are only weakly continuous, and the multifunction defining the admissible action spaces need be neither compact-valued nor upper semicontinuous.
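For context, the long-run expected average cost criterion studied in this setting can be written, in standard notation (which may differ from the paper's own), as

\[
V(x,\pi) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_x^{\pi}\!\left[ \int_0^T c(\xi_t, a_t)\, \mathrm{d}t \right],
\]

where $c$ is the nonnegative cost rate, $\xi_t$ the controlled process, $a_t$ the action at time $t$, and $\pi$ a policy; a policy $\pi^*$ is average optimal if $V(x,\pi^*) = \inf_{\pi} V(x,\pi)$ for every initial state $x$.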

Type: Research Article
Copyright: © Applied Probability Trust
