
Finite-horizon optimality for continuous-time Markov decision processes with unbounded transition rates

Published online by Cambridge University Press:  21 March 2016

Xianping Guo*
Xiangxiang Huang*
Yonghui Huang*
Affiliation: Sun Yat-Sen University
* Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, P. R. China.

Abstract


In this paper we study finite-horizon optimality for denumerable continuous-time Markov decision processes, in which the transition and reward/cost rates may be unbounded and optimality is taken over the class of all randomized history-dependent policies. Under mild conditions, we first establish the existence of a solution to the finite-horizon optimality equation via an approximation technique that passes from bounded transition rates to unbounded ones. We then prove the existence of ε-optimal (ε ≥ 0) Markov policies and, by establishing an analog of the Itô–Dynkin formula, verify that the value function is the unique solution to the optimality equation. Finally, we provide an example in which both the transition rates and the value function are unbounded, thereby solving some of the problems left open by Yushkevich (1978).
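For orientation, in the standard formulation of a denumerable-state continuous-time Markov decision process with horizon T (generic notation, not necessarily that of the paper), the finite-horizon optimality equation referred to above is the backward equation for the value function u(t, i):

\[
-\frac{\partial u(t,i)}{\partial t} \;=\; \sup_{a \in A(i)} \Bigl\{ r(i,a) + \sum_{j \in S} q(j \mid i,a)\, u(t,j) \Bigr\}, \qquad u(T,i) = g(i), \quad (t,i) \in [0,T] \times S,
\]

where S is the countable state space, A(i) the set of admissible actions in state i, q(j | i, a) the (possibly unbounded) transition rates satisfying \(\sum_{j \in S} q(j \mid i,a) = 0\), r the reward rate (for costs, replace the supremum by an infimum), and g a terminal reward. Selecting, for each (t, i), an action within ε of attaining the supremum yields an ε-optimal Markov policy, and the Itô–Dynkin formula is the tool that identifies a solution of this equation with the value function.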

Type: General Applied Probability
Copyright: © Applied Probability Trust 2015

References

Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer, Heidelberg.
Bertsekas, D. P. and Shreve, S. E. (1978). Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
Feinberg, E. A. (2004). Continuous time discounted jump Markov decision processes: a discrete-event approach. Math. Operat. Res. 29, 492–524.
Feinberg, E. A. (2012). Reduction of discounted continuous-time MDPs with unbounded jump and reward rates to discrete-time total-reward MDPs. In Optimization, Control, and Applications of Stochastic Systems. Birkhäuser, New York, pp. 77–97.
Feinberg, E. A., Mandava, M. and Shiryaev, A. N. (2014). On solutions of Kolmogorov's equations for nonhomogeneous jump Markov processes. J. Math. Anal. Appl. 411, 261–270.
Ghosh, M. K. and Saha, S. (2012). Continuous-time controlled jump Markov processes on the finite horizon. In Optimization, Control, and Applications of Stochastic Systems. Birkhäuser, New York, pp. 99–109.
Guo, X. (2007). Continuous-time Markov decision processes with discounted rewards: the case of Polish spaces. Math. Operat. Res. 32, 73–87.
Guo, X. and Hernández-Lerma, O. (2009). Continuous-Time Markov Decision Processes. Springer, Berlin.
Guo, X. and Piunovskiy, A. (2011). Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math. Operat. Res. 36, 105–132.
Guo, X. and Ye, L. (2010). New discount and average optimality conditions for continuous-time Markov decision processes. Adv. Appl. Prob. 42, 953–985.
Guo, X., Hernández-Lerma, O. and Prieto-Rumeau, T. (2006). A survey of recent results on continuous-time Markov decision processes. Top 14, 177–261.
Guo, X., Huang, Y. and Song, X. (2012). Linear programming and constrained average optimality for general continuous-time Markov decision processes in history-dependent policies. SIAM J. Control Optimization 50, 23–47.
Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer, New York.
Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer, New York.
Jacod, J. (1975). Multivariate point processes: predictable projection, Radon–Nikodým derivatives, representation of martingales. Z. Wahrscheinlichkeitsth. 31, 235–253.
Kakumanu, P. (1971). Continuously discounted Markov decision model with countable state and action space. Ann. Math. Statist. 42, 919–926.
Kakumanu, P. (1975). Continuous time Markovian decision processes average return criterion. J. Math. Anal. Appl. 52, 173–188.
Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems. CRC Press, Boca Raton, FL.
Miller, B. L. (1968). Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J. Control 6, 266–280.
Piunovskiy, A. and Zhang, Y. (2011). Accuracy of fluid approximations to controlled birth-and-death processes: absorbing case. Math. Meth. Operat. Res. 73, 159–187.
Piunovskiy, A. and Zhang, Y. (2011). Discounted continuous-time Markov decision processes with unbounded rates: the convex analytic approach. SIAM J. Control Optimization 49, 2032–2061.
Pliska, S. R. (1975). Controlled jump processes. Stoch. Process. Appl. 3, 259–282.
Prieto-Rumeau, T. and Hernández-Lerma, O. (2012). Discounted continuous-time controlled Markov chains: convergence of control models. J. Appl. Prob. 49, 1072–1090.
Prieto-Rumeau, T. and Hernández-Lerma, O. (2012). Selected Topics on Continuous-Time Controlled Markov Chains and Markov Games. Imperial College Press, London.
Prieto-Rumeau, T. and Lorenzo, J. M. (2010). Approximating ergodic average reward continuous-time controlled Markov chains. IEEE Trans. Automatic Control 55, 201–207.
Ye, L. and Guo, X. (2012). Continuous-time Markov decision processes with state-dependent discount factors. Acta Appl. Math. 121, 5–27.
Yushkevich, A. A. (1978). Controlled Markov models with countable state space and continuous time. Theory Prob. Appl. 22, 215–235.