Risk-sensitive average continuous-time Markov decision processes with unbounded transition and cost rates

Published online by Cambridge University Press: 23 June 2021

Xin Guo*
Affiliation:
Tsinghua University
Yonghui Huang**
Affiliation:
Sun Yat-Sen University
*Postal address: School of Economics and Management, Tsinghua University, Beijing, China.
**Postal address: School of Mathematics, Sun Yat-Sen University, and Guangdong Province Key Laboratory of Computational Science, Sun Yat-Sen University, Guangzhou, China.

Abstract

This paper considers risk-sensitive average optimization for denumerable continuous-time Markov decision processes (CTMDPs) in which the transition and cost rates may be unbounded and the policies may be randomized and history-dependent. We first derive the multiplicative dynamic programming principle and some new facts for risk-sensitive finite-horizon CTMDPs. Using these finite-horizon results, we then establish the existence and uniqueness of a solution to the risk-sensitive average optimality equation (RS-AOE) and, via the RS-AOE, prove the existence of an optimal stationary policy. Furthermore, when finitely many actions are available at each state, we construct a sequence of finite-state CTMDP models whose optimal stationary policies can be obtained by a policy iteration algorithm in a finite number of iterations, and we prove that an average optimal policy for the countably infinite state model can be approximated by those of the finite-state models. Finally, we illustrate the conditions and the iteration algorithm with an example.

Type
Original Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of Applied Probability Trust
