
The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

Published online by Cambridge University Press:  04 January 2016

François Dufour*
Affiliation: Université Bordeaux, IMB and INRIA Bordeaux Sud-ouest
M. Horiguchi**
Affiliation: Kanagawa University
A. B. Piunovskiy***
Affiliation: University of Liverpool

* Postal address: INRIA Bordeaux Sud-ouest, CQFD Team, 351 cours de la Libération, F-33400 Talence, France. Email address: [email protected]
** Postal address: Department of Mathematics, Faculty of Engineering, Kanagawa University, 3-27-1 Rokkakubashi, Kanagawa-ku, Yokohama 221-8686, Japan. Email address: [email protected]
*** Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Email address: [email protected]
Rights & Permissions [Opens in a new window]

Abstract


This paper deals with discrete-time Markov decision processes (MDPs) under constraints, where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is established using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, that the model is nonnegative and semicontinuous, and that the associated linear program has an admissible solution with finite cost. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program, given by the occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to the Markov control problem itself. As a consequence, the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide, on a special set, with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and possible applications of the results developed in the paper.
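The convex analytic approach mentioned in the abstract replaces optimization over policies with a linear program over occupation measures. As a rough sketch in generic notation (state space $\mathbf{X}$, action space $\mathbf{A}$, transition kernel $Q$, initial state $x_0$, cost functions $c_0, \dots, c_q$, and constraint bounds $d_1, \dots, d_q$; these symbols are illustrative and are not taken verbatim from the paper), the constrained problem and its linear programming counterpart read as follows:
\begin{align*}
\text{minimize over policies } \pi: \quad & \mathbb{E}^{\pi}_{x_0}\Biggl[\sum_{t=0}^{\infty} c_0(X_t, A_t)\Biggr] \\
\text{subject to} \quad & \mathbb{E}^{\pi}_{x_0}\Biggl[\sum_{t=0}^{\infty} c_k(X_t, A_t)\Biggr] \le d_k, \qquad k = 1, \dots, q.
\end{align*}
Introducing the occupation measure of a policy $\pi$,
\[
\mu^{\pi}(\Gamma) = \sum_{t=0}^{\infty} \mathbb{P}^{\pi}_{x_0}\bigl((X_t, A_t) \in \Gamma\bigr), \qquad \Gamma \in \mathcal{B}(\mathbf{X} \times \mathbf{A}),
\]
the problem becomes the linear program
\begin{align*}
\text{minimize over measures } \mu: \quad & \int_{\mathbf{X} \times \mathbf{A}} c_0 \, d\mu \\
\text{subject to} \quad & \int_{\mathbf{X} \times \mathbf{A}} c_k \, d\mu \le d_k, \qquad k = 1, \dots, q, \\
& \mu(\Gamma_X \times \mathbf{A}) = \delta_{x_0}(\Gamma_X) + \int_{\mathbf{X} \times \mathbf{A}} Q(\Gamma_X \mid x, a)\, \mu(dx, da) \quad \text{for all } \Gamma_X \in \mathcal{B}(\mathbf{X}).
\end{align*}
The last constraint is the characteristic (balance) equation identifying those measures that arise as occupation measures of the controlled process; in this formulation, the paper's first main result states that the linear program attains its minimum at the occupation measure generated by a randomized stationary policy.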

Type
General Applied Probability
Copyright
© Applied Probability Trust 
