
Optimality conditions for a Markov decision chain with unbounded costs

Published online by Cambridge University Press:  14 July 2016

D. R. Robinson*
Affiliation:
University of Manchester
*
Postal address: Statistical Laboratory, Department of Mathematics, The University, Manchester M13 9PL, U.K.

Abstract

It is known that when costs are unbounded, satisfaction of the appropriate dynamic programming ‘optimality' equation by a policy is not sufficient to guarantee its average optimality. A ‘lowest-order potential' condition is introduced which, together with the dynamic programming equation, is sufficient to establish the optimality of the policy. It is also shown that, under fairly general conditions, if the lowest-order potential condition is not satisfied, there exists a non-memoryless policy with smaller average cost than the policy satisfying the dynamic programming equation.
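For context, the dynamic programming ‘optimality' equation referred to in the abstract is, in its standard average-cost form for a countable-state Markov decision chain, the following (this is the textbook statement with assumed notation, not an equation reproduced from the paper itself):

```latex
% Average-cost optimality equation: g is the candidate minimal average
% cost, h the relative-value function, c(i,a) the one-step cost of
% action a in state i, A(i) the action set, and p_{ij}(a) the
% transition probabilities. S is the (countable) state space.
g + h(i) \;=\; \min_{a \in A(i)} \Bigl\{\, c(i,a) + \sum_{j \in S} p_{ij}(a)\, h(j) \Bigr\}, \qquad i \in S.
```

With bounded costs, a stationary policy attaining the minimum on the right-hand side is average optimal; the abstract's point is that with unbounded costs this can fail, and the lowest-order potential condition is what closes the gap.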

Type
Research Papers
Copyright
© Applied Probability Trust

