Optimality conditions for a Markov decision chain with unbounded costs

D. R. Robinson

doi:10.2307/3213209

Optimality conditions for a Markov decision chain with unbounded costs

Published online by Cambridge University Press: 14 July 2016

D. R. Robinson

Show author details

D. R. Robinson*: Affiliation:
University of Manchester
*: ∗Postal address: Statistical Laboratory, Department of Mathematics, The University, Manchester M13 9PL, U.K.

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

It is known that when costs are unbounded satisfaction of the appropriate dynamic programming ‘optimality' equation by a policy is not sufficient to guarantee its average optimality. A ‘lowest-order potential' condition is introduced which, along with the dynamic programming equation, is sufficient to establish the optimality of the policy. Also, it is shown that under fairly general conditions, if the lowest-order potential condition is not satisfied there exists a non-memoryless policy with smaller average cost than the policy satisfying the dynamic programming equation.

Keywords

MARKOV DECISION CHAIN UNBOUNDED ONE-STEP COSTS OPTIMALITY EQUATION DYNAMIC PROGRAMMING

Type: Research Papers
Information: Journal of Applied Probability , Volume 17 , Issue 4 , December 1980 , pp. 996 - 1003

DOI: https://doi.org/10.2307/3213209 [Opens in a new window]
Copyright: Copyright © Applied Probability Trust

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

[1] Bather, J. A. (1976) Optimal stationary policies for denumerable Markov chains in continuous time. Adv. Appl. Prob. 8, 144–158.Google Scholar

[2] Chung, K. L. (1960) Markov Chains with Stationary Transition Probabilities, 2nd edn. Springer-Verlag, Berlin.CrossRef Google Scholar

[3] Derman, C. and Strauch, R. E. (1966) A note on memoryless rules for controlling sequential control processes. Ann. Math. Statist. 37, 276–278.Google Scholar

[4] Derman, C. and Veinott, A. F. (1967) A solution to a countable system of equations arising in Markovian decision processes. Ann. Math. Statist. 38, 582–584.Google Scholar

[5] Holewijn, P. J. and Hordijk, A. (1975) On the convergence of moments in stationary Markov chains. Stoch. Proc. Appl. 3, 55–64.Google Scholar

[6] Hordijk, A. (1974) Dynamic Programming and Markov Potential Theory. Mathematical Centre Tracts No. 51, Amsterdam.Google Scholar

[7] Howard, R. A. (1960) Dynamic Programming and Markov Processes. M.I.T. Press, Cambridge, Mass.Google Scholar

[8] Kushner, H. (1971) Introduction to Stochastic Control. Holt, Rinehart and Winston, New York.Google Scholar

[9] Robinson, D. R. (1976) Markov decision chains with unbounded costs and applications to the control of queues. Adv. Appl. Prob. 8, 159–176.CrossRef Google Scholar

[10] Robinson, D. R. (1979) Optimality conditions for a denumerable state Markov decision chain with unbounded costs. Research Report No. 74/DRR/3, Manchester-Sheffield School of Probability and Statistics.Google Scholar

[11] Ross, S. M. (1970) Average cost semi-Markov decision processes. J. Appl. Prob. 7, 649–656.Google Scholar

Article contents

Optimality conditions for a Markov decision chain with unbounded costs

Abstract

Keywords

Access options

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests