Optimal decision procedures for finite Markov chains. Part I: Examples

John Bather

doi:10.2307/1426039

Abstract

A Markov process in discrete time with a finite state space is controlled by choosing the transition probabilities from a prescribed set depending on the state occupied at any time. Given the immediate cost for each choice, it is required to minimise the expected cost over an infinite future, without discounting. Various techniques are reviewed for the case when there is a finite set of possible transition matrices and an example is given to illustrate the unpredictable behaviour of policy sequences derived by backward induction. Further examples show that the existing methods may break down when there is an infinite family of transition matrices. A new approach is suggested, based on the idea of classifying the states according to their accessibility from one another.
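The backward induction mentioned in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not an example from the paper: the two-state system, its costs and its transition rows are hypothetical, and the names `backward_induction` and `EXAMPLE_ACTIONS` are introduced here. Each state offers a finite list of choices, each a pair of immediate cost and transition row, and the n-step minimal expected costs are built up from a zero terminal cost with no discounting.

```python
def backward_induction(actions, horizon):
    """Finite-horizon backward induction without discounting.

    actions[i] is the list of choices available in state i; each choice is
    a pair (immediate_cost, transition_row). Returns (values, policies):
    values[n][i] is the minimal expected n-step cost starting from state i,
    and policies[n - 1][i] is the index of a minimising choice with n steps
    remaining (ties go to the lowest index).
    """
    n_states = len(actions)
    values = [[0.0] * n_states]  # zero terminal cost
    policies = []
    for _ in range(horizon):
        prev = values[-1]
        stage_values, stage_policy = [], []
        for choices in actions:
            # Expected total cost of each choice: immediate cost plus the
            # expected cost-to-go from the previous stage.
            totals = [
                cost + sum(p * v for p, v in zip(row, prev))
                for cost, row in choices
            ]
            best = min(range(len(totals)), key=totals.__getitem__)
            stage_values.append(totals[best])
            stage_policy.append(best)
        values.append(stage_values)
        policies.append(stage_policy)
    return values, policies


# Hypothetical two-state system (numbers chosen for illustration only).
EXAMPLE_ACTIONS = [
    [(2.0, [1.0, 0.0]), (2.5, [0.0, 1.0])],  # state 0: stay, or move to state 1
    [(1.0, [0.0, 1.0])],                     # state 1: a single choice, stay
]

values, policies = backward_induction(EXAMPLE_ACTIONS, 3)
for n, policy in enumerate(policies, start=1):
    print(n, values[n], policy)
```

In this toy system the minimising choice in state 0 changes between horizon 1 and horizon 2, which shows that the policy sequence produced by backward induction need not be constant in the horizon. The successive differences `values[n][i] - values[n - 1][i]` give a rough estimate of the long-run average cost per step.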


