Optimal decision procedures for finite Markov chains. Part II: Communicating systems

Published online by Cambridge University Press:  01 July 2016

John Bather
Affiliation:
University of Sussex

Abstract

A Markov process in discrete time with a finite state space is controlled by choosing the transition probabilities from a given convex family of distributions depending on the present state. The immediate cost is prescribed for each choice and it is required to minimise the average expected cost over an infinite future. The paper considers a special case of this general problem and provides the foundation for a general solution. The main result is that an optimal policy exists if each state of the system can be reached with positive probability from any other state by choosing a suitable policy.
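The average-cost criterion described in the abstract can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes a hypothetical two-state system in which each state offers a finite list of (cost, transition distribution) choices, rather than the convex families the paper treats, and it finds the best stationary policy by brute-force enumeration. The names `ACTIONS`, `average_cost` and `optimal_policy` and all the numbers are invented for illustration. Every transition probability is positive, so each state can be reached from the other, which is the communicating condition of the main result.

```python
from itertools import product

# Hypothetical data: ACTIONS[state] lists the available (cost, transition row) choices.
# All probabilities are positive, so each state is reachable from the other.
ACTIONS = {
    0: [(1.0, [0.5, 0.5]), (1.5, [0.9, 0.1])],
    1: [(5.0, [0.5, 0.5]), (6.0, [0.9, 0.1])],
}


def average_cost(policy):
    """Long-run average expected cost of a stationary policy.

    policy[s] is the index of the action chosen in state s.
    """
    c0, row0 = ACTIONS[0][policy[0]]
    c1, row1 = ACTIONS[1][policy[1]]
    a, b = row0[1], row1[0]  # P(0 -> 1) and P(1 -> 0)
    pi0 = b / (a + b)        # stationary probability of state 0 (two-state chain)
    return pi0 * c0 + (1.0 - pi0) * c1


def optimal_policy():
    """Enumerate every stationary deterministic policy and keep the cheapest."""
    policies = product(*(range(len(ACTIONS[s])) for s in (0, 1)))
    best = min(policies, key=average_cost)
    return best, average_cost(best)


best_policy, best_cost = optimal_policy()
print(best_policy, round(best_cost, 4))
```

In this sketch the cheapest policy takes the second action in both states, with an average cost of about 1.95. It pays a higher immediate cost in each state in order to spend more time in the cheaper state 0, which is the kind of trade-off the average-cost criterion is meant to capture. Enumeration is only practical for tiny finite examples and says nothing about how the paper itself establishes existence of an optimal policy.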

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1973 
