Hostname: page-component-cd9895bd7-jn8rn Total loading time: 0 Render date: 2024-12-28T01:07:38.267Z Has data issue: false hasContentIssue false

The Computation of Average Optimal Policies in Denumerable State Markov Decision Chains

Published online by Cambridge University Press:  01 July 2016

Linn I. Sennott*
Affiliation:
Illinois State University
*
Postal address: Department of Mathematics 4520, Illinois State University, Normal, IL 61790-4520, USA. [email protected]

Abstract

This paper studies the expected average cost control problem for discrete-time Markov decision processes with denumerably infinite state spaces. A sequence of finite state space truncations is defined such that the average costs and average optimal policies in the sequence converge to the optimal average cost and an optimal policy in the original process. The theory is illustrated with several examples from the control of discrete-time queueing systems. Numerical results are discussed.

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 1997 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

This material is based upon work supported by the National Science Foundation under Grant No. ECS-9309154.

References

Apostol, T. (1974) Mathematical Analysis. Addison-Wesley, Reading, MA.Google Scholar
Arapostathis, A., Borkar, V. S., FernÁNdez-Gaucherand, E., Ghosh, M. K. and Marcus, S. I. (1993) Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31, 282344.Google Scholar
Bertsekas, D. (1987) Dynamic Programming. Prentice-Hall, Englewood Cliffs, NJ.Google Scholar
Borkar, V. S. (1984) On minimum cost per unit time control of Markov chains. SIAM J. Control. Optim. 22, 965978.Google Scholar
Borkar, V. S. (1989) Control of Markov chains with long-run average cost criterion: the dynamic programming equations. SIAM J. Control Optim. 27, 642657.Google Scholar
Cavazos-Cadena, R. (1986) Finite-state approximations for denumerable state discounted Markov decision processes. Appl. Math. Optim. 14, 126.Google Scholar
Cavazos-Cadena, R. and Sennott, L. I. (1992) Comparing recent assumptions for the existence of average optimal stationary policies. Operat. Res. Lett. 11, 3337.Google Scholar
Chung, K. L. (1967) Markov Chains with Stationary Transition Probabilities. 2nd edn. Springer, New York.Google Scholar
Fox, B. L. (1971) Finite-state approximations to denumerable state dynamic programs. J. Math. Anal. Appl. 34, 665670.Google Scholar
Gibson, D. and Seneta, E. (1986) Augmented truncations of infinite stochastic matrices. Report. Department of Mathematical Statistics, University of Sydney.Google ScholarPubMed
Gibson, D. and Seneta, E. (1987) Augmented truncations of infinite stochastic matrices. J. Appl. Prob. 24, 600608.Google Scholar
Golub, G. H. and Seneta, E. (1974) Computation of the stationary distribution of an infinite stochastic matrix of special form. Bull. Austral. Math. Soc. 10, 255261.Google Scholar
Grassman, W. K., Taksar, M. I. and Heyman, D. P. (1985) Regenerative analysis and steady state distributions for Markov chains. Operat. Res. 33, 11071116.Google Scholar
Hernandez-Lerma, O. (1989) Adaptive Markov Control Processes (Applied Mathematical Sciences 79). Springer, New York.CrossRefGoogle Scholar
Heyman, D. P. (1991) Approximating the stationary distribution of an infinite stochastic matrix. J. Appl. Prob. 28, 96103.Google Scholar
Puterman, M. L. (1994) Markov Decision Processes. Wiley, New York.Google Scholar
Ross, S. M. (1983) Introduction to Stochastic Dynamic Programming. Academic Press, New York.Google Scholar
Royden, H. L. (1968) Real Analysis. 2nd edn. Macmillan, New York.Google Scholar
Sennott, L. I. (1989) Average cost optimal stationary policies in infinite state Markov decision processes with unbounded costs. Operat. Res. 37, 626633.Google Scholar
Sennott, L. I. (1993) The average cost optimality equation and critical number policies. Prob. Eng. inf. Sci. 7, 4767.Google Scholar
Sennott, L. I. (1995) Another set of conditions for average optimality in Markov control processes. Sys. Control. Lett. 24, 147151.Google Scholar
Thomas, L. C. and Stengos, D. (1985) Finite state approximation algorithms for average cost denumerable state Markov decision processes. OR Spek. 7, 2737.Google Scholar
Tidball, M. M. and Altman, E. (1996) Approximations in dynamic zero-sum games. I. SIAM J. Control Optim. 34, 311328.Google Scholar
White, D. J. (1980) Finite state approximations for denumerable state infinite horizon discounted Markov decision processes: the method of successive approximations. In Recent Developments in Markov Decision Processes. ed. Hartley, R., Thomas, L. C. and White, D. J.. Academic Press, New York. pp. 5772.Google Scholar
Wolf, D. (1980) Approximation of the invariant probability measure of an infinite stochastic matrix. Adv. Appl. Prob. 12, 710726.CrossRefGoogle Scholar