
A new policy iteration scheme for Markov decision processes using Schweitzer's formula

Published online by Cambridge University Press:  14 July 2016

J. B. Lasserre*
Affiliation: Laboratoire d'Automatique et d'Analyse des Systèmes du CNRS, Toulouse

* Postal address: Laboratoire d'Automatique et d'Analyse des Systèmes du CNRS, 7, Avenue du Colonel Roche, 31077 Toulouse Cedex, France.

Abstract

Given a family of Markov chains, each with a single recurrent class, we present a potential application of Schweitzer's exact formula relating the steady-state probability and fundamental matrices of any two chains in the family. We propose a new policy iteration scheme for Markov decision processes in which, in contrast to standard policy iteration, the criterion for selecting an action ensures the maximal one-step average-cost improvement. Its computational complexity and storage requirement are analysed.
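To make the formula concrete: for two unichain transition matrices P and P̄ on the same state space, with π the stationary distribution of P and Z = (I − P + eπ)^{-1} the fundamental matrix of P, Schweitzer's exact updating formula ([13], [21]) gives the stationary distribution of P̄ as π̄ = π(I − UZ)^{-1}, where U = P̄ − P. The Python sketch below checks the formula on a hypothetical pair of 3-state chains chosen purely for illustration; it shows only the updating step, not the paper's full policy iteration scheme.

import numpy as np

# Two transition matrices on the same state space, each with a single
# recurrent class. These 3-state matrices are hypothetical, for illustration.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
P_bar = np.array([[0.4, 0.4, 0.2],
                  [0.1, 0.5, 0.4],
                  [0.3, 0.2, 0.5]])

n = P.shape[0]
I = np.eye(n)
e = np.ones(n)

def stationary(P):
    # pi solves pi (I - P + E) = e^T, where E is the all-ones matrix;
    # this linear system is nonsingular for a unichain P.
    return np.linalg.solve((I - P + np.outer(e, e)).T, e)

pi = stationary(P)                          # stationary distribution of P
Z = np.linalg.inv(I - P + np.outer(e, pi))  # fundamental matrix of P

# Schweitzer's exact updating formula: with U = P_bar - P,
# pi_bar = pi (I - U Z)^{-1}, obtained here by solving the
# transposed system instead of forming the inverse explicitly.
U = P_bar - P
pi_bar = np.linalg.solve((I - U @ Z).T, pi)

print(np.allclose(pi_bar, stationary(P_bar)))  # expect: True

Because the update reuses the already-computed Z rather than re-solving the new chain, the average cost π̄c of a neighbouring policy (for a cost vector c) can be evaluated cheaply; this is presumably what makes the maximal one-step average-cost improvement criterion affordable, as the complexity analysis in the paper examines.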

Type
Short Communications
Copyright
Copyright © Applied Probability Trust 1994 

References

[1] Blackwell, D. (1962) Discrete dynamic programming. Ann. Math. Statist. 33, 719–726.
[2] Dantzig, G. B. (1963) Linear Programming and Extensions. Princeton University Press.
[3] Denardo, E. V. and Fox, B. (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16, 468–487.
[4] Federgruen, A. and Schweitzer, P. J. (1980) A survey of asymptotic value-iteration for undiscounted Markovian decision processes. In Recent Developments in Markov Decision Processes, ed. Hartley, R., Thomas, L. C. and White, D. J. Academic Press, New York.
[5] Hastings, N. A. J. (1971) Bounds on the gain rate of a Markov decision process. Operat. Res. 19, 240–244.
[6] Haviv, M. and Van Der Heyden, L. (1984) Perturbation bounds for the stationary probabilities of a finite Markov chain. Adv. Appl. Prob. 16, 804–818.
[7] Herzberg, M. and Yechiali, U. (1991) Criteria for selecting the relaxation factor of the value iteration algorithm for undiscounted Markov and semi-Markov decision processes. Operat. Res. Lett. 10, 193–202.
[8] Hordijk, A. and Kallenberg, L. C. M. (1979) Linear programming and Markov decision chains. Management Sci. 25, 352–362.
[9] Howard, R. (1960) Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
[10] Howard, R. A. (1963) Semi-Markov decision processes. Bull. Internat. Inst. Statist. 40(2), 625–652.
[11] Jewell, W. S. (1963) Markov-renewal programming I and II. Operat. Res. 11, 938–972.
[12] Kemeny, J. G. and Snell, J. L. (1960) Finite Markov Chains. Van Nostrand, Princeton, NJ.
[13] Lasserre, J. B. (1991) Exact formula for sensitivity analysis of Markov chains. J. Optim. Theory Appl. 71, 407–413.
[14] Lasserre, J. B. (1991) Updating formula for Markov chains and applications. LAAS Technical Report.
[15] Manne, A. (1960) Linear programming and sequential decisions. Management Sci. 6, 259–267.
[16] Odoni, A. (1969) On finding the maximal gain for MDP's. Operat. Res. 17, 857–860.
[17] Osaki, S. and Mine, H. (1968) Linear programming algorithms for semi-Markov decision processes. J. Math. Anal. Appl. 22, 356–381.
[18] Popyak, J. L., Brown, R. L. and White, C. C. (1979) Discrete versions of an algorithm due to Varaiya. IEEE Trans. Autom. Control 24, 504.
[19] Puterman, M. (1990) Markov decision processes. In Handbooks in Operations Research and Management Science, ed. Heyman, D. P. and Sobel, M. North-Holland, Amsterdam.
[20] Puterman, M. L. and Brumelle, S. L. (1978) The analytic theory of policy iteration. In Dynamic Programming and its Applications, ed. Puterman, M. L. Academic Press, New York.
[21] Schweitzer, P. J. (1968) Perturbation theory and finite Markov chains. J. Appl. Prob. 5, 401–413.
[22] Schweitzer, P. J. (1969) Perturbation theory and undiscounted Markov renewal programming. Operat. Res. 17, 716–727.
[23] Schweitzer, P. J. (1971) Iterative solution of the functional equations of undiscounted Markov renewal programming. J. Math. Anal. Appl. 34, 495–501.
[24] Veinott, A. F. (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37, 1284–1294.
[25] White, D. J. (1963) Dynamic programming, Markov chains and the method of successive approximations. J. Math. Anal. Appl. 6, 373–376.