
The variance of discounted Markov decision processes

Published online by Cambridge University Press: 14 July 2016

Matthew J. Sobel
Affiliation: Georgia Institute of Technology
Postal address: College of Management, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.

Abstract

Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
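As an illustrative sketch (not quoted from the paper itself): for a fixed stationary policy on a finite state space with transition matrix \(P\), discount factor \(\beta \in (0,1)\), and a single-stage reward vector \(r\) that is, for this sketch, assumed deterministic given the state, the mean discounted return \(v\) and a variance vector \(\psi\) of the kind the abstract describes satisfy linear fixed-point equations:

\[
v = r + \beta P v, \qquad
\psi = \beta^{2}\bigl[P(v \circ v) - (Pv) \circ (Pv)\bigr] + \beta^{2} P \psi ,
\]

where \(\circ\) denotes the elementwise product. Both systems are linear in their unknowns and can be solved directly as \(v = (I - \beta P)^{-1} r\) and \(\psi = \beta^{2}(I - \beta^{2} P)^{-1}\bigl[P(v \circ v) - (Pv) \circ (Pv)\bigr]\). Because the variance equation depends nonlinearly on \(v\), objectives such as the mean minus a multiple of the standard deviation do not decompose in the usual dynamic-programming fashion, which is the kind of obstacle the abstract alludes to.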

Type: Research Papers
Copyright: © Applied Probability Trust 1982

