
An Inequality for Variances of the Discounted Rewards

Published online by Cambridge University Press:  14 July 2016

Eugene A. Feinberg*
Affiliation:
Stony Brook University
Jun Fei*
Affiliation:
Stony Brook University
*
Postal address: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA.

Abstract


We consider the following two definitions of discounting: (i) a multiplicative coefficient applied to the rewards, and (ii) the probability that the process has not yet been stopped, where the stopping time has an exponential distribution independent of the process. It is well known that the expected total discounted rewards under these two definitions coincide. In this note we show that the variance of the total discounted rewards is smaller under the first definition than under the second.
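The equality of means and the ordering of variances can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it uses a discrete-time analogue with i.i.d. rewards and discount factor β, where definition (i) weights the reward at step t by β^t, and definition (ii) pays undiscounted rewards up to a geometric stopping time T with P(T > t) = β^t (the discrete counterpart of an independent exponential stopping time). All parameter values are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.9          # assumed discount factor
horizon = 200       # truncation horizon; beta**200 is negligible
n_paths = 100_000   # number of Monte Carlo sample paths

# Illustrative reward process: i.i.d. normal rewards per step.
rewards = rng.normal(loc=1.0, scale=1.0, size=(n_paths, horizon))

# Definition (i): multiplicative discounting, sum_t beta^t * r_t.
discounts = beta ** np.arange(horizon)
total_i = rewards @ discounts

# Definition (ii): rewards accrue while t < T, where T is geometric
# with success probability 1 - beta, so that P(T > t) = beta^t.
T = rng.geometric(1 - beta, size=n_paths)          # T >= 1
mask = np.arange(horizon)[None, :] < T[:, None]    # indicator {t < T}
total_ii = (rewards * mask).sum(axis=1)

# Both means approximate 1 / (1 - beta) = 10; the variance under (i)
# is markedly smaller than under (ii).
print("means:", total_i.mean(), total_ii.mean())
print("variances:", total_i.var(), total_ii.var())
```

In this simple case the gap is easy to see analytically: under (i) the variance is σ²·Σ β^{2t} = σ²/(1 − β²), while under (ii) the randomness of the stopping time adds a term of order Var(T)·(E r)², which dominates when β is close to 1.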

Type
Research Article
Copyright
Copyright © Applied Probability Trust 2009 
