Uniformization for semi-Markov decision processes under stationary policies

Published online by Cambridge University Press:  14 July 2016

Frederick J. Beutler*
Affiliation:
University of Michigan, Ann Arbor
Keith W. Ross**
Affiliation:
University of Pennsylvania
* Postal address: Department of Electrical Engineering and Computer Science, EECS, University of Michigan, Ann Arbor, MI 48109, USA.
** Postal address: Department of Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA.

Abstract

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies.

We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
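
As a rough illustration of the classical idea the abstract starts from, the sketch below uniformizes a small continuous-time Markov chain under one fixed simple policy and checks that the average reward per unit time is preserved. It is not the generalized uniformization developed in the paper. The two-state generator, the reward rates, the uniformization rate, and the helper names `uniformize` and `stationary` are all assumptions chosen for illustration, and exponential holding times are assumed throughout.

```python
def uniformize(Q, rate):
    """Return P = I + Q / rate for a generator matrix Q given as a list of rows."""
    n = len(Q)
    if rate < max(-Q[i][i] for i in range(n)):
        raise ValueError("rate must be at least the largest exit rate")
    return [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
            for i in range(n)]


def stationary(P, iters=200):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state chain: leave state 0 at rate a, leave state 1 at rate b.
a, b = 2.0, 3.0
Q = [[-a, a], [b, -b]]
reward_rate = [1.0, 4.0]  # assumed reward earned per unit time in each state

rate = 4.0  # uniformization rate, chosen above the largest exit rate (3.0)
P = uniformize(Q, rate)   # self-loops in P play the role of the virtual jumps
pi = stationary(P)

# Reward per discrete step is reward_rate / rate; multiplying the per-step
# average by rate converts it back to reward per unit time.
avg_discrete = rate * sum(p * r / rate for p, r in zip(pi, reward_rate))

# Closed-form time average for the two-state continuous-time chain.
avg_continuous = (b * reward_rate[0] + a * reward_rate[1]) / (a + b)

print(avg_discrete, avg_continuous)
```

In this sketch the action is fixed in each state, so re-selecting it at a virtual jump changes nothing. Under a randomized stationary policy the action could be redrawn at each self-loop, which is the source of the discrepancy the abstract describes.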

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1987 

