Published online by Cambridge University Press: 14 July 2016
A semi-Markov decision process with a denumerable multidimensional state space is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is assumed only to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that, under an ergodicity assumption, there is a stationary policy that is optimal for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.
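The assumptions summarized above admit the following schematic rendering; the symbols $r$, $p$, $w$, $C$, $m$, $\rho$, $B$ and $\tau_k$ are illustrative placeholders introduced here, since the abstract does not give the paper's own notation. A polynomial bound on the one-transition reward and a weighted (Lyapunov-type) moment bound on the next state would typically read
\[
|r(x,a)| \le C\,(1+\|x\|)^{m}, \qquad \sum_{y} p(y \mid x,a)\, w(y) \le \rho\, w(x) + B,
\]
for every state $x$ and admissible action $a$. In one standard formulation for semi-Markov decision processes, the long-run average reward of a policy $\pi$ from initial state $x$ is then
\[
g(\pi,x) = \liminf_{n\to\infty} \frac{\mathbb{E}^{\pi}_{x}\!\left[\sum_{k=0}^{n-1} r(X_k,A_k)\right]}{\mathbb{E}^{\pi}_{x}\!\left[\sum_{k=0}^{n-1} \tau_k\right]},
\]
where $\tau_k$ denotes the sojourn time of the $k$th transition; the result asserted is the existence of a stationary policy attaining the optimal value of this criterion under the ergodicity assumption.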
This study was initiated in the course of the author's visit to the Department of Business Administration of the University of Illinois, and completed, with the support of the National Science Foundation under Grant ENG-7903879A01, during his visit to the Department of EECS at the University of California, Berkeley.