Published online by Cambridge University Press: 27 July 2009
Finite-state-and-action Markov branching decision chains are studied with bounded endogenous expected population sizes and interest-rate-dependent one-period rewards that are analytic in the interest rate at zero. The existence of a stationary strong-maximum-present-value policy is established. Miller and Veinott's [1969] strong policy-improvement method is generalized to find in finite time a stationary n-present-value optimal policy and, when the one-period rewards are rational in the interest rate, a stationary strong-maximum-present-value policy. This extends previous studies of Blackwell [1962], Miller and Veinott [1969], Veinott [1974], and Rothblum [1974, 1975], in which the one-period rewards are independent of the interest rate, and Denardo [1971] in which semi-Markov decision chains with small interest rates are studied. The problem of finding a stationary n-present-value optimal policy is also formulated as a staircase linear program in which the objective function and right-hand sides, but not the constraint matrix, depend on the interest rate, and solutions for all small enough positive interest rates are sought. The optimal solutions of the primal and dual are polynomials in the reciprocal of the interest rate. A constructive rule is given for finding a stationary n-present-value optimal policy from an optimal solution of the asymptotic linear program. This generalizes the linear programming approaches for finding maximum-reward-rate and maximum-present-value policies for Markov decision chains studied by Manne [1960], d'Epenoux [1960, 1963], Balinski [1961], Derman [1962], Denardo and Fox [1968], Denardo [1970], Derman and Veinott [1972], Veinott [1973], and Hordijk and Kallenberg [1979, 1984].