Value iteration in a class of average controlled Markov chains with unbounded costs: necessary and sufficient conditions for pointwise convergence

Rolando Cavazos-Cadena; Emmanuel Fernández-Gaucherand

doi:10.2307/3214980

Value iteration in a class of average controlled Markov chains with unbounded costs: necessary and sufficient conditions for pointwise convergence

Part of: Stochastic systems and control

Published online by Cambridge University Press: 14 July 2016

Rolando Cavazos-Cadena and

Emmanuel Fernández-Gaucherand

Show author details

Rolando Cavazos-Cadena*: Affiliation:
Universidad Autónoma Agraria Antonio Narro
Emmanuel Fernández-Gaucherand*: Affiliation:
The University of Arizona
*: ∗Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo COAH 25315, México.
∗∗Postal address: Systems and Industrial Engineering Department, The University of Arizona, Tucson AZ 85721, USA.

Article contents

Abstract
Footnotes
References

Get access

Rights & Permissions

Abstract

This work concerns controlled Markov chains with denumerable state space, (possibly) unbounded cost function, and an expected average cost criterion. Under a Lyapunov function condition, together with mild continuity-compactness assumptions, a simple necessary and sufficient criterion is given so that the relative value functions and differential costs produced by the value iteration scheme converge pointwise to the solution of the optimality equation; this criterion is applied to obtain convergence results when the cost function is bounded below or bounded above.

Keywords

CONTROLLED MARKOV CHAINS AVERAGE COST CRITERION LYAPUNOV FUNCTION CONDITION VALUE ITERATION SCHEME POINTWISE CONVERGENCE NECESSARY AND SUFFICIENT CONDITIONS

MSC classification

Primary: 93E20: Optimal stochastic control 90C40: Markov and semi-Markov decision processes

Type: Research Papers
Information: Journal of Applied Probability , Volume 33 , Issue 4 , December 1996 , pp. 986 - 1002

DOI: https://doi.org/10.2307/3214980 [Opens in a new window]
Copyright: Copyright © Applied Probability Trust 1996

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

Footnotes

∗

Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under Grant NSF-INT 9201430, and by the Consejo Nacional de Ciencia y Tecnología (CONACyT), México.

∗∗

The work of Cavazos-Cadena was partially supported by the PSF Organization under Grant No. 200–150/94–01, and by the MAXTOR Foundation for Applied Probability and Statistics (MAXFAPS) under Grant No. 01–01–56/01–94.

References

[1] Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Gosh, M. K. and Marcus, S. I. (1993) Discrete-time controlled Markov processes with an average cost criterion: a survey. SIAM J Control Optim. 31, 282–344.CrossRef Google Scholar

[2] Borkar, V. S. (1991) Topics in Controlled Markov Chains (Pitman Research Notes in Mathematics Series 240). Longman, London.Google Scholar

[3] Cavazos-Cadena, R. (1991) Recent results on conditions for the existence of average optimal stationary policies. Ann. Operat. Res. 28, 3–28.CrossRef Google Scholar

[4] Cavazos-Cadena, R. (1994) Cesàro convergence of the undiscounted value iteration method in Markov decision processes under the Lyapunov stability condition. Boletín de la Sociedad Mathemática Mexicana. To appear.Google Scholar

[5] Cavazos-Cadena, R. (1995) Undiscounted value iteration in stable Markov decision processes with bounded rewards. J. Math. Systems, Estimation and Control. To appear.Google Scholar

[6] Cavazos-Cadena, R. and Fernández-Gaucherand, ?. (1995) Denumerable controlled Markov chains with average reward criterion: sample path optimality. ZOR-Math. Meth. Operat. Res. 41, 89–108.CrossRef Google Scholar

[7] Cavazos-Cadena, R. and Hernández-Lerma, O. (1992) Equivalence of Lyapunov stability criteria in a class of Markov decision processes. Appl. Math. Optim. 26, 113–137.CrossRef Google Scholar

[8] Dugundji, J. (1966) Topology. Allyn and Bacon, Boston.Google Scholar

[9] Federgruen, A. and Schweitzer, P. J. (1978) Discounted and undiscounted value iteration in Markov decision problems: A survey. In Dynamic Programming and its Applications. ed. Puterman, M. L. Academic Press, New York. pp. 23–52.CrossRef Google Scholar

[10] Federgruen, A. and Schweitzer, P. J. (1980) A survey of asymptotic value-iteration for undiscounted Markovian decision processes. In Recent Developments in Markov Decision Processes. ed. Hartley, R., Thomas, L. C. and White, D. J. Academic Press, New York. pp. 73–109.Google Scholar

[11] Hernández-Lerma, O. (1989) Adaptive Markov Control Processes. Springer, New York.CrossRef Google Scholar

[12] Hernández-Lerma, O. and Lasserre, J. B. (1993) Value iteration and rolling plans for Markov control processes with unbounded rewards. J. Math. Anal. Appl. 177, 38–55.CrossRef Google Scholar

[13] Hinderer, K. (1970) Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter. Springer, New York.CrossRef Google Scholar

[14] Hordijk, A. (1974) Dynamic Programming and Markov Potential Theory. (Mathematical Centre Tract 51.) Mathematisch Centrum, Amsterdam.Google Scholar

[15] Hordijk, A, Schweitzer, P. J. and Tijms, H. C. (1975) The asymptotic behavior of the minimal total expected cost for the denumerable state Markov decision model. J. Appl. Prob. 12, 298–305.CrossRef Google Scholar

[16] Montes-De-Oca, R. and Hernández-Lerma, O. (1993) Value iteration in average cost Markov control processes on Borel spaces. Submitted. Google Scholar

[17] Puterman, M. (1994) Markov Decision Processes. Wiley, New York.CrossRef Google Scholar

[18] Royden, H. L. (1968) Real Analysis. 2nd edn. Macmillan, New York.Google Scholar

[19] Ross, S. M. (1970) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.Google Scholar

[20] Schweitzer, P. J. (1971) Iterative solution of the functional equations for undiscounted Markov renewal programming. J. Math. Anal. Appl. 34, 495–501.CrossRef Google Scholar

[21] Sennott, L. I. (1991) Value iteration in countable state average cost Markov decision processes with unbounded costs. Ann. Operat. Res. 28, 261–272.CrossRef Google Scholar

[22] Thomas, L. C. (1965) Connectedness conditions for denumerable state Markov decision processes. In Recent Developments in Markov Decision Processes. ed. Hartley, R., Thomas, L. C. and White, D. J. Academic Press, New York. pp. 181–204.Google Scholar

Article contents

Value iteration in a class of average controlled Markov chains with unbounded costs: necessary and sufficient conditions for pointwise convergence

Abstract

Keywords

MSC classification

Access options

Article purchase

Temporarily unavailable

Footnotes

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests