Learning algorithms for Markov decision processes

Masami Kurano

doi:10.2307/3214080

Published online by Cambridge University Press: 14 July 2016

Affiliation: Chiba University
Postal address: Department of Mathematics, Faculty of Education, Chiba University, Yayoi-cho, Chiba 260, Japan.

Abstract

This study is concerned with finite Markov decision processes whose dynamics and reward structure are unknown but whose state is exactly observable.

We establish a learning algorithm which yields an optimal policy and construct an adaptive policy which is optimal under the average expected reward criterion.
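
For orientation, the average expected reward criterion named above is usually written as follows. This is the standard textbook form, not a formula quoted from the paper; the symbols $X_t$ (state), $\Delta_t$ (action), $r$ (one-step reward), and $\psi$ are assumed notation, and the use of $\liminf$ is one common convention.

```latex
% Long-run average expected reward of a policy \pi from initial state i
% (assumed notation: X_t = state, \Delta_t = action, r = one-step reward)
\[
  \psi(i, \pi) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
  E_{\pi}\!\left[\, \sum_{t=0}^{T-1} r(X_t, \Delta_t) \;\middle|\; X_0 = i \right].
\]
% A policy \pi^{*} is average-optimal if it attains the supremum in every state
\[
  \psi(i, \pi^{*}) \;=\; \sup_{\pi} \psi(i, \pi) \qquad \text{for all states } i .
\]
```

Under this reading, an adaptive policy is one that selects actions using only the observed history, without knowing the transition law or $r$ in advance, and still attains the optimal value of $\psi$.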

Keywords

ADAPTIVE CONTROL; AVERAGE REWARD CRITERION

Type: Short Communications
Information: Journal of Applied Probability, Volume 24, Issue 1, March 1987, pp. 270–276

DOI: https://doi.org/10.2307/3214080
Copyright: © Applied Probability Trust 1987
