Published online by Cambridge University Press: 14 July 2016
We consider the adaptive control of Markov chains under the weak accessibility condition with a view to minimizing the learning loss. A certainty equivalence control with a forcing scheme is constructed. We use a stationary randomized control scheme for forcing and compute a maximum likelihood estimate of the unknown parameter from the resulting observations. We obtain an exponential upper bound on the rate of decay of the probability of error. This allows us to choose the rate of forcing appropriately, whereby we achieve a o(f(n) log n) learning loss for any function as .
Research supported by NSF Grant No. ECS-8919818.