Published online by Cambridge University Press: 11 December 2014
The depth of a trie has been extensively studied when the source that produces the words is a simple source (a memoryless source or a Markov chain). When a source is simple but not an unbiased memoryless source, the expectation and the variance of the depth are both of logarithmic order, and their dominant terms involve characteristic objects of the source, for instance the entropy. Moreover, there is an asymptotic Gaussian law, even though the speed of convergence toward the Gaussian law has not yet been precisely estimated. The present paper describes a ‘natural’ class of general sources, which does not contain any simple source, where the depth of a random trie, built on a set of words independently drawn from the source, has the same type of probabilistic behaviour as for simple sources: the expectation and the variance are both of logarithmic order and there is an asymptotic Gaussian law. There are precise asymptotic expansions for the expectation and the variance, and the speed of convergence toward the Gaussian law is optimal. The paper first provides analytical conditions on the Dirichlet series of probabilities of a general source under which this Gaussian law can be derived: the existence of a pole-free region where the series is of polynomial growth. In a second step, the paper focuses on sources associated with dynamical systems, called dynamical sources, where the Dirichlet series of probabilities is expressed in terms of the transfer operator of the dynamical system. Then, the paper extends results due to Dolgopyat, already generalized by Baladi and Vallée, and shows that the previous analytical conditions are fulfilled for ‘most’ dynamical sources, provided that they ‘strongly differ’ from simple sources. In summary, the present paper exhibits a class of sources not containing any simple source, where the trie depth has the same type of probabilistic behaviour as for simple sources, and with more precise estimates.
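To make the quantity under study concrete, the following minimal sketch simulates the depth of a trie for the simplest case mentioned above, a biased memoryless source. It is an illustration only and not the method of the paper. The function names (`lcp`, `leaf_depths`, `random_word`, `simulate`), the binary alphabet, the bias `p = 0.3`, and the truncation of words to 200 symbols are all assumptions introduced here. The sketch relies on the fact that the depth of a word's leaf equals one plus its longest common prefix with any other word, and that this maximum is attained at a neighbour in sorted order.

```python
import math
import random


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def leaf_depths(words):
    """Depth of each word's leaf in the trie built on `words`.

    Assumes the words are distinct and none is a prefix of another.
    """
    ws = sorted(words)
    n = len(ws)
    if n < 2:
        # A trie holding a single word is a leaf at the root.
        return {w: 0 for w in ws}
    depths = {}
    for i, w in enumerate(ws):
        # In sorted order, the longest shared prefix is with a neighbour.
        left = lcp(w, ws[i - 1]) if i > 0 else 0
        right = lcp(w, ws[i + 1]) if i + 1 < n else 0
        depths[w] = 1 + max(left, right)
    return depths


def random_word(p, length, rng):
    """Word from a binary memoryless source with P('0') = p (hypothetical helper)."""
    return "".join("0" if rng.random() < p else "1" for _ in range(length))


def simulate(n=2000, p=0.3, length=200, seed=0):
    """Return (empirical mean depth, empirical variance, log(n) / entropy)."""
    rng = random.Random(seed)
    # Words are truncated to `length` symbols; a set removes any duplicates.
    words = {random_word(p, length, rng) for _ in range(n)}
    d = list(leaf_depths(words).values())
    mean = sum(d) / len(d)
    var = sum((x - mean) ** 2 for x in d) / len(d)
    h = -(p * math.log(p) + (1 - p) * math.log(1 - p))  # entropy in nats
    return mean, var, math.log(len(words)) / h


if __name__ == "__main__":
    mean, var, first_order = simulate()
    print(f"mean depth {mean:.2f}, variance {var:.2f}, log(n)/h {first_order:.2f}")
```

Running the script prints the empirical mean depth next to the first-order term log(n)/h. The two are expected to differ by a bounded amount, since the expansion of the expectation has lower-order terms. The empirical variance is computed over the leaves of a single trie, so it is only a rough proxy for the variance of the depth of a random leaf.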