Published online by Cambridge University Press: 27 June 2016
Králík (1977) examined four Czech texts of 7000 words each, in which he marked every occurrence of the word meaning “and” (and, separately, three other words). The interval between two successive occurrences of the marked word is called a gap, and the number of words in that gap is the gap length. When the number of gaps of each length was counted, there proved to be many more short gaps than long ones. Králík proposed an exponential decay model for this distribution of gap lengths: for N gaps in a text of T words, the proportion of those gaps that are of length x should tend toward:
f(x) = a exp(−ax), where a = N/T
That is, as the gap length increases, the number of such gaps decreases in a smooth, downwardly convex, “swooping” curve.
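As a minimal sketch of the definitions above, the following Python computes gap lengths for a marked word and evaluates the model proportion f(x) = a exp(−ax) with a = N/T. The function names and the toy English token list are illustrative assumptions, not Králík's data or procedure; a text this short cannot be expected to match the model closely.

```python
import math
from collections import Counter


def gap_lengths(tokens, target):
    """Return the number of words between each pair of successive occurrences of target."""
    positions = [i for i, tok in enumerate(tokens) if tok == target]
    # A gap's length counts only the words strictly between the two occurrences.
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


def expected_proportion(x, n_gaps, n_words):
    """Model proportion of gaps of length x: f(x) = a * exp(-a * x), with a = N / T."""
    a = n_gaps / n_words
    return a * math.exp(-a * x)


# Toy text (hypothetical, for illustration only).
tokens = "a b and c and d e f and and g".split()
gaps = gap_lengths(tokens, "and")

n_gaps = len(gaps)       # N: number of gaps
n_words = len(tokens)    # T: number of words in the text
observed = Counter(gaps)

# Compare observed and modelled proportions for each gap length that occurs.
for length in sorted(observed):
    obs = observed[length] / n_gaps
    exp = expected_proportion(length, n_gaps, n_words)
    print(f"length {length}: observed {obs:.3f}, model {exp:.3f}")
```

On a real 7000-word text, the same comparison could be made over all gap lengths to see how closely the observed proportions follow the decaying curve.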