The statistical induction of stochastic context-free grammars from bracketed corpora with the Inside-Outside algorithm is an appealing method for grammar learning, but the computational complexity of this algorithm has so far made it impossible to induce a large-scale grammar.
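As a point of reference not stated in the abstract: for a grammar in Chomsky normal form with $m$ nonterminals, one Inside-Outside reestimation pass over a sentence of length $n$ takes time

$$O(m^{3} n^{3})$$

per sentence and per iteration, where the $m^{3}$ factor comes from iterating over the possible binary rules and the $n^{3}$ factor from iterating over all spans and split points.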
Researchers in natural language processing and speech recognition have suggested various methods to reduce the computational complexity and, at the same time, guide the learning algorithm towards a solution, for example by placing constraints on the grammar. We propose a method that strongly reduces the computational cost of the algorithm without placing constraints on the grammar. This method can in principle be combined with
any of the
constraints on grammars that have been suggested in earlier studies. We
show that it is feasible to achieve results equivalent to those of earlier research, but with much lower computational effort. Starting from a small grammar, we incrementally extend the grammar while simultaneously removing rules that have become obsolete. We explain the modifications to the algorithm, present experimental results, and compare them to results reported in other publications.
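To make the grow-and-prune loop concrete, the following is a minimal, hypothetical Python sketch, not the method of this paper: standard Inside-Outside reestimation for a grammar in Chomsky normal form, wrapped in an outer loop that injects candidate rules at low probability, reestimates, and prunes rules whose probability has fallen below a threshold. All names (grow_and_prune, inside_outside_counts), the candidate set, and the threshold eps are illustrative assumptions; the sketch also omits the restriction of spans to those compatible with the corpus bracketing, as well as the cost reduction that is the paper's actual contribution.

```python
# Hypothetical illustration (not the paper's code): plain Inside-Outside EM for a
# CNF stochastic CFG, wrapped in the grow-and-prune loop the abstract describes.
from collections import defaultdict

START = "S"

def inside_outside_counts(binary, lexical, words):
    """One E-step on one sentence: expected rule counts and sentence probability."""
    n = len(words)
    inside = defaultdict(float)   # inside[(i, j, A)]  = P(A =>* words[i:j])
    outside = defaultdict(float)  # outside[(i, j, A)] = outer probability of A over [i, j)

    # Inside pass, shortest spans first (CKY order).
    for i, w in enumerate(words):
        for (A, v), p in lexical.items():
            if v == w:
                inside[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for (A, B, C), p in binary.items():
                for k in range(i + 1, j):
                    inside[(i, j, A)] += p * inside[(i, k, B)] * inside[(k, j, C)]

    Z = inside[(0, n, START)]  # sentence probability
    if Z == 0.0:
        return defaultdict(float), 0.0

    # Outside pass, longest spans first.
    outside[(0, n, START)] = 1.0
    for span in range(n, 1, -1):
        for i in range(n - span + 1):
            j = i + span
            for (A, B, C), p in binary.items():
                out = outside[(i, j, A)]
                if out == 0.0:
                    continue
                for k in range(i + 1, j):
                    outside[(i, k, B)] += out * p * inside[(k, j, C)]
                    outside[(k, j, C)] += out * p * inside[(i, k, B)]

    # Expected counts: E[A -> B C] and E[A -> w], each divided by Z.
    counts = defaultdict(float)
    for (A, B, C), p in binary.items():
        for i in range(n - 1):
            for j in range(i + 2, n + 1):
                for k in range(i + 1, j):
                    counts[(A, B, C)] += (p * outside[(i, j, A)] *
                                          inside[(i, k, B)] * inside[(k, j, C)]) / Z
    for i, w in enumerate(words):
        for (A, v), p in lexical.items():
            if v == w:
                counts[(A, v)] += p * outside[(i, i + 1, A)] / Z
    return counts, Z

def normalize(binary, lexical):
    """Renormalize rule probabilities so each left-hand side sums to one."""
    totals = defaultdict(float)
    for (A, *_), p in list(binary.items()) + list(lexical.items()):
        totals[A] += p
    for rules in (binary, lexical):
        for r in rules:
            if totals[r[0]] > 0.0:
                rules[r] /= totals[r[0]]

def grow_and_prune(binary, lexical, corpus, candidates, iters=5, eps=1e-4, init_p=1e-3):
    """Grow: inject candidate rules at low probability. Reestimate with EM.
    Prune: drop rules whose probability has fallen below eps ('obsolete')."""
    for r in candidates:
        binary.setdefault(r, init_p)
    normalize(binary, lexical)
    for _ in range(iters):
        total = defaultdict(float)
        for words in corpus:
            counts, _ = inside_outside_counts(binary, lexical, words)
            for r, c in counts.items():
                total[r] += c
        for rules in (binary, lexical):
            for r in rules:
                rules[r] = total[r]
        normalize(binary, lexical)
    for r in [r for r in binary if binary[r] < eps]:
        del binary[r]
    normalize(binary, lexical)

if __name__ == "__main__":
    binary = {("S", "A", "B"): 1.0}
    lexical = {("A", "a"): 1.0, ("B", "b"): 1.0}
    corpus = [["a", "b"], ["a", "a", "b"]]          # second sentence needs A -> A A
    grow_and_prune(binary, lexical, corpus,
                   candidates=[("A", "A", "A"), ("B", "B", "B")])
    print(sorted(binary))
```

Injecting candidates at a low initial probability lets the reestimation either promote a rule the corpus supports or drive it towards zero, at which point the pruning step removes it; in the toy run above, the candidate ("A", "A", "A") survives while the unused ("B", "B", "B") is pruned.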