Published online by Cambridge University Press: 28 January 2014
VT (Viterbi training), or hard expectation maximization (EM), is an efficient method for learning the parameters of probabilistic models with hidden variables. Given an observation y, it searches for a state x of the hidden variables that maximizes p(x,y | θ) by coordinate ascent on the parameters θ and on x. In this paper we introduce VT to PRISM (PRogramming In Statistical Modeling), a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First, VT in PRISM converges faster than EM in PRISM because of VT's termination condition. Second, parameters learned by VT often yield good prediction performance compared with those learned by EM. We conducted two parsing experiments with probabilistic grammars, learning parameters by a variety of inference methods: VT, EM, MAP (maximum a posteriori) estimation and VB (variational Bayes). In both experiments, VT achieved the best parsing accuracy among these methods. We also conducted a similar experiment on classification tasks, where, unlike in probabilistic grammars, the hidden variable is not the prediction target. We found that in such a case VT does not necessarily yield superior performance. Third, since VT always deals with the probability of a single explanation, the Viterbi explanation, the exclusiveness condition imposed on PRISM programs is no longer required when parameters are learned by VT. In addition, because VT in PRISM is general and applicable to any PRISM program, it largely removes the need for users to develop a specific VT algorithm for each specific model. Furthermore, since VT in PRISM can be used simply by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers.
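To make the coordinate ascent on θ and x concrete, here is a minimal Python sketch of VT on a hypothetical toy model, a mixture of two biased coins, rather than PRISM's own implementation. Each observation y is a (heads, tosses) pair, and the hidden x is which coin produced it. The x-step picks the Viterbi explanation for each observation under the current θ; the θ-step re-estimates parameters from counts over those single explanations only. The loop stops as soon as no explanation changes, which illustrates the kind of termination condition that lets VT stop early. The model, the function names `log_joint` and `viterbi_training`, the data, and the add-one pseudo-counts (used only to keep every probability strictly inside (0, 1)) are all assumptions of this sketch.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical toy model: each observation y is (heads, tosses) for one trial,
# and the hidden variable x is the index of the coin that was tossed.
Obs = Tuple[int, int]


def log_joint(h: int, n: int, pi_k: float, theta_k: float) -> float:
    """log p(x=k, y | params), omitting the binomial coefficient (constant in k)."""
    return math.log(pi_k) + h * math.log(theta_k) + (n - h) * math.log(1.0 - theta_k)


def viterbi_training(data: List[Obs], pi: List[float], theta: List[float],
                     max_iter: int = 100):
    K = len(pi)
    assign: Optional[List[int]] = None
    for _ in range(max_iter):
        # x-step: with parameters fixed, choose the most probable hidden state
        # (the Viterbi explanation) for every observation; ties go to the lower k.
        new_assign = [
            max(range(K), key=lambda k: log_joint(h, n, pi[k], theta[k]))
            for h, n in data
        ]
        # Termination: if no Viterbi explanation changed, the theta-step would
        # reproduce the current parameters, so a fixed point has been reached.
        if new_assign == assign:
            break
        assign = new_assign
        # theta-step: with x fixed, re-estimate parameters from counts over the
        # single Viterbi explanations (EM would use posterior-weighted counts).
        # Add-one pseudo-counts are an assumption of this sketch, not part of VT.
        trials, heads, tosses = [0] * K, [0] * K, [0] * K
        for (h, n), k in zip(data, assign):
            trials[k] += 1
            heads[k] += h
            tosses[k] += n
        pi = [(trials[k] + 1) / (len(data) + K) for k in range(K)]
        theta = [(heads[k] + 1) / (tosses[k] + 2) for k in range(K)]
    return pi, theta, assign


# Usage on made-up data: ten tosses per trial.
data = [(9, 10), (8, 10), (2, 10), (1, 10), (9, 10), (7, 10), (2, 10), (3, 10)]
pi, theta, assign = viterbi_training(data, [0.5, 0.5], [0.6, 0.4])
print(assign)  # [0, 0, 1, 1, 0, 0, 1, 1]
```

On this data the assignments stop changing after the second x-step, so the loop exits without running further θ-steps. Because each observation contributes only its single best explanation, VT never sums probabilities over alternative explanations, which is the property the abstract connects to dropping PRISM's exclusiveness condition.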