Published online by Cambridge University Press: 22 May 2017
We propose a fundamentally new approach to Datalog evaluation. Given a linear Datalog program DB written using N constants and binary predicates, we first translate if-and-only-if completions of clauses in DB into a set E q (DB) of matrix equations with a non-linear operation, where relations in M DB, the least Herbrand model of DB, are encoded as adjacency matrices. We then translate E q (DB) into another, but purely linear matrix equations Ẽ q (DB). It is proved that the least solution of Ẽ q (DB) in the sense of matrix ordering is converted to the least solution of E q (DB) and the latter gives M DB as a set of adjacency matrices. Hence, computing the least solution of Ẽ q (DB) is equivalent to computing M DB specified by DB. For a class of tail recursive programs and for some other types of programs, our approach achieves O(N 3) time complexity irrespective of the number of variables in a clause since only matrix operations costing O(N 3) or less are used. We conducted two experiments that compute the least Herbrand models of linear Datalog programs. The first experiment computes transitive closure of artificial data and real network data taken from the Koblenz Network Collection. The second one compared the proposed approach with the state-of-the-art symbolic systems including two Prolog systems and two ASP systems, in terms of computation time for a transitive closure program and the same generation program. In the experiment, it is observed that our linear algebraic approach runs 101 ~ 104 times faster than the symbolic systems when data is not sparse. Our approach is inspired by the emergence of big knowledge graphs and expected to contribute to the realization of rich and scalable logical inference for knowledge graphs.