Nice article and great idea, but as is traditional there are some slight
fudges of what Datalog is or what Horn clauses are etc, that I'd like to
unfudge, slightly. It's Sunday! What better than to start our day with a very
quick and almost not completely fudgy intro to logic programming? Le'ts go!<p>To begin with, Datalog is not a "cousin" of Prolog as stated in the section
"Interlude: Brief Intro to Datalog". Datalogs (there are many variants!) are
subsets of Prolog. For example, a typical datalog is the language of definite
clauses with no function symbols [¹] and with no negation as failure [²].
Another datalog may allow only the cons function in order to handle lists;
etc.<p>Otherwise the syntax of datalog is identical to Prolog, but there is a further
difference, in that Prolog is evaluated from the "top down" whereas Datalog is
evaluated from the "bottom up". What that means is that given a "query" (we'll
come to the scare quotes in a moment) Prolog will try to find "rules" whose
heads unify with a literal in the query (A literal is an atom, or the negation
of an atom; "p(χ,α)" is an atom.) whereas datalog will first generate the set of
all ground atoms that are consequences of the program in the context of which
the query was made, then determine whether the atoms in the query are in the
set of consequences of the program [³]. The reason for the different execution
model is that the bottom-up evaluation is guaranteed to terminate [⁴] whereas
Prolog's top-down evaluation can "go infinite" [⁵]. There is of course
another, more subtle difference: Prolog can "go infinite" because of the
Halting problem, from which datalog does not suffer because, unlike Prolog, it
does not have Universal Turing Machine expressivity [⁶].<p>So in short, datalog is a restricted subset of Prolog that has the advantage
of being decidable, while Prolog in general is not, but is also incomplete
while Prolog is complete [⁷].<p>Now, the other slight fudge in the article is about "rules", "facts" and
"queries". Although this is established and well-heeled logic programming
terminology, it fudges the er fact that those three things are the <i>same kind
of thing</i>, namely, they are, all three of them, Horn clauses [⁸].<p>Specifically, Horn clauses are clauses with a single positive literal.<p>Crash course in FOL: an atom is a predicate symbol followed by a set of terms
in parentheses. Terms are variables, functions or constants (constants are
functions with 0 arity, i.e. 0 arguments). A literal is an atom, or the
negation of an atom. A clause is a disjunction of literals. A clause is Horn
when it has at most 1 positive literal. A Horn clause is a definite clause
when it has exactly 1 positive literal.<p>The following are Horn clauses:<p><pre><code> ¬P(χ) ∨ ¬P(υ)
P(χ) ∨ ¬Q(χ)
Q(α)
Q(β)
</code></pre>
In logic programming tradition, we write clauses as implications (because ¬A ∨
B ≡ A → B) and with the direction of the implication arrow reversed to make it
easier to read long implications with multiple premises. So the three clauses
above are written as:<p><pre><code> ←P(χ), P(υ) (a)
P(χ) ← Q(χ) (b)
Q(α)← (c)
Q(β)← (d)
</code></pre>
And those are a "query", (a), a "rule", (b) and two "facts", (c) and (d).<p>Note that (b,c,d) are <i>definite</i> clauses (they have exactly one positive
literal, i.e. their head literal, which is what we call the consequent in the
implication). Facts have only a positive literal; I like to read the dangling
implication symbol as "everything implies ...", but that's a bit
idiosyncratic. The bottom line is that definite clauses with no negative
literals can be though of as being always true, hence "facts". Queries, i.e.
Horn clauses with no positive literals, are the opposite: "nothing implies"
their body literals (my idiosyncratic reading) so they are "always false".
Queries are also called "goals". Finally, definite clauses with both positive
and negative literals can be thought of as "conditionally true".<p>Prolog and datalog programs are written as sets of definite clauses, i.e. sets
of "facts" and "rules". So, when we want to reason about the "facts" and
"rules" in the program, we make a "query". Then, the language interpreter,
which is a top-down resolution theorem prover [⁹] in the case of Prolog, or
bottom-up fixpoint calculation in the case of datalog [¹⁰], determines whether
our "query" is true. If the query includes any variables then the interpreter
also returns evey set of variable substitutions that make the query true.<p>In the example above, (a) has two variables, χ and υ and evaluating (a) in the
context of (b,c,d) would return a "true" result with the variable substitution
{χ/α,υ/β}, i.e. (a) is true iff χ = α and υ = β.<p>And that's how Horn clauses and definite clauses become "rules", "facts" and
"queries".<p>Next time: how the leopard got its stripes and the hippopotamus learned to
love the first order predicate calculus.<p>_________________<p>[¹] This is my (first) slight fudge because constants are also functions, with
0 arguments. So, to be formal, the typical datalog above has "no functions of
arity more than 0".<p>[²] Negation-as-failure makes a language non-monotonic, in the sense that
introducing new "facts" can change the meaning of a theory, i.e. a program.<p>[³] So, its Least Herbrand Model, or its Least Fix-Point (LFP).<p>[⁴] <i>Because</i> it finds the LFP of the query and the program.<p>[⁵] Unless evaluated by SLG resolution, a.k.a. tabling, similar to memoization.<p>[⁶] Although <i>higher-order</i> datalogs, that allow for predicate symbols as
terms of literals <i>have</i> UTM expressivity, indeed a UTM can be defined in a
higher-order datalog fragment where clauses have up to two body literals with
at most two arguments:<p><pre><code> utm(S,S) ← halt(S).
utm(S,T) ← execute(S,S₁), utm(S₁,T).
execute(S,T) ← instruction(S,P), P(S,T).
</code></pre>
Originally in:<p>Tärnlund, S.-A. (1977). Horn clause computability. BIT Numerical Mathematics,
17(2), 215–226.<p>[⁷] Less fudgy, definite programs are refutation complete under SLD
resolution, meaning that any atom that is entailed by a definite program can
be derived by SLD resolution. A definite program is a set of definite clauses,
explanation of which is coming right up.<p>[⁸] Long time ago, I explained this to a colleague who remarked that all the
nice syntactic elegance in query languages falls apart the moment you try to
make a query, which usually has a very different syntax than the actual rows
of the tables in the database. So I said "that's the point! Queries are also
Horn clauses!" and his immediate remark was "That just blew my mind". It's
been so long and I'm so used to the idea that I haven't a clue whether this is
really mind blowing. Probably, being my usual excited self, I just said it in
a way that it sounded mind blowing (gesticulating widely and jumping up and
down enthusiastically, you can picture the scene) so my colleague was just
being polite. That was over drinks at the pub after work anyway.<p>[⁹] Resolution is an inference rule that allows the derivation of new atoms
from a set of clauses. In theorem proving it's used to refute a goal clause by
deriving the empty clause, □. Since a goal is a set of negated literals,
refuting the goal means basically that the negated literals are true. So our
query is true in the context of our program.<p>[¹⁰] Datalog's bottom-up evaluation uses something called a TP operator. It
basically does what I said above, starts with the ground atoms in a program
and then derives the set of consequences of the clauses in the program. In
each iteration, the set of consequences are added to the program and the
process is repeated, until no new consequences are derived. As stated above,
the process is guaranteed to complete because every datalog definite program
has a least fixpoint, which is also its Least Herbrand Model (we won't go
into Herbrand Models and Herbrand interpretations, but, roughly, an LHM is the
smallest set of atoms that make the union of a definite program and a goal
true). A more complete introduction to LHMs and LFPs and how they are used in
bottom-up evaluation for datalog can be found here:<p><a href="https://www.doc.ic.ac.uk/~mjs/teaching/KnowledgeRep491/Fixpoint_Definite_491-2x1.pdf" rel="nofollow">https://www.doc.ic.ac.uk/~mjs/teaching/KnowledgeRep491/Fixpo...</a>