Hashlog: A Simple Datalog in OCaml

0
37

Try it online

What is a database is in the eye of the beholder.

Datalogs that support algebraic datatypes like souffle tend to use the analog of a hashcons to hold their trees data types. Hashcons allows for easy lookup and deduplication of tree datatypes.

You can use this hashcons to also hold your relations. So bada bing bada boom, you can make a pretty expressive (albeit inefficient) datalog without much work at all

While building egglog many moons (like 9 months) ago, I realized that if you ignore the equality part the egraph is basically just a hashcons. Egglog used the egraph as it’s database, so hashlog is what is left if you replace the egraph with a hashcons.

There is a fabulous paper on hash consing, Type-Safe Modular Hash-Consing by Fillaitre and Conchon. I chose to go even dumber to start, although I’ve already moved over to using this library on my main branch. Even though it’s obviously inefficient, it’s nice to have a really dumb version down below. Note for example I used default hashing functions here, which is very bad. Every hash traverse the entire tree.

There is already a pretty nice menhir based parser for datalog by Simon Cruanes hereso that was a big boost.

I also used js_of_ocaml to upload a version that works in the browser and is hosted on github.io. Try it out here http://www.philipzucker.com/hashlog/. I made a function that has the signature string -> string that takes in a datalog program, runs it, and gives the output. This I added to javascript bindings.

open Js_of_ocaml

let _ =
  Js.export "MyHashlog" (* Just "Hashlog" clashes with module name? *)
    (object%js
       (* what. run_string somehow is not acceptable? Underscores are not acceptable? anything called run? *)
       method dohashlog x = Hashlog.run_string (Js.to_string x) |> Js.string
     end)

Checkout the gh-pages branch for full details on the js_of_ocaml setup

Note thate --release made a way smaller javascript file. Like 10mb -> 600kb.

dune build js/main.bc.js --release
cp _build/default/js/main.bc.js main.bc.js

The code of this simple initial version is here https://github.com/philzook58/hashlog/tree/simple. Here is the rough gist:

open Core_kernel
module AST = AST
module Parser = Parser
module Lexer = Lexer

module Term = struct
  type t = Apply of  head : string; args : t list  | Int of int
  [@@deriving equal, compare, hash, sexp]

  let rec pp_term ppf term =
    match term with
    | Apply  head = f; args = []  -> Format.fprintf ppf "%s" f
    | Apply  head = f; args  ->
        Format.fprintf ppf "%s(%a)" f (AST.pp_list_comma pp_term) args
    | Int i -> Format.fprintf ppf "%d" i
end

(* It's nice to have a global table *)
let table = Hashtbl.create (module Term)
let clear () = Hashtbl.clear table

(* smart constructors *)
let apply head args =
  let x : Term.t = Apply  head; args  in
  Hashtbl.find_or_add table x ~default:(fun () -> x)

let myint i =
  let x : Term.t = Int i in
  Hashtbl.find_or_add table x ~default:(fun () -> x)

(* In a pattern matching environment, you have a mapping between
   the names of pattern variables to the term they matched to.
   *)
type env = Term.t String.Map.t

(* pattern matching really follows your nose.
   The only interesting case is Var, where you lookup in the env.
   If it is already in the env, check compatiblity. If not add a new binding to the env.

   Pattern are described in module AST. Basically the same as Term.t but with a Var case.
   
   type AST.term = Var of string | Apply of string * AST.term list | Int of int
*)
let rec pmatch (env : env) (p : AST.term) (t : Term.t) : env option =
  match (p, t) with
  | Apply (f, args'), Apply  head; args  ->
      if String.(f = head) && Int.equal (List.length args) (List.length args')
      then
        List.fold2_exn args' args ~init:(Some env) ~f:(fun acc arg' arg ->
            Option.bind acc ~f:(fun env -> pmatch env arg' arg))
      else None
  | Int i, Int j -> if Int.(i = j) then Some env else None
  | Int _, Apply _ -> None
  | Apply _, Int _ -> None
  | Var p, _ -> (
      match String.Map.find env p with
      | None -> Some (String.Map.add_exn env ~key:p ~data:t)
      | Some t' -> if Poly.(t = t') then Some env else None)

(* Search over the entire hash cons for a tree *)
let search_pat env (p : AST.term) =
  let ts = Hashtbl.keys table in
  List.filter_map ts ~f:(pmatch env p)

(* Search multiple trees over the hash cons. They may share pattern variables, so the
   env is threaded through *)
let search_multi_pat (body : AST.term list) =
  List.fold body ~init:[ String.Map.empty ] ~f:(fun acc p ->
      List.bind acc ~f:(fun env -> search_pat env p))

(* Instantiate the head of a clause *)
let instan (env : env) (head : AST.term) : Term.t =
  let rec worker p =
    match p with
    | AST.Apply (f, args) -> apply f (List.map ~f:worker args)
    | Int i -> myint i
    | Var p -> String.Map.find_exn env p
  in
  worker head

type clause = AST.term * AST.term list

(* running a clause is searching, then instantiating *)
let run_clause (clause : clause) =
  let head, body = clause in
  let envs = search_multi_pat body in
  List.iter envs ~f:(fun env ->
      let (_ : Term.t) = instan env head in
      ())

(* Run clauses until no new facts derived, aka hash cons does not increase in size *)
let run_loop (prog : clause list) =
  let rec worker n =
    List.iter prog ~f:run_clause;
    let n' = Hashtbl.length table in
    if Int.(n = n') then () else worker n'
  in
  worker (Hashtbl.length table)


let strip_pos (file : AST.file) =
    List.map file ~f:(fun (head, clauses) ->
        ( head,
          List.map clauses ~f:(fun lit ->
              match lit with
              | AST.LitPos p -> p
              | _ -> failwith "not a positive literal") ))
(* Special relation called [output] is printed to screen  *)
let run_file file = 
  let file = strip_pos file in
  run_loop file;
  let envs =
    search_pat String.Map.empty (AST.Apply ("output", [ AST.Var "o" ]))
  in
  List.iter envs ~f:(fun env ->
      let t = String.Map.find_exn env "o" in
      Format.fprintf Format.str_formatter "%an" Term.pp_term t);
  Format.flush_str_formatter ()

The above is a bit silly, but there are some fun things that are the real reason I started down this path:

  • Intern way more, make kind of sort of efficient.
  • Do not use hash cons as db. Make seperate db so that I can precompile to patterns and search more efficiently.
  • Semi naive
  • Use relational ematching flavored search? Is there a point?
  • Extend with miller pattern matching and lambda terms. We’re gettin there.
  • Extend to harrop clauses?

I’m havin ‘a ball.

Source

LEAVE A REPLY

Please enter your comment!
Please enter your name here