Try it online
What is a database is in the eye of the beholder.
Datalogs that support algebraic datatypes like souffle tend to use the analog of a hashcons to hold their trees data types. Hashcons allows for easy lookup and deduplication of tree datatypes.
You can use this hashcons to also hold your relations. So bada bing bada boom, you can make a pretty expressive (albeit inefficient) datalog without much work at all
While building egglog many moons (like 9 months) ago, I realized that if you ignore the equality part the egraph is basically just a hashcons. Egglog used the egraph as it’s database, so hashlog is what is left if you replace the egraph with a hashcons.
There is a fabulous paper on hash consing, Type-Safe Modular Hash-Consing by Fillaitre and Conchon. I chose to go even dumber to start, although I’ve already moved over to using this library on my main branch. Even though it’s obviously inefficient, it’s nice to have a really dumb version down below. Note for example I used default hashing functions here, which is very bad. Every hash traverse the entire tree.
There is already a pretty nice menhir based parser for datalog by Simon Cruanes hereso that was a big boost.
I also used js_of_ocaml to upload a version that works in the browser and is hosted on github.io. Try it out here http://www.philipzucker.com/hashlog/. I made a function that has the signature
open Js_of_ocaml let _ = Js.export "MyHashlog" (* Just "Hashlog" clashes with module name? *) (object%js (* what. run_string somehow is not acceptable? Underscores are not acceptable? anything called run? *) method dohashlog x = Hashlog.run_string (Js.to_string x) |> Js.string end)
gh-pages branch for full details on the js_of_ocaml setup
dune build js/main.bc.js --release cp _build/default/js/main.bc.js main.bc.js
The code of this simple initial version is here https://github.com/philzook58/hashlog/tree/simple. Here is the rough gist:
open Core_kernel module AST = AST module Parser = Parser module Lexer = Lexer module Term = struct type t = Apply of head : string; args : t list | Int of int [@@deriving equal, compare, hash, sexp] let rec pp_term ppf term = match term with | Apply head = f; args =  -> Format.fprintf ppf "%s" f | Apply head = f; args -> Format.fprintf ppf "%s(%a)" f (AST.pp_list_comma pp_term) args | Int i -> Format.fprintf ppf "%d" i end (* It's nice to have a global table *) let table = Hashtbl.create (module Term) let clear () = Hashtbl.clear table (* smart constructors *) let apply head args = let x : Term.t = Apply head; args in Hashtbl.find_or_add table x ~default:(fun () -> x) let myint i = let x : Term.t = Int i in Hashtbl.find_or_add table x ~default:(fun () -> x) (* In a pattern matching environment, you have a mapping between the names of pattern variables to the term they matched to. *) type env = Term.t String.Map.t (* pattern matching really follows your nose. The only interesting case is Var, where you lookup in the env. If it is already in the env, check compatiblity. If not add a new binding to the env. Pattern are described in module AST. Basically the same as Term.t but with a Var case. type AST.term = Var of string | Apply of string * AST.term list | Int of int *) let rec pmatch (env : env) (p : AST.term) (t : Term.t) : env option = match (p, t) with | Apply (f, args'), Apply head; args -> if String.(f = head) && Int.equal (List.length args) (List.length args') then List.fold2_exn args' args ~init:(Some env) ~f:(fun acc arg' arg -> Option.bind acc ~f:(fun env -> pmatch env arg' arg)) else None | Int i, Int j -> if Int.(i = j) then Some env else None | Int _, Apply _ -> None | Apply _, Int _ -> None | Var p, _ -> ( match String.Map.find env p with | None -> Some (String.Map.add_exn env ~key:p ~data:t) | Some t' -> if Poly.(t = t') then Some env else None) (* Search over the entire hash cons for a tree *) let search_pat env (p : AST.term) = let ts = Hashtbl.keys table in List.filter_map ts ~f:(pmatch env p) (* Search multiple trees over the hash cons. They may share pattern variables, so the env is threaded through *) let search_multi_pat (body : AST.term list) = List.fold body ~init:[ String.Map.empty ] ~f:(fun acc p -> List.bind acc ~f:(fun env -> search_pat env p)) (* Instantiate the head of a clause *) let instan (env : env) (head : AST.term) : Term.t = let rec worker p = match p with | AST.Apply (f, args) -> apply f (List.map ~f:worker args) | Int i -> myint i | Var p -> String.Map.find_exn env p in worker head type clause = AST.term * AST.term list (* running a clause is searching, then instantiating *) let run_clause (clause : clause) = let head, body = clause in let envs = search_multi_pat body in List.iter envs ~f:(fun env -> let (_ : Term.t) = instan env head in ()) (* Run clauses until no new facts derived, aka hash cons does not increase in size *) let run_loop (prog : clause list) = let rec worker n = List.iter prog ~f:run_clause; let n' = Hashtbl.length table in if Int.(n = n') then () else worker n' in worker (Hashtbl.length table) let strip_pos (file : AST.file) = List.map file ~f:(fun (head, clauses) -> ( head, List.map clauses ~f:(fun lit -> match lit with | AST.LitPos p -> p | _ -> failwith "not a positive literal") )) (* Special relation called [output] is printed to screen *) let run_file file = let file = strip_pos file in run_loop file; let envs = search_pat String.Map.empty (AST.Apply ("output", [ AST.Var "o" ])) in List.iter envs ~f:(fun env -> let t = String.Map.find_exn env "o" in Format.fprintf Format.str_formatter "%an" Term.pp_term t); Format.flush_str_formatter ()
The above is a bit silly, but there are some fun things that are the real reason I started down this path:
- Intern way more, make kind of sort of efficient.
- Do not use hash cons as db. Make seperate db so that I can precompile to patterns and search more efficiently.
- Semi naive
- Use relational ematching flavored search? Is there a point?
- Extend with miller pattern matching and lambda terms. We’re gettin there.
- Extend to harrop clauses?
I’m havin ‘a ball.