Naive E-graph Rewriting in Souffle Datalog


Equational rewriting is a compelling way to simplify expressions and to perform symbolic computation and compiler optimization. E-graphs are a data structure that keeps all equivalent terms around instead of rewriting destructively.

I previously wrote a blog post describing in some detail how to encode some of the steps of e-graph rewriting (e-matching and congruence closure) using datalog. The intention was to build a nicer high-level language for these pieces (which has since become egglog. Try it in the browser! It's neat!) by flattening expressions and inserting indirections through eclass ids.

At some point I stopped pursuing egglog-in-Souffle, because I realized it was unlikely to beat egg at its own game, and it turned out that actually performing the encoding I had described was neither clear nor pleasant. A big problem was the stratification requirements of datalog (which perhaps Souffle lets you turn off?) and my reliance on a gensym counter. I tied myself in knots trying to extend the macro to the stages of e-matching and congruence closure. It sucked.

I have since come to appreciate more of Souffle's features that go beyond vanilla datalog. One such powerful feature is algebraic data types (ADTs). Yes, Souffle datalog already has tree data structures available.

A simple version of egglog is what I like to call a “hashlog”. This is still a bottom-up, datalog-like evaluation, but instead of being backed by an e-graph, it is backed by a hash cons.

As I understand the data structure backing Souffle's ADTs, this already makes Souffle a hashlog. I believe ADTs are flattened into tables with a unique identifier for each node of the ADT.
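To make the hashlog idea concrete, here is a minimal Python sketch of hash-consing, assuming (as guessed above) that ADT values are flattened into a node table where structurally equal nodes share one id. The HashCons class is hypothetical; it models the idea and is not Souffle's actual record-table implementation.

```python
class HashCons:
    """Hypothetical node table: each distinct (constructor, *children) gets one id."""

    def __init__(self):
        self.ids = {}    # node tuple -> id
        self.nodes = []  # id -> node tuple

    def make(self, ctor, *args):
        """Intern a node; args are child ids or primitive payloads like numbers."""
        node = (ctor, *args)
        if node not in self.ids:
            self.ids[node] = len(self.nodes)
            self.nodes.append(node)
        return self.ids[node]


hc = HashCons()
three = hc.make("Lit", 3)
four = hc.make("Lit", 4)
a = hc.make("Plus", three, four)
b = hc.make("Plus", hc.make("Lit", 3), hc.make("Lit", 4))  # rebuilt from scratch
print(a == b)     # True: the same tree gets the same id
print(hc.nodes)   # [('Lit', 3), ('Lit', 4), ('Plus', 0, 1)]
```

Because children are stored as ids, checking whether a tree already exists is a single dictionary lookup rather than a deep comparison.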

In addition, Souffle already supports a union-find data structure for equality, through relations declared as eqrel.
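For reference, here is a minimal Python union-find with path compression and union by size. It only illustrates the kind of structure that backs an equivalence relation; the class and method names are illustrative, not Souffle's internals.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}  # element -> parent element
        self.size = {}    # root -> number of elements in its set

    def find(self, x):
        self.parent.setdefault(x, x)  # unseen elements start as singletons
        self.size.setdefault(x, 1)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every visited node at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.size[rx] < self.size[ry]:  # attach the smaller tree under the larger
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return rx

    def same(self, x, y):
        return self.find(x) == self.find(y)


uf = UnionFind()
uf.union("3 + 4", "4 + 3")
uf.union("4 + 3", "7")
print(uf.same("3 + 4", "7"))  # True: equality is transitive
print(uf.same("3", "7"))      # False
```

An eqrel behaves as if it contained every reflexive, symmetric and transitive pair, but a structure like this only has to store one parent pointer per element.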

So at a high level, Souffle already supports the data structure components of e-graphs. In fact, it is not that painful to encode e-graph rewriting or queries directly in Souffle, without compiling from a higher-level language.

Here is a very simple example using Hutton's razor: addition expressions.

#define term(a) eq(a,a)

.type AExpr = Lit n : number
            | Plus a: AExpr, b : AExpr

.decl eq(x : AExpr, y : AExpr) eqrel
// if saturation is not expected, you can stop early with this directive
//.limitsize eq(n=4)

// congruence closure
eq(t1,t2) :- t1 = $Plus(a1,b1), t2 = $Plus(a2,b2), term(t1), eq(a1,a2), eq(b1,b2).

// termification
term(a), term(b) :- term($Plus(a,b)).

// constant propagation
eq(t, $Lit(a + b)) :- t = $Plus($Lit(a), $Lit(b)), term(t).

// Commutativity of addition
eq($Plus(a,b), e) :- eq($Plus(b,a), e).

// Associativity
eq($Plus($Plus(a,b),c), e) :- eq($Plus(a,$Plus(b,c)), e).
eq($Plus(a,$Plus(b,c)), e) :- eq($Plus($Plus(a,b),c), e).

// Initialization
term($Plus($Lit(3), $Lit(4))).

.output eq

Run with souffle arith2.dl -D - (the -D - option prints the output relations to stdout):

x       y
$Lit(3) $Lit(3)
$Lit(4) $Lit(4)
$Plus($Lit(3), $Lit(4)) $Plus($Lit(3), $Lit(4))
$Plus($Lit(3), $Lit(4)) $Plus($Lit(4), $Lit(3))
$Plus($Lit(3), $Lit(4)) $Lit(7)
$Plus($Lit(4), $Lit(3)) $Plus($Lit(3), $Lit(4))
$Plus($Lit(4), $Lit(3)) $Plus($Lit(4), $Lit(3))
$Plus($Lit(4), $Lit(3)) $Lit(7)
$Lit(7) $Plus($Lit(3), $Lit(4))
$Lit(7) $Plus($Lit(4), $Lit(3))
$Lit(7) $Lit(7)

Note that Souffle runs the C preprocessor by default. It is very convenient, but it also has its downsides. Constant propagation can be expressed directly through the same mechanisms here, whereas in egg it is an analysis. “Termification” is filling out the union-find with the subterms. It's unclear to me whether my little trick of making the term database be exactly eq(a,a) is clever.
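To spell out what the program above computes, here is a naive Python model of the same rules (termification, constant propagation, commutativity, associativity in both directions, congruence closure), re-run until nothing changes. It is a sketch under assumptions: terms are nested tuples, a dict-based union-find stands in for the eqrel, and congruence is only checked between terms already in the database, i.e. the variant of the congruence rule that also requires term(t2). For this input it finds the same three classes as the Souffle output; the 11 printed rows are the sum of squared class sizes, since an eqrel holds every ordered pair within a class.

```python
# Terms are nested tuples, e.g. ("Plus", ("Lit", 3), ("Lit", 4)).
def Lit(n):
    return ("Lit", n)


def Plus(a, b):
    return ("Plus", a, b)


def saturate(start, max_iters=100):
    """Naively re-run every rule until nothing changes; return the classes."""
    parent = {}  # union-find over terms; its keys play the role of term(t)

    def find(t):
        parent.setdefault(t, t)  # registering a term: term(t), i.e. eq(t,t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(s, t):
        rs, rt = find(s), find(t)
        if rs == rt:
            return False
        parent[rs] = rt
        return True

    find(start)  # initialization: term(start)
    for _ in range(max_iters):
        n_before = len(parent)
        changed = False
        for t in list(parent):  # snapshot of the current term database
            if t[0] != "Plus":
                continue
            _, a, b = t
            find(a)  # termification: both children become terms
            find(b)
            if a[0] == "Lit" and b[0] == "Lit":  # constant propagation
                changed |= union(t, Lit(a[1] + b[1]))
            changed |= union(t, Plus(b, a))  # commutativity
            if a[0] == "Plus":  # (x + y) + b  =  x + (y + b)
                changed |= union(t, Plus(a[1], Plus(a[2], b)))
            if b[0] == "Plus":  # a + (x + y)  =  (a + x) + y
                changed |= union(t, Plus(Plus(a, b[1]), b[2]))
        # congruence closure between Plus terms already in the database
        plus_terms = [t for t in parent if t[0] == "Plus"]
        for t1 in plus_terms:
            for t2 in plus_terms:
                if find(t1[1]) == find(t2[1]) and find(t1[2]) == find(t2[2]):
                    changed |= union(t1, t2)
        if not changed and len(parent) == n_before:
            break  # saturated

    classes = {}
    for t in parent:
        classes.setdefault(find(t), set()).add(t)
    return list(classes.values())


classes = saturate(Plus(Lit(3), Lit(4)))
print(sorted(len(c) for c in classes))    # [1, 1, 3]
print(sum(len(c) ** 2 for c in classes))  # 11, the number of eq rows Souffle prints
```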

There are a few more useful macros that make the rewrite rules look nicer:

#define RW(a,b) eq(b, myextremelyfreshvalue) :- eq(a, myextremelyfreshvalue)
#define BIRW(a,b) RW(a,b). RW(b,a)

It would also be possible to generate the congruence closure and termification rules with macros.
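As one hypothetical alternative to cpp macros, a small Python generator can emit the congruence closure and termification clauses for every constructor whose fields are all AExpr (so Lit and Var, whose fields are primitives, are left out). The function name and the x/y variable naming are my own choices; the emitted text assumes the term(a) macro defined as in the program above.

```python
def congruence_and_termification(ctors):
    """Emit Souffle congruence-closure and termification clauses.

    `ctors` maps a constructor name to its number of AExpr fields; all fields
    are assumed to be AExpr. The output relies on `#define term(a) eq(a,a)`.
    """
    lines = []
    for name, arity in ctors.items():
        if arity == 0:
            continue  # leaves have no children to close over
        xs = [f"x{i}" for i in range(1, arity + 1)]
        ys = [f"y{i}" for i in range(1, arity + 1)]
        xargs, yargs = ", ".join(xs), ", ".join(ys)
        child_eqs = ", ".join(f"eq({x}, {y})" for x, y in zip(xs, ys))
        # congruence closure: equal children imply equal parents
        lines.append(
            f"eq(t1, t2) :- t1 = ${name}({xargs}), t2 = ${name}({yargs}), "
            f"term(t1), {child_eqs}."
        )
        # termification: every child of a term is itself a term
        heads = ", ".join(f"term({x})" for x in xs)
        lines.append(f"{heads} :- term(${name}({xargs})).")
    return "\n".join(lines)


print(congruence_and_termification({"Plus": 2, "Mul": 2}))
```

For Plus this prints eq(t1, t2) :- t1 = $Plus(x1, x2), t2 = $Plus(y1, y2), term(t1), eq(x1, y1), eq(x2, y2). followed by term(x1), term(x2) :- term($Plus(x1, x2)). and then the same pair for Mul, which matches the hand-written clauses.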

Here is a similar example using a slightly larger input language. It also shows extraction of the smallest equivalent term for the input QUERY.

#define term(a) eq(a,a)
#define RW(a,b) eq(b, myextremelyfreshvalue) :- eq(a, myextremelyfreshvalue)
#define BIRW(a,b) RW(a,b). RW(b,a)

.type AExpr = Lit n : number
            | Var x : symbol
            | Plus a: AExpr, b : AExpr
            | Mul a: AExpr, b : AExpr

.decl eq(x : AExpr, y : AExpr) eqrel
//.limitsize eq(n=4)

eq(t1,t2) :- t1 = $Plus(a1,b1), t2 = $Plus(a2,b2), term(t1), eq(a1,a2), eq(b1,b2). // subtle: include term(t2) or not
eq(t1,t2) :- t1 = $Mul(a1,b1), t2 = $Mul(a2,b2), term(t1), eq(a1,a2), eq(b1,b2).

// termification
term(a), term(b) :- term($Plus(a,b)).
term(a), term(b) :- term($Mul(a,b)).

// constant propagation
eq(t, $Lit(a + b)) :- t = $Plus($Lit(a), $Lit(b)), term(t).
eq(t, $Lit(a * b)) :- t = $Mul($Lit(a), $Lit(b)), term(t).

// Commutativity
eq($Plus(a,b), e) :- eq($Plus(b,a), e).
eq($Mul(a,b), e)  :- eq($Mul(b,a), e).

// Associativity
BIRW( $Plus($Plus(a,b),c), $Plus(a,$Plus(b,c)) ).
eq($Mul($Mul(a,b),c), e) :- eq($Mul(a,$Mul(b,c)), e).
eq($Mul(a,$Mul(b,c)), e) :- eq($Mul($Mul(a,b),c), e).

// distributivity
eq($Plus($Mul(a,b), $Mul(a,c)), e) :- eq($Mul(a, $Plus(b,c)), e).
eq($Mul(a, $Plus(b,c)), e)     :- eq($Plus($Mul(a,b), $Mul(a,c)), e).

.decl size(t : AExpr, s : unsigned)
size($Lit(a),1) :- term($Lit(a)).
size($Var(a),1) :- term($Var(a)).
size(t, 1 + sa + sb) :-  t = $Mul(a,b), term(t), size(a,sa), size(b,sb).
size(t, 1 + sa + sb) :-  t = $Plus(a,b), term(t), size(a,sa), size(b,sb).

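// Initialization
//term($Plus($Lit(3), $Lit(4))).
#define QUERY $Mul($Lit(4),$Plus($Var("x"), $Lit(4)))
term(QUERY).

// Extraction: keep the terms equal to QUERY that have minimal size
.decl res(t : AExpr)
res(t) :- eq(QUERY, t), size(t, s), s = min s1 : { eq(QUERY, t1), size(t1, s1) }.

.output res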

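The size/min extraction can be mirrored in Python as a check on its semantics: size counts nodes, and res keeps every term in QUERY's class whose size equals the minimum, so ties produce several rows. In this sketch the equivalence class is passed in by hand rather than computed by rewriting, and the constructor helpers are hypothetical.

```python
from functools import lru_cache


def Lit(n):
    return ("Lit", n)


def Var(x):
    return ("Var", x)


def Plus(a, b):
    return ("Plus", a, b)


def Mul(a, b):
    return ("Mul", a, b)


@lru_cache(maxsize=None)
def size(t):
    """Mirror of the size relation: leaves count 1, Plus/Mul count 1 + children."""
    if t[0] in ("Lit", "Var"):
        return 1
    _, a, b = t
    return 1 + size(a) + size(b)


def extract(eclass):
    """Mirror of res: every term in the class whose size equals the minimum."""
    terms = list(eclass)
    best = min(size(t) for t in terms)
    return [t for t in terms if size(t) == best]


QUERY = Mul(Lit(4), Plus(Var("x"), Lit(4)))
print(size(QUERY))  # 5
print(extract([Plus(Lit(3), Lit(4)), Plus(Lit(4), Lit(3)), Lit(7)]))  # [('Lit', 7)]
```

Note that 4 * (x + 4) and 4 * x + 16 both have size 5, so if both end up in QUERY's class, both are returned, just as res would contain both.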

There are some downsides to this approach:

  • Congruence closure must be written out as explicit clauses for each constructor
  • This is an inefficient implementation of e-graphs
  • It stores a lot of unnecessary information

However, there are also huge benefits, mostly stemming from everything Souffle already provides:


  • The full expressivity of egglog and more
  • Built-in primitive computation (arithmetic, strings, etc.)
  • Analyses can be written as ordinary datalog programs
  • Extraction can be written as a datalog program
  • More declarative than Rust
  • Souffle can be interpreted or compiled to C++
  • Souffle can read and write CSV files or SQLite databases
  • Souffle supports generating proofs (provenance) of a sort

There may be some point in the application space where the pros outweigh the cons. Anyway, I think this clarifies the intended semantics of egglog, and it was worth a blog post.

Bits and Bobs

The datalog could perhaps be written in a more efficient style, for example using a parents table to speed up congruence closure, or other memoization tricks. Another option is to not build the full equivalence relation over all terms at all, but instead to traverse the eq relation at each e-matching position. But that would be awful to write.
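One concrete form of the parents-table idea is the rebuilding scheme used by egg: each e-class records the e-nodes that use it as a child, so after a union only those parents need to be re-canonicalized and checked for new congruences, instead of comparing all pairs of terms. Below is a minimal Python sketch of that technique; it is not Souffle and not something this post implements, and the EGraph class and its method names are illustrative.

```python
class EGraph:
    """Tiny e-graph sketch: hash cons + union-find + parents table.

    E-nodes are (op, child_ids) pairs; leaf payloads live in op, e.g. ("Var", "a").
    """

    def __init__(self):
        self.uf = []        # e-class id -> parent id (union-find forest)
        self.hashcons = {}  # canonical e-node -> e-class id
        self.parents = {}   # root id -> [(e-node, e-class id)] using it as a child
        self.pending = []   # roots whose parents must be re-canonicalized

    def find(self, i):
        while self.uf[i] != i:
            self.uf[i] = self.uf[self.uf[i]]  # path halving
            i = self.uf[i]
        return i

    def canon(self, node):
        op, kids = node
        return (op, tuple(self.find(k) for k in kids))

    def add(self, op, *kids):
        node = self.canon((op, kids))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        i = len(self.uf)
        self.uf.append(i)
        self.hashcons[node] = i
        self.parents[i] = []
        for k in set(node[1]):  # register this node as a parent of each child
            self.parents[k].append((node, i))
        return i

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return a
        self.uf[b] = a
        self.parents[a].extend(self.parents.pop(b))  # merge parents tables
        self.pending.append(a)
        return a

    def rebuild(self):
        """Restore congruence, touching only parents of merged classes."""
        while self.pending:
            c = self.find(self.pending.pop())
            for node, pid in list(self.parents.get(c, [])):
                node = self.canon(node)
                if node in self.hashcons:  # another parent became congruent
                    self.union(self.hashcons[node], pid)
                self.hashcons[node] = self.find(pid)


eg = EGraph()
a, b = eg.add(("Var", "a")), eg.add(("Var", "b"))
fa, fb = eg.add("f", a), eg.add("f", b)
ffa, ffb = eg.add("f", fa), eg.add("f", fb)
eg.union(a, b)
eg.rebuild()
print(eg.find(fa) == eg.find(fb), eg.find(ffa) == eg.find(ffb))  # True True
```

As in egg, lookups through the hash cons are only trustworthy after rebuild has run following a batch of unions.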

Subsumption (a recent addition to Souffle) may allow redundant information in the e-graph to be forgotten, keeping the tables smaller.

Maybe if the Souffle developers are interested in such things, they could find a way to provide more efficient direct support for e-graphs.

I note that the Souffle authors have an example similar to the above in their test folder.

How useful is the magic set transformation for theorem proving purposes?

You can (and must) have separate eq relations if you have multiple ADTs defined. This is both good and bad.

The term guard could probably be compiled away fairly easily. You only need it for initialization and in rewrite rules that build new compound terms.


