Instruction Matching with Souffle Datalog

Related Articles

I think a lot about choosing instructions and datalog lately.

Selecting instructions is the step of taking a compiler intermediate representation and understanding which specific assembly instructions you want to use to implement it. This slightly more concrete instruction IR (MIR) usually still needs to be assigned and timed before it consists of a complete assembly.

Sometimes there is a nice one-on-one correspondence, as there may be an add instruction in your IR that is appropriate for an instruction to add an assembly.

Sometimes this may require multiple assembly instructions to implement a simple IR structure, or a simple assembly structure may implement a complicated IR expression. Simple examples of the latter included double-merging insert instructions, or memory-load instructions that allow permanent offsetting of the address (internal insert operation such as ld result [addr + 4]).

In a sense the way all this is usually done is by looking for special patterns in IR. These patterns may be slightly implied in the instruction selector code, or mixed code and DSL as in LLVM’s Tablegen.

there is An interesting article on the use of constraint satisfaction solvers Perform the problem of selecting the instructions I was trying to implement. At this high level:

  1. Find all the matches of patterns
  2. Grind on matches and send them to a constraint and solver model to choose the best fit cover
  3. Chew the result of the solver and build an IR in the next step from templates for each selected match

I thought 1 is kind of a cute example of using a datalog (though I did not take that approach in the project itself which should be open source soon).

If you have your IR in SSA form, it’s easy enough to express it as a table. Each sentence becomes an entry in the tables corresponding to the different types of statements.

What about the order of the statements in the block? In fact, the arrangement of IR allocations within a basic block is a bit IMO of red herring. This is an unfortunate side effect of having to do the thing in series, or of using vector / list data structures. The data flow within a block is really more graphical than a sequence. If you do, you need to explicitly chain effects as values. This is what was done in the newspaper.

Filling out the initial database looks a bit like writing LLVM IR. God . Of the datalog is a bit like the end of a sentence ;. You should indicate in which block each statement is with an explicit column in the statement table instead of declaring it at the top of the block as in LLVM IR.

Patterns can be specified as sections of a data log that populate tables of each possible match. This is not a recursive query, so you can also apply it in a standard sql of bog if you like. You can discard all the matches as the values ​​of these tables and then send them to the constraint resolver.

It’s all a bit literal. It is probably better to create this datalog from a more concise dsl of some kind.

.type var <: symbol
.type op <: symbol
.type blk <: symbol

.decl binop(blk : blk, out : var, op : op, a : var, b : var)
.decl unop(blk : blk, out : var, op : op, a : var)
.decl load(blk : blk, out : var, addr : var)
.decl br(blk: blk, cond : var, true_blk : blk, false_blk : blk)
// .decl phi() We'll think about this part later
    binop("b1", "z", "add", "x", "y" ).
    unop("b1", "a", "neg", "z" ).
    br("b1", "z", "b2", "b3").
    load("b2" , "q", "z").
    br("b2", "true" ,"exit", "exit").
    binop("b3", "addr", "add", "z", "4").
    load("b3" , "q", "addr").
    br("b3", "true" ,"exit", "exit").

.decl add_match(blk : blk, out : var, a : var, b : var)
.output add_match
add_match(blk,out,a,b) :- binop(blk,out, "add", a, b).

.decl neg_match(blk : blk, out : var, a : var)
.output neg_match
neg_match(blk,out,a) :- unop(blk,out, "neg",a).

.decl load_add_match(blk : blk, out : var, addr_off : var, addr : var, offset : var)
.output load_add_match
load_add_match(blk, z, addr_off, addr, offset) :- 
    binop(blk, addr_off, "add", addr, offset),
    load(blk, z, addr_off).

Runs it with souffle match.dl -D - Results b

blk     out     a       b
b1      z       x       y
b3      addr    z       4
blk     out     a
b1      a       z
blk     out     addr_off        addr    offset
b3      q       addr    z       4

Note that not everything was covered and the addition node in b3 was covered twice. The next steps of the switch should deal with this.

Finding matches within this IR is actually matching graph patterns since it is possible to get multi-block patterns that have cycles. Rearrangement of the body of print sections may have better or worse search characteristics.

Interestingly it reveals that there are more dimensions that need to be taken into account for graphs other than intentional / unintentional. These are “transmitted” graphs where the edges coming out of each vertex are not interchangeable in a sense. We are more familiar with this “baldness” in the context of trees and fish where standard ASTs like pow(x,n) Obviously not x and n Replaceable.

I was afraid of matching graphs in the past because it has a high complex estimate, but it really is not that terrible. Especially if you are looking for small patterns in a large graph like here. So even rather naive approaches are polynomial in the size of the matching graph (the intensity depends on the size of the pattern). This is what Datalog does here.

Here are some references to other graph matching algorithms

Ullmann bitvector algorithms for satisfying binary constraints



Please enter your comment!
Please enter your name here

Popular Articles