Date:

Share:

Rewriting Go source code with AST tooling

Related Articles

Go is known for having a great tool for parsing code written in a language, right in the standard library with go/* Packages (go / parser, go / ast, go / types and so.); In addition
golang.org/x/tools The module contains a number of complementary packages that are even more powerful. I used one of these packages to describe how to write a multi-package analysis tool in a post from last year.

Here I want to write about a slightly different task: rewriting Go to source code using AST-based tools. I will start by providing a quick introduction on how stdlib’s existing capabilities are go / ast You can use the package to find points of interest in AST. Then, I’ll show how to do some simple writing with e go / ast Package without the need for additional tools. Finally, I will discuss the limitations of this approach and b
golang.org/x/tools/astutil A package that provides much more powerful AST editing capabilities.

This post assumes some basic level of familiarity with ASTs (abstract syntax trees) in general, and ASTs for Go in particular.

Finding Points of Interest in Go AST

Throughout this post, we are going to use the following simple section of Go as our lab rat:

package p

func pred() bool 
  return true


func pp(x int) int 
  if x > 2 && pred() 
    return 5
  

  var b = pred()
  if b 
    return 6
  
  return 0

We’ll start by finding all the calls to pre Function in this code. God
go / ast The package provides two approaches to finding points of interest in the code. First, we will discuss ast. Walk. The full code sample for this section is On GitHub. We begin by analyzing the source code (which we will pass to standard input):

fset := token.NewFileSet()
file, err := parser.ParseFile(fset, "src.go", os.Stdin, 0)
if err != nil 
  log.Fatal(err)

We now create a new value while applying ast.Visitor Interface and conversation
ast. Walk:

visitor := &Visitorfset: fset
ast.Walk(visitor, file)

Finally, the interesting part of the code is guest Type:

type Visitor struct 
  fset *token.FileSet


func (v *Visitor) Visit(n ast.Node) ast.Visitor 
  if n == nil 
    return nil
  

  switch x := n.(type) 
  case *ast.CallExpr:
    id, ok := x.Fun.(*ast.Ident)
    if ok 
      if id.Name == "pred" 
        fmt.Printf("Visit found call to pred() at %sn", v.fset.Position(n.Pos()))
      
    
  
  return v

Our visitor is only interested in AST type nodes CallExpr. Once it sees such a node, it checks the name of the function being called, and reports matches. Note the type statement on x Fun; We want to report calls only when the function is referenced by an ast.Ident. In Go, we can call functions in other ways, such as enabling anonymous functions directly – for example func () ().

We have File set Stored at the auditor; It is used here only to report locations in the surgeon code properly. To save space, the AST stores all location information individually int (In the nickname e token.Pos
Type), and File set It is required to translate these numbers into the actual locations of the expected <שם קובץ>: Row: Column form.

Go AST simulation

At this point it is worth mentioning some useful tools that help write surgeons for Go ASTs. First and foremost, e go / ast The package has a
print A function that will output AST in textual format. Here’s how full If A statement in our code snippet will appear if printed as follows:

.  .  1: *ast.IfStmt {
.  .  .  If: 9:2
.  .  .  Cond: *ast.BinaryExpr 
.  .  .  .  X: *ast.BinaryExpr 
.  .  .  .  .  X: *ast.Ident 
.  .  .  .  .  .  NamePos: 9:5
.  .  .  .  .  .  Name: "x"
.  .  .  .  .  .  Obj: *(obj @ 72)
.  .  .  .  .  
.  .  .  .  .  OpPos: 9:7
.  .  .  .  .  Op: >
.  .  .  .  .  Y: *ast.BasicLit 
.  .  .  .  .  .  ValuePos: 9:9
.  .  .  .  .  .  Kind: INT
.  .  .  .  .  .  Value: "2"
.  .  .  .  .  
.  .  .  .  
.  .  .  .  OpPos: 9:11
.  .  .  .  Op: &&
.  .  .  .  Y: *ast.CallExpr 
.  .  .  .  .  Fun: *ast.Ident 
.  .  .  .  .  .  NamePos: 9:14
.  .  .  .  .  .  Name: "pred"
.  .  .  .  .  .  Obj: *(obj @ 11)
.  .  .  .  .  
.  .  .  .  .  Lparen: 9:18
.  .  .  .  .  Ellipsis: -
.  .  .  .  .  Rparen: 9:19
.  .  .  .  
.  .  .  
.  .  .  Body: *ast.BlockStmt {
.  .  .  .  Lbrace: 9:21
.  .  .  .  List: []ast.Stmt (len = 1) 
.  .  .  .  .  0: *ast.ReturnStmt 
.  .  .  .  .  .  Return: 10:3
.  .  .  .  .  .  Results: []ast.Expr (len = 1) 
.  .  .  .  .  .  .  0: *ast.BasicLit 
.  .  .  .  .  .  .  .  ValuePos: 10:10
.  .  .  .  .  .  .  .  Kind: INT
.  .  .  .  .  .  .  .  Value: "5"
.  .  .  .  .  .  .  
.  .  .  .  .  .  
.  .  .  .  .  
.  .  .  .  
.  .  .  .  Rbrace: 11:2
.  .  .  }

A slightly more interactive way to explore this dump of AST is to use the web page at the address http://goast.yuroyoro.net/, Where you can paste your source and get dump AST with expandable and foldable sections. It helps to focus only on the parts we are interested in; Here is an excerpt from our AST:

Using the ast.Inspect API

Through ast. Walk Finding interesting nodes is quite simple, but it requires scaffolding that feels a bit heavy for simple needs – setting up a custom type that implements the ast.Visitor Interface and so on. Fortunately, e go / ast The package provides a lighter API – to check; Just need to provide it with closure. Here is our plan to find calls to them Before ()
Rewritten with Est.check:

func main() {
  fset := token.NewFileSet()
  file, err := parser.ParseFile(fset, "src.go", os.Stdin, 0)
  if err != nil 
    log.Fatal(err)
  

  ast.Inspect(file, func(n ast.Node) bool 
    switch x := n.(type) 
    case *ast.CallExpr:
      id, ok := x.Fun.(*ast.Ident)
      if ok 
        if id.Name == "pred" 
          fmt.Printf("Inspect found call to pred() at %sn", fset.Position(n.Pos()))
        
      
    
    return true
  )
}

The actual AST node matching logic is the same, but the surrounding code is a bit simpler. Unless there is a strong need to use ast. Walk especially,
Est.check Is the approach I recommend, and this is the approach we will use in the next section to actually rewrite the AST.

Which are simply written by AST

To get started, it is important to emphasize that the AST returned by the surgeon is a modifiable object. It is a collection of node values ​​that are connected to each other using pointers. We can change this group of nodes in any way we want – or even create a whole new group of nodes – and then use go / printer Package source code Go back from AST. The following program will simply redistribute the provided Go program (though it will drop the comments with the default configuration):

func main() 
  fset := token.NewFileSet()
  file, err := parser.ParseFile(fset, "src.go", os.Stdin, 0)
  if err != nil 
    log.Fatal(err)
  

  printer.Fprint(os.Stdout, fset, file)

Now, back to rewriting this AST. Let’s make some changes:

  1. We will change the name of the function pre To Before 2, And rename all call sites to name the new function.
  2. We will print a printout for the beginning of each body function – an imitation of some instrumentation that we can add in this way.

Given the original code snippet, the output will look like this (with the lines changed / highlighted news):

package p

func pred2() bool 
  fmt.Println("instrumentation")
  return true


func pp(x int) int 
  fmt.Println("instrumentation")
  if x > 2 && pred2() 
    return 5
  

  var b = pred2()
  if b 
    return 6
  
  return 0

The full code of our rewrite program is available Here. It uses Est.check To find the nodes he wants to operate on. Here’s the name change of the call sites:

ast.Inspect(file, func(n ast.Node) bool {
  switch x := n.(type) {
  case *ast.CallExpr:
    id, ok := x.Fun.(*ast.Ident)
    if ok 
      if id.Name == "pred" 
        id.Name += "2"
      
    
    // ...

If the function is called by an ID, the code will simply be added "2" opal. Again, we are not working on any copy of the AST – this is the
Real, alive AST We edit here.

We will now move on to the next step case, Where we submit function statements:

case *ast.FuncDecl:
  if x.Name.Name == "pred" 
    x.Name.Name += "2"
  

  newCallStmt := &ast.ExprStmt
    X: &ast.CallExpr
      Fun: &ast.SelectorExpr
        X: &ast.Ident
          Name: "fmt",
        ,
        Sel: &ast.Ident
          Name: "Println",
        ,
      ,
      Args: []ast.Expr
        &ast.BasicLit
          Kind:  token.STRING,
          Value: `"instrumentation"`,
        ,
      ,
    ,
  

  x.Body.List = append([]ast.StmtnewCallStmt, x.Body.List...)

The first three lines of it case Do the same as we did for the call sites – just rename pre To function to Before 2. The rest of the code is adding the printout to the beginning of a function body.

This task is quite easy to perform since each one FuncDecl there is body
Which is an ast.StmtList, Which itself holds a slice of ast.Stmt B
list attribute. The Out program attaches a new expression to this slice, and in fact adds a new statement to the beginning of the function body. The statement is a handmade AST junction. You’re probably thinking – how did I know how to build this node?

It really is not a big deal once you understand it. Analyzing small snippets of code and dumping their ASTs helps, as does the detailed documentation of go / ast package. I also found the go2ast A very useful tool; He takes a piece of code and emits exactly the Go code needed to build his AST.

Finally, at the end of the program we emit back the changed AST:

fmt.Println("Modified AST:")
printer.Fprint(os.Stdout, fset, file)

And that brings us to the weird section that is presented at the beginning of this section.

Limitations of Editing AST with Walk and Inspect

So far we have been able to rewrite the AST in some interesting ways using
Est.check To find the nodes. Can I do all Kind of written that way?

It turns out that the answer to this question is No, Or at least not easily. As an encouraging example, consider the following task: We want to rewrite every call to Before () So that it is logically excluded, or will become Before (). How do we do it?

You should take a few minutes to think about this question before you continue reading.

The problem is when Est.check (or ast. Walk) Is given to us a
ast.Node, We can change the content of the node and its children, but we can not replace the node itself. To replace the node itself, we will need access to its parent, but Est.check Does not give us a way to approach his parent. Another, slightly more technical way to think about it is: we get a node pointer By value, Which means we can change the node to which it points, but can not set the pointer to point to another node. To achieve the latter,
Est.check Will have to pass us a pointer to a pointer to an intersection.

This limitation was discussed some years ago, And finally in 2017 a new package appeared in the “extended stdlib” golang.org/x/tools Module –
Estutil.

Written stronger with Estutil

APIs Estutil Provides us not only with finding nodes in AST, but also a way to replace the node itself, not just its contents. In fact, the package provides a number of useful helpers for deleting, replacing and inserting new nodes through the cursor Type. Full training on the capabilities of
Estutil Is outside the scope of this post, but I will show how to use it to implement our mission to become all Before () into Before (). Beginners:

func main() {
  fset := token.NewFileSet()
  file, err := parser.ParseFile(fset, "src.go", os.Stdin, 0)
  if err != nil 
    log.Fatal(err)
  

  astutil.Apply(file, nil, func(c *astutil.Cursor) bool {
    n := c.Node()
    switch x := n.(type) 
    case *ast.CallExpr:
      id, ok := x.Fun.(*ast.Ident)
      if ok 
        if id.Name == "pred" 
          c.Replace(&ast.UnaryExpr
            Op: token.NOT,
            X:  x,
          )
        
      
    

    return true
  })

  fmt.Println("Modified AST:")
  printer.Fprint(os.Stdout, fset, file)
}

Instead of calling Est.check, We read astutil.Apply, Which also goes recursive on the AST and gives access to our junction closure. to apply Allows us to register a reconnection for both nodes before and after the Visited him; In this case we provide only the after the case.

Our closure recognizes the call to pre In a way that should be similar by now. He then uses the cursor Type to replace this node with a new one which is exactly the same node wrapped in unary No Phrase. Hidden in its application, e cursor The type has parent access to each node, allowing the actual node to be replaced with something else.

Source

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Popular Articles