How to write your own Solidity linter using Slang

… in 25 lines of code!

Slang is Nomic Foundation’s modular set of compiler APIs empowering the next generation of Solidity code analysis and developer tooling. It’s written in Rust and distributed in multiple languages. It’s currently in alpha stage and in active development, but already useful for many things! Check out the initial alpha release announcement to learn more.

In this guide we will show how you can use Slang to write a simple linter for Solidity in just 25 lines of code. To pick a simple, yet real-life example, we will write our own version of the solhint avoid-tx-origin rule, which warns whenever tx.origin is used in the code.

Let’s get started!

The official Solidity documentation includes an example that illustrates why using tx.origin for authorization is a bad idea:

// SPDX-License-Identifier: GPL-3.0 
pragma solidity >=0.7.0 <0.9.0; 
// THIS CONTRACT CONTAINS A BUG - DO NOT USE 
contract TxUserWallet { 
    address owner; 
 
    constructor() { 
        owner = msg.sender; 
    } 
 
    function transferTo(address payable dest, uint amount) public { 
        // THE BUG IS RIGHT HERE, you must use msg.sender instead of tx.origin 
        require(tx.origin == owner); 
        dest.transfer(amount); 
    } 
}

We want our linter to be able to briefly report where the offending code is:

example.sol:13:17: warning: avoid using `tx.origin`

Most linters find patterns of source code by manually walking the syntax tree. We can use Slang’s tree query language instead: it lets us concisely specify the pattern we are looking for, and the query engine does all the heavy lifting for us!

To give you a sense of where we’re going, here’s a preview:

// Query that we’ll use to find the `tx.origin` expression 
const query = Query.parse( 
  `@txorigin [MemberAccessExpression 
     ... [Expression  ... ["tx"] ...] 
     ... [MemberAccess ... ["origin"] ...] 
     ... 
   ]`, 
); 
 
// Parse the source code: 
const language = new Language("0.8.22"); 
const output = language.parse(NonterminalKind.SourceUnit, contents); 
 
// Query the parsed code... 
const cursor = output.createTreeCursor(); 
const matches = cursor.query([query]); 
 
// ...and print the results: 
let match = null; 
while ((match = matches.next())) { 
    const txorigin = match.captures.txorigin[0]; 
    const { line, column } = txorigin.textOffset; 
    console.warn(`${filePath}:${line + 1}:${column + 1}: warning: avoid using \`tx.origin\``);
}

This prints exactly what we want! Let’s dive in and learn how to implement this step-by-step.

Installation

First, we need to install Slang. The compiler is written in Rust and distributed both as a Rust package and an NPM package with TypeScript definitions. In this guide, we will use the latter.

Let’s open a terminal and create a new project:

mkdir my-awesome-linter/ 
cd my-awesome-linter 
npm init 
npm install @nomicfoundation/slang

Setting up TypeScript

We will use TypeScript to write our linter. Let’s install it and create a tsconfig.json file:

npm install --save-dev typescript @types/node 
npx tsc --init

Parsing the Solidity code

To analyze the code, we need to parse it into a concrete syntax tree (CST). A CST can represent incomplete or invalid code, and is a good starting point for writing a linter.

Let’s start by writing a simple index.ts that reads the contents of a file, specified as the first command line argument:

// index.ts 
import fs from 'node:fs'; 
const filePath = process.argv[2]; 
const contents = fs.readFileSync(filePath, 'utf8');
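
Note that process.argv[2] is undefined when the linter is started without a file argument, and reading the file then fails with a less helpful error. A minimal sketch of a guard is shown below; getFilePath is a hypothetical helper name used for illustration and is not part of Node or Slang.

```typescript
// Hypothetical helper (not part of Node or Slang): picks the Solidity file
// path out of an argv-style array, where index 2 is the first user argument.
function getFilePath(argv: readonly string[]): string {
  const candidate = argv[2];
  if (candidate === undefined || candidate.length === 0) {
    // Fail early with a usage hint instead of a confusing file system error.
    throw new Error("usage: index.js <path-to-solidity-file>");
  }
  return candidate;
}
```

With this in place, the assignment above could become const filePath = getFilePath(process.argv).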

Supporting multiple versions of Solidity

The Solidity language has changed quite a lot over time. However, Slang is able to parse all versions of Solidity that are in use today, which we consider to be 0.4.11 and later.

Let’s say that we want to be source-compatible with code that’s expected to work with Solidity 0.8.22. First, we construct an instance of the Language class, which is the main entry point for parsing Solidity code:

import { Language } from "@nomicfoundation/slang/language"; 
const language = new Language("0.8.22");

Parsing different language constructs

To parse the file using Slang, we’ll use the language.parse() method, which takes a NonterminalKind as its first argument, allowing us to specify which language construct to parse. Since we want to parse the entire file, we'll use NonterminalKind.SourceUnit.

import { NonterminalKind } from "@nomicfoundation/slang/kinds"; 
const output = language.parse(NonterminalKind.SourceUnit, contents);

Inspecting the parse output

The parse() method returns a ParseOutput object, which contains the root of the CST (tree()) and a list of any parse errors (errors()).

To inspect the CST, we could print it to the console to see what it looks like:

const tree = output.tree();
console.log(tree.toJSON()); 
// Should print something like: 
// {"kind":"SourceUnit","text_len":{...},"children":[...]}

Matching specific patterns of code

We have just parsed the Solidity code into a structured representation that we can now analyze.

To analyze the CST we will use Slang’s tree query language, which was designed specifically for tasks like ours, and is a great alternative to analyzing the tree manually, due to its brevity and declarative nature.
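
For comparison, here is a rough sketch of what the manual approach might look like. It deliberately uses a simplified stand-in node shape (ToyNode) rather than Slang’s actual node classes, and the helper names are made up for illustration, so treat it as a picture of the technique and not as Slang code.

```typescript
// Simplified stand-in for a syntax tree node. This is NOT Slang's node type;
// it only mirrors the idea of node kinds, child nodes, and token text.
type ToyNode = { kind: string; text?: string; children?: ToyNode[] };

// True if one of the node's direct children is a token with the given text.
function hasChildToken(node: ToyNode, text: string): boolean {
  return (node.children ?? []).some((child) => child.text === text);
}

// Recursively walks the tree and collects every MemberAccessExpression whose
// Expression child holds `tx` and whose MemberAccess child holds `origin`.
function findTxOrigin(node: ToyNode, found: ToyNode[] = []): ToyNode[] {
  const children = node.children ?? [];
  if (node.kind === "MemberAccessExpression") {
    const operand = children.find((c) => c.kind === "Expression");
    const member = children.find((c) => c.kind === "MemberAccess");
    if (
      operand !== undefined &&
      member !== undefined &&
      hasChildToken(operand, "tx") &&
      hasChildToken(member, "origin")
    ) {
      found.push(node);
    }
  }
  // Keep descending, since matches can be nested anywhere in the tree.
  for (const child of children) {
    findTxOrigin(child, found);
  }
  return found;
}
```

Even for this small pattern, the traversal and the matching logic are tangled together, which is the part a declarative query takes off our hands.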

Tree queries are instances of the Query class and are created by parsing a query string. A query matches specific CST patterns and optionally binds variables to the matched nodes. The syntax is described in the Tree Query Language reference.

Without going too much into the details of this query, we want to match the tx.origin expression, which is a MemberAccessExpression with the tx identifier as the left-hand side and the origin identifier as the right-hand side:

import { Query } from "@nomicfoundation/slang/query"; 
const query = Query.parse(
  `@txorigin [MemberAccessExpression
     ... [Expression  ... @start ["tx"] ...]
     ... [MemberAccess ... ["origin"] ...]
     ...
   ]`,
);

That’s a lot to unpack here! Let’s break it down:

  • tree nodes are enclosed in square brackets [].
  • the first name in the square brackets matches the given node’s NonterminalKind.
  • after it, there is a list of child nodes we expect to match.
  • ... is a wildcard that matches any number of children.
  • @-prefixed names before nodes are captures, which are used to refer to specific nodes of the matched pattern.

Running the queries

The queries are executed using the Cursor class, which is another way to traverse the syntax tree, so we need to instantiate one that starts at the root of the tree:

const cursor = output.createTreeCursor();
// This is a shorthand for:
// output.tree().createCursor({ utf8: 0, utf16: 0, line: 0, column: 0 })

While it’s possible to run multiple different queries concurrently using the same cursor, we will only run one in our case:

const matches = cursor.query([query]);

To access the matched QueryResults, we need to call next() repeatedly until it returns null:

let match = null; 
while (match = matches.next()) { 
    // ... do something with the matched tree fragment 
}
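
If you prefer to work with a plain array, the same loop can be wrapped in a small generic helper. The sketch below assumes only what the text above describes, namely an object whose next() returns a result or null when it is exhausted; the NullTerminated interface and the collectAll name are hypothetical and not part of Slang.

```typescript
// Minimal shape assumed from the description above: next() yields a result,
// or null once there are no more results. Not a Slang type.
interface NullTerminated<T> {
  next(): T | null;
}

// Hypothetical helper: drains the source into an array.
function collectAll<T>(source: NullTerminated<T>): T[] {
  const items: T[] = [];
  let item: T | null;
  // Compare against null explicitly so that falsy values such as 0 or ""
  // would not end the loop early.
  while ((item = source.next()) !== null) {
    items.push(item);
  }
  return items;
}
```

The results could then be processed with a regular for...of loop or array methods such as map and filter.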

Now, for each query result, we can use the captures we defined in the query to access the nodes we are interested in.

Each cursor points to a single node but a capture can return multiple cursors, depending on the query. In our case @txorigin will return an array of one Cursor pointing to a MemberAccessExpression node.

Let’s inspect the JSON representation of the matched node pointed to by a Cursor:

const txorigin = match.captures.txorigin[0]; 
console.log(txorigin.node().toJSON()); 
// Should print our matched node: 
// {"kind":"MemberAccessExpression","text_len":{...},"children":[...]}

Reporting the findings

The only thing left to do is to report our findings to the user.

Because our queries give us back a Cursor that points to the offending node, we can use its .textOffset property to map the node back to its position in the source code. This property contains the .line and .column properties, which are exactly what we need. Keep in mind that Slang uses 0-based indexing, while error reports and editors usually use the more natural 1-based indexing, so we need to add 1 to each offset before printing it.

Having that, we can print out the warning message informing the user where the offending code is:

const txorigin = match.captures.txorigin[0]; 
const { line, column } = txorigin.textOffset; 
console.warn(`${filePath}:${line + 1}:${column + 1}: warning: avoid using \`tx.origin\``);
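
The 0-based to 1-based conversion is easy to get wrong, so it can be worth factoring it into a small function that is testable on its own. This is only a sketch: ZeroBasedPosition and formatWarning are hypothetical names, and the type mirrors just the two fields we read from textOffset.

```typescript
// Position with 0-based line and column, mirroring the two fields read from
// `textOffset` above. The type name is hypothetical, not Slang's own.
type ZeroBasedPosition = { line: number; column: number };

// Builds a `file:line:column: warning: message` string with 1-based numbers.
function formatWarning(filePath: string, position: ZeroBasedPosition, message: string): string {
  return `${filePath}:${position.line + 1}:${position.column + 1}: warning: ${message}`;
}

// The offending expression in the example contract starts at 0-based line 12, column 16.
const exampleWarning = formatWarning("example.sol", { line: 12, column: 16 }, "avoid using `tx.origin`");
// exampleWarning is "example.sol:13:17: warning: avoid using `tx.origin`"
```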

To access the full span of the node, we could use the textRange property on the cursor, which returns the start and the end offsets of the node in the source code.
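
As a sketch of how such a range could be reported, the function below formats a start and an end position as a single span. It assumes the range exposes start and end, each with 0-based line and column fields like textOffset; the LineColumn and SourceSpan types and the formatSpan name are stand-ins for illustration, so check them against the Slang version you are using.

```typescript
// Assumed shapes for illustration only: a span with `start` and `end`,
// each carrying 0-based `line` and `column`. These are not Slang's types.
type LineColumn = { line: number; column: number };
type SourceSpan = { start: LineColumn; end: LineColumn };

// Formats a span as `file:startLine:startColumn-endLine:endColumn`,
// converting every number to 1-based indexing.
function formatSpan(filePath: string, span: SourceSpan): string {
  const { start, end } = span;
  return `${filePath}:${start.line + 1}:${start.column + 1}-${end.line + 1}:${end.column + 1}`;
}
```

The sketch converts both ends the same way and makes no claim about whether the end position is inclusive or exclusive.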

We could get even more creative and plug this information into a custom formatter of our choice, but for now this will suffice.

Putting it all together

Here’s the complete code for our linter:

// file: index.ts 
import fs from "node:fs"; 
import { Language } from "@nomicfoundation/slang/language"; 
import { NonterminalKind } from "@nomicfoundation/slang/kinds"; 
import { Query } from "@nomicfoundation/slang/query"; 
 
const filePath = process.argv[2]; 
const contents = fs.readFileSync(filePath, "utf8"); 
 
const language = new Language("0.8.22"); 
const output = language.parse(NonterminalKind.SourceUnit, contents); 
const query = Query.parse( 
  `@txorigin [MemberAccessExpression 
     ... 
     [Expression  ... ["tx"] ...] 
     ... 
     [MemberAccess ... ["origin"] ...] 
   ]`, 
); 
 
const cursor = output.createTreeCursor(); 
const matches = cursor.query([query]); 
 
let match = null; 
while ((match = matches.next())) { 
    const txorigin = match.captures.txorigin[0]; 
    const { line, column } = txorigin.textOffset; 
    console.warn(`${filePath}:${line + 1}:${column + 1}: warning: avoid using \`tx.origin\``);
}

If we don’t count the empty lines, the code is indeed 25 lines long! 🎉

Conclusion

In this guide, we’ve demonstrated how to create a simple linter for Solidity using Slang, implementing a simple version of the avoid-tx-origin rule from solhint in just 25 lines of code.

We covered the essentials of parsing Solidity code, identifying specific code patterns, and reporting findings to users in a clear and straightforward manner.

We hope that this guide has inspired you to write your own linters or any other tools that operate on Solidity code using Slang!

If you have any questions or feedback, feel free to reach out to us on GitHub and/or check out Slang’s documentation.