# Arithmetic as an embedded domain-specific language

Let's parse another classic grammar, the grammar of arithmetic expressions with addition, multiplication, integers, and variables. In the process, we'll learn how to:

• Convert identifiers such as `x` into strings within a macro.
• add the ability to "escape" the macro context from within the macro. This is useful to interpret identifiers with their original meaning (predefined values) instead of their new meaning within a macro (treat as a symbol).

Let's begin with the simplest thing possible. We'll define an AST, and use operators `+` and `*` to denote building an arithmetic AST.

Here's the AST that we will be parsing:

``````inductive Arith : Type
| add : Arith → Arith → Arith  -- e + f
| mul : Arith → Arith → Arith  -- e * f
| int : Int → Arith  -- constant
| symbol : String → Arith  -- variable
``````

We declare a syntax category to describe the grammar that we will be parsing. See that we control the precedence of `+` and `*` by writing `syntax:50` for addition and `syntax:60` for multiplication, indicating that multiplication binds tighter than addition (higher the number, tighter the binding). This allows us to declare precedence when defining new syntax.

``````declare_syntax_cat arith

syntax num : arith  -- int for Arith.int
syntax str : arith  -- strings for Arith.symbol
syntax:60  arith:60 "+" arith:61 : arith  -- Arith.add
syntax:70 arith:70 "*" arith:71 : arith  -- Arith.mul
syntax "(" arith ")" : arith  -- parenthesized expressions
``````

Further, if we look at `syntax:60 arith:60 "+" arith:61 : arith`, the precedence declarations at `arith:60 "+" arith:61` conveys that the left argument must have precedence at least `60` or greater, and the right argument must have precedence at least`61` or greater. Note that this forces left associativity. To understand this, let's compare two hypothetical parses:

``````-- syntax:60  arith:60 "+" arith:61 : arith -- Arith.add
-- a + b + c
(a:60 + b:61):60 + c
a + (b:60 + c:61):60
``````

In the parse tree of `a + (b:60 + c:61):60`, we see that the right argument `(b + c)` is given the precedence `60`. However, the rule for addition expects the right argument to have a precedence of at least 61, as witnessed by the `arith:61` at the right-hand-side of `syntax:60 arith:60 "+" arith:61 : arith`. Thus, the rule `syntax:60 arith:60 "+" arith:61 : arith` ensures that addition is left associative.

Since addition is declared arguments of precedence `60/61` and multiplication with `70/71`, this causes multiplication to bind tighter than addition. Once again, let's compare two hypothetical parses:

``````-- syntax:60  arith:60 "+" arith:61 : arith -- Arith.add
-- syntax:70 arith:70 "*" arith:71 : arith -- Arith.mul
-- a * b + c
a * (b:60 + c:61):60
(a:70 * b:71):70 + c
``````

While parsing `a * (b + c)`, `(b + c)` is assigned a precedence `60` by the addition rule. However, multiplication expects the right argument to have precedence at least 71. Thus, this parse is invalid. In contrast, `(a * b) + c` assigns a precedence of `70` to `(a * b)`. This is compatible with addition which expects the left argument to have precedence **at least `60` ** (`70` is greater than `60`). Thus, the string `a * b + c` is parsed as `(a * b) + c`. For more details, please look at the Lean manual on syntax extensions.

To go from strings into `Arith`, we define a macro to translate the syntax category `arith` into an `Arith` inductive value that lives in `term`:

``````-- auxiliary notation for translating `arith` into `term`
syntax "`[Arith| " arith "]" : term
``````

Our macro rules perform the "obvious" translation:

``````macro_rules
| `(`[Arith| \$s:str]) => `(Arith.symbol \$s)
| `(`[Arith| \$num:num]) => `(Arith.int \$num)
| `(`[Arith| \$x:arith + \$y:arith]) => `(Arith.add `[Arith| \$x] `[Arith| \$y])
| `(`[Arith| \$x:arith * \$y:arith]) => `(Arith.mul `[Arith| \$x] `[Arith| \$y])
| `(`[Arith| (\$x:arith)]) => `(`[Arith| \$x])
``````

And some examples:

``````#check `[Arith| "x" * "y"]  -- mul
-- Arith.mul (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + "y"]  -- add
-- Arith.add (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + 20]  -- symbol + int
-- Arith.add (Arith.symbol "x") (Arith.int 20)

#check `[Arith| "x" + "y" * "z"]  -- precedence
-- Arith.add (Arith.symbol "x") (Arith.mul (Arith.symbol "y") (Arith.symbol "z"))

#check `[Arith| "x" * "y" + "z"]  -- precedence
-- Arith.add (Arith.mul (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

#check `[Arith| ("x" + "y") * "z"]  -- parentheses
-- Arith.mul (Arith.add (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")
``````

Writing variables as strings, such as `"x"` gets old; wouldn't it be so much prettier if we could write `x * y`, and have the macro translate this into `Arith.mul (Arith.Symbol "x") (Arith.mul "y")`? We can do this, and this will be our first taste of manipulating macro variables --- we'll use `x.getId` instead of directly evaluating `\$x`. We also write a macro rule for `Arith|` that translates an identifier into a string, using `\$(Lean.quote (toString x.getId))`:

``````syntax ident : arith

macro_rules
| `(`[Arith| \$x:ident]) => `(Arith.symbol \$(Lean.quote (toString x.getId)))
``````

Let's test and see that we can now write expressions such as `x * y` directly instead of having to write `"x" * "y"`:

``````#check `[Arith| x]  -- Arith.symbol "x"

def xPlusY := `[Arith| x + y]
#print xPlusY  -- def xPlusY : Arith := Arith.add (Arith.symbol "x") (Arith.symbol "y")
``````

We now show an unfortunate consequence of the above definitions. Suppose we want to build `(x + y) + z`. Since we already have defined `xPlusY` as `x + y`, perhaps we should reuse it! Let's try:

``````#check `[Arith| xPlusY + z]  -- Arith.add (Arith.symbol "xPlusY") (Arith.symbol "z")
``````

Whoops, that didn't work! What happened? Lean treats `xPlusY` itself as an identifier! So we need to add some syntax to be able to "escape" the `Arith|` context. Let's use the syntax `<[ \$e:term ]>` to mean: evaluate `\$e` as a real term, not an identifier. The macro looks like follows:

``````syntax "<[" term "]>" : arith  -- escape for embedding terms into `Arith`

macro_rules
| `(`[Arith| <[ \$e:term ]>]) => pure e
``````

Let's try our previous example:

``````#check `[Arith| <[ xPlusY ]> + z]  -- Arith.add xPlusY (Arith.symbol "z")
``````

Perfect!

In this tutorial, we expanded on the previous tutorial to parse a more realistic grammar with multiple levels of precedence, how to parse identifiers directly within a macro, and how to provide an escape from within the macro context.

#### Full code listing

``````inductive Arith : Type
| add : Arith → Arith → Arith  -- e + f
| mul : Arith → Arith → Arith  -- e * f
| int : Int → Arith  -- constant
| symbol : String → Arith  -- variable

declare_syntax_cat arith

syntax num : arith  -- int for Arith.int
syntax str : arith  -- strings for Arith.symbol
syntax:60  arith:60 "+" arith:61 : arith  -- Arith.add
syntax:70 arith:70 "*" arith:71 : arith  -- Arith.mul
syntax "(" arith ")" : arith  -- parenthesized expressions

-- auxiliary notation for translating `arith` into `term`
syntax "`[Arith| " arith "]" : term

macro_rules
| `(`[Arith| \$s:str]) => `(Arith.symbol \$s)
| `(`[Arith| \$num:num]) => `(Arith.int \$num)
| `(`[Arith| \$x:arith + \$y:arith]) => `(Arith.add `[Arith| \$x] `[Arith| \$y])
| `(`[Arith| \$x:arith * \$y:arith]) => `(Arith.mul `[Arith| \$x] `[Arith| \$y])
| `(`[Arith| (\$x:arith)]) => `(`[Arith| \$x])

#check `[Arith| "x" * "y"]  -- mul
-- Arith.mul (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + "y"]  -- add
-- Arith.add (Arith.symbol "x") (Arith.symbol "y")

#check `[Arith| "x" + 20]  -- symbol + int
-- Arith.add (Arith.symbol "x") (Arith.int 20)

#check `[Arith| "x" + "y" * "z"]  -- precedence
-- Arith.add (Arith.symbol "x") (Arith.mul (Arith.symbol "y") (Arith.symbol "z"))

#check `[Arith| "x" * "y" + "z"]  -- precedence
-- Arith.add (Arith.mul (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

#check `[Arith| ("x" + "y") * "z"]  -- parentheses
-- Arith.mul (Arith.add (Arith.symbol "x") (Arith.symbol "y")) (Arith.symbol "z")

syntax ident : arith

macro_rules
| `(`[Arith| \$x:ident]) => `(Arith.symbol \$(Lean.quote (toString x.getId)))

#check `[Arith| x]  -- Arith.symbol "x"

def xPlusY := `[Arith| x + y]
#print xPlusY  -- def xPlusY : Arith := Arith.add (Arith.symbol "x") (Arith.symbol "y")

syntax "<[" term "]>" : arith  -- escape for embedding terms into `Arith`

macro_rules
| `(`[Arith| <[ \$e:term ]>]) => pure e

#check `[Arith| <[ xPlusY ]> + z]  -- Arith.add xPlusY (Arith.symbol "z")
``````