More do Features

Lean's do-notation provides a syntax for writing programs with monads that resembles imperative programming languages. In addition to providing a convenient syntax for programs with monads, do-notation provides syntax for using certain monad transformers.

Single-Branched if

When working in a monad, a common pattern is to carry out a side effect only if some condition is true. For instance, countLetters contains a check for vowels or consonants, and letters that are neither have no effect on the state. This is captured by having the else branch evaluate to pure (), which has no effects:

def countLetters (str : String) : StateT LetterCounts (Except Err) Unit :=
  let rec loop (chars : List Char) := do
    match chars with
    | [] => pure ()
    | c :: cs =>
      if c.isAlpha then
        if vowels.contains c then
          modify fun st => {st with vowels := st.vowels + 1}
        else if consonants.contains c then
          modify fun st => {st with consonants := st.consonants + 1}
        else -- modified or non-English letter
          pure ()
      else throw (.notALetter c)
      loop cs
  loop str.toList

When an if is a statement in a do-block, rather than being an expression, then else pure () can simply be omitted, and Lean inserts it automatically. The following definition of countLetters is completely equivalent:

def countLetters (str : String) : StateT LetterCounts (Except Err) Unit :=
  let rec loop (chars : List Char) := do
    match chars with
    | [] => pure ()
    | c :: cs =>
      if c.isAlpha then
        if vowels.contains c then
          modify fun st => {st with vowels := st.vowels + 1}
        else if consonants.contains c then
          modify fun st => {st with consonants := st.consonants + 1}
      else throw (.notALetter c)
      loop cs
  loop str.toList

A program that uses a state monad to count the entries in a list that satisfy some monadic check can be written as follows:

def count [Monad m] [MonadState Nat m] (p : α → m Bool) : List α → m Unit
  | [] => pure ()
  | x :: xs => do
    if ← p x then
      modify (· + 1)
    count p xs

Similarly, if not E1 then STMT... can instead be written unless E1 do STMT.... The converse of count that counts entries that don't satisfy the monadic check can be written by replacing if with unless:

def countNot [Monad m] [MonadState Nat m] (p : α → m Bool) : List α → m Unit
  | [] => pure ()
  | x :: xs => do
    unless ← p x do
      modify (· + 1)
    countNot p xs

Understanding single-branched if and unless does not require thinking about monad transformers. They simply replace the missing branch with pure (). The remaining extensions in this section, however, require Lean to automatically rewrite the do-block to add a local transformer on top of the monad that the do-block is written in.

Early Return

The standard library contains a function List.find? that returns the first entry in a list that satisfies some check. A simple implementation that doesn't make use of the fact that Option is a monad loops over the list using a recursive function, with an if to stop the loop when the desired entry is found:

def List.find? (p : α → Bool) : List α → Option α
  | [] => none
  | x :: xs =>
    if p x then
      some x
    else
      find? p xs

Imperative languages typically sport the return keyword that aborts the execution of a function, immediately returning some value to the caller. In Lean, this is available in do-notation, and return halts the execution of a do-block, with return's argument being the value returned from the monad. In other words, List.find? could have been written like this:

def List.find? (p : α → Bool) : List α → Option α
  | [] => failure
  | x :: xs => do
    if p x then return x
    find? p xs

Early return in imperative languages is a bit like an exception that can only cause the current stack frame to be unwound. Both early return and exceptions terminate execution of a block of code, effectively replacing the surrounding code with the thrown value. Behind the scenes, early return in Lean is implemented using a version of ExceptT. Each do-block that uses early return is wrapped in an exception handler (in the sense of the function tryCatch). Early returns are translated to throwing the value as an exception, and the handlers catch the thrown value and return it immediately. In other words, the do-block's original return value type is also used as the exception type.

Making this more concrete, the helper function runCatch strips a layer of ExceptT from the top of a monad transformer stack when the exception type and return type are the same:

def runCatch [Monad m] (action : ExceptT α m α) : m α := do
  match ← action with
  | Except.ok x => pure x
  | Except.error x => pure x

The do-block in List.find? that uses early return is translated to a do-block that does not use early return by wrapping it in a use of runCatch, and replacing early returns with throw:

def List.find? (p : α → Bool) : List α → Option α
  | [] => failure
  | x :: xs =>
    runCatch do
      if p x then throw x else pure ()
      monadLift (find? p xs)

Another situation in which early return is useful is command-line applications that terminate early if the arguments or input are incorrect. Many programs begin with a section that validates arguments and inputs before proceeding to the main body of the program. The following version of the greeting program hello-name checks that no command-line arguments were provided:

def main (argv : List String) : IO UInt32 := do
  let stdin ← IO.getStdin
  let stdout ← IO.getStdout
  let stderr ← IO.getStderr

  unless argv == [] do
    stderr.putStrLn s!"Expected no arguments, but got {argv.length}"
    return 1

  stdout.putStrLn "How would you like to be addressed?"
  stdout.flush

  let name := (← stdin.getLine).trim
  if name == "" then
    stderr.putStrLn s!"No name provided"
    return 1

  stdout.putStrLn s!"Hello, {name}!"

  return 0

Running it with no arguments and typing the name David yields the same result as the previous version:

$ lean --run EarlyReturn.lean
How would you like to be addressed?
David
Hello, David!

Providing the name as a command-line argument instead of an answer causes an error:

$ lean --run EarlyReturn.lean David
Expected no arguments, but got 1

And providing no name causes the other error:

$ lean --run EarlyReturn.lean
How would you like to be addressed?

No name provided

The program that uses early return avoids needing to nest the control flow, as is done in this version that does not use early return:

def main (argv : List String) : IO UInt32 := do
  let stdin ← IO.getStdin
  let stdout ← IO.getStdout
  let stderr ← IO.getStderr

  if argv != [] then
    stderr.putStrLn s!"Expected no arguments, but got {argv.length}"
    pure 1
  else
    stdout.putStrLn "How would you like to be addressed?"
    stdout.flush

    let name := (← stdin.getLine).trim
    if name == "" then
      stderr.putStrLn s!"No name provided"
      pure 1
    else
      stdout.putStrLn s!"Hello, {name}!"
      pure 0

One important difference between early return in Lean and early return in imperative languages is that Lean's early return applies only to the current do-block. When the entire definition of a function is in the same do block, this difference doesn't matter. But if do occurs underneath some other structures, then the difference becomes apparent. For example, given the following definition of greet:

def greet (name : String) : String :=
  "Hello, " ++ Id.run do return name

the expression greet "David" evaluates to "Hello, David", not just "David".

Loops

Just as every program with mutable state can be rewritten to a program that passes the state as arguments, every loop can be rewritten as a recursive function. From one perspective, List.find? is most clear as a recursive function. After all, its definition mirrors the structure of the list: if the head passes the check, then it should be returned; otherwise look in the tail. When no more entries remain, the answer is none. From another perspective, List.find? is most clear as a loop. After all, the program consults the entries in order until a satisfactory one is found, at which point it terminates. If the loop terminates without having returned, the answer is none.

Looping with ForM

Lean includes a type class that describes looping over a container type in some monad. This class is called ForM:

class ForM (m : Type u → Type v) (γ : Type w₁) (α : outParam (Type w₂)) where
  forM [Monad m] : γ → (α → m PUnit) → m PUnit

This class is quite general. The parameter m is a monad with some desired effects, γ is the collection to be looped over, and α is the type of elements from the collection. Typically, m is allowed to be any monad, but it is possible to have a data structure that e.g. only supports looping in IO. The method forM takes a collection, a monadic action to be run for its effects on each element from the collection, and is then responsible for running the actions.

The instance for List allows m to be any monad, it sets γ to be List α, and sets the class's α to be the same α found in the list:

def List.forM [Monad m] : List α → (α → m PUnit) → m PUnit
  | [], _ => pure ()
  | x :: xs, action => do
    action x
    forM xs action

instance : ForM m (List α) α where
  forM := List.forM

The function doList from doug is forM for lists. Because forM is intended to be used in do-blocks, it uses Monad rather than Applicative. forM can be used to make countLetters much shorter:

def countLetters (str : String) : StateT LetterCounts (Except Err) Unit :=
  forM str.toList fun c => do
    if c.isAlpha then
      if vowels.contains c then
        modify fun st => {st with vowels := st.vowels + 1}
      else if consonants.contains c then
        modify fun st => {st with consonants := st.consonants + 1}
    else throw (.notALetter c)

The instance for Many is very similar:

def Many.forM [Monad m] : Many α → (α → m PUnit) → m PUnit
  | Many.none, _ => pure ()
  | Many.more first rest, action => do
    action first
    forM (rest ()) action

instance : ForM m (Many α) α where
  forM := Many.forM

Because γ can be any type at all, ForM can support non-polymorphic collections. A very simple collection is one of the natural numbers less than some given number, in reverse order:

structure AllLessThan where
  num : Nat

Its forM operator applies the provided action to each smaller Nat:

def AllLessThan.forM [Monad m] (coll : AllLessThan) (action : Nat → m Unit) : m Unit :=
  let rec countdown : Nat → m Unit
    | 0 => pure ()
    | n + 1 => do
      action n
      countdown n
  countdown coll.num

instance : ForM m AllLessThan Nat where
  forM := AllLessThan.forM

Running IO.println on each number less than five can be accomplished with forM:

#eval forM { num := 5 : AllLessThan } IO.println
4
3
2
1
0

An example ForM instance that works only in a particular monad is one that loops over the lines read from an IO stream, such as standard input:

structure LinesOf where
  stream : IO.FS.Stream

partial def LinesOf.forM (readFrom : LinesOf) (action : String → IO Unit) : IO Unit := do
  let line ← readFrom.stream.getLine
  if line == "" then return ()
  action line
  forM readFrom action

instance : ForM IO LinesOf String where
  forM := LinesOf.forM

The definition of forM is marked partial because there is no guarantee that the stream is finite. In this case, IO.FS.Stream.getLine works only in the IO monad, so no other monad can be used for looping.

This example program uses this looping construct to filter out lines that don't contain letters:

def main (argv : List String) : IO UInt32 := do
  if argv != [] then
    IO.eprintln "Unexpected arguments"
    return 1

  forM (LinesOf.mk (← IO.getStdin)) fun line => do
    if line.any (·.isAlpha) then
      IO.print line

  return 0

The file test-data contains:

Hello!
!!!!!
12345
abc123

Ok

Invoking this program, which is stored in ForMIO.lean, yields the following output:

$ lean --run ForMIO.lean < test-data
Hello!
abc123
Ok

Stopping Iteration

Terminating a loop early is difficult to do with forM. Writing a function that iterates over the Nats in an AllLessThan only until 3 is reached requires a means of stopping the loop partway through. One way to achieve this is to use forM with the OptionT monad transformer. The first step is to define OptionT.exec, which discards information about both the return value and whether or not the transformed computation succeeded:

def OptionT.exec [Applicative m] (action : OptionT m α) : m Unit :=
  action *> pure ()

Then, failure in the OptionT instance of Alternative can be used to terminate looping early:

def countToThree (n : Nat) : IO Unit :=
  let nums : AllLessThan := ⟨n⟩
  OptionT.exec (forM nums fun i => do
    if i < 3 then failure else IO.println i)

A quick test demonstrates that this solution works:

#eval countToThree 7
6
5
4
3

However, this code is not so easy to read. Terminating a loop early is a common task, and Lean provides more syntactic sugar to make this easier. This same function can also be written as follows:

def countToThree (n : Nat) : IO Unit := do
  let nums : AllLessThan := ⟨n⟩
  for i in nums do
    if i < 3 then break
    IO.println i

Testing it reveals that it works just like the prior version:

#eval countToThree 7
6
5
4
3

At the time of writing, the for ... in ... do ... syntax desugars to the use of a type class called ForIn, which is a somewhat more complicated version of ForM that keeps track of state and early termination. However, there is a plan to refactor for loops to use the simpler ForM, with monad transformers inserted as necessary. In the meantime, an adapter is provided that converts a ForM instance into a ForIn instance, called ForM.forIn. To enable for loops based on a ForM instance, add something like the following, with appropriate replacements for AllLessThan and Nat:

instance : ForIn m AllLessThan Nat where
  forIn := ForM.forIn

Note, however, that this adapter only works for ForM instances that keep the monad unconstrained, as most of them do. This is because the adapter uses StateT and ExceptT, rather than the underlying monad.

Early return is supported in for loops. The translation of do blocks with early return into a use of an exception monad transformer applies equally well underneath forM as the earlier use of OptionT to halt iteration does. This version of List.find? makes use of both:

def List.find? (p : α → Bool) (xs : List α) : Option α := do
  for x in xs do
    if p x then return x
  failure

In addition to break, for loops support continue to skip the rest of the loop body in an iteration. An alternative (but confusing) formulation of List.find? skips elements that don't satisfy the check:

def List.find? (p : α → Bool) (xs : List α) : Option α := do
  for x in xs do
    if not (p x) then continue
    return x
  failure

A Range is a structure that consists of a starting number, an ending number, and a step. They represent a sequence of natural numbers, from the starting number to the ending number, increasing by the step each time. Lean has special syntax to construct ranges, consisting of square brackets, numbers, and colons that comes in four varieties. The stopping point must always be provided, while the start and the step are optional, defaulting to 0 and 1, respectively:

ExpressionStartStopStepAs List
[:10]0101[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2:10]2101[2, 3, 4, 5, 6, 7, 8, 9]
[:10:3]0103[0, 3, 6, 9]
[2:10:3]2103[2, 5, 8]

Note that the starting number is included in the range, while the stopping numbers is not. All three arguments are Nats, which means that ranges cannot count down—a range where the starting number is greater than or equal to the stopping number simply contains no numbers.

Ranges can be used with for loops to draw numbers from the range. This program counts even numbers from four to eight:

def fourToEight : IO Unit := do
  for i in [4:9:2] do
    IO.println i

Running it yields:

4
6
8

Finally, for loops support iterating over multiple collections in parallel, by separating the in clauses with commas. Looping halts when the first collection runs out of elements, so the declaration:

def parallelLoop := do
  for x in ["currant", "gooseberry", "rowan"], y in [4:8] do
    IO.println (x, y)

produces three lines of output:

#eval parallelLoop
(currant, 4)
(gooseberry, 5)
(rowan, 6)

Mutable Variables

In addition to early return, else-less if, and for loops, Lean supports local mutable variables within a do block. Behind the scenes, these mutable variables desugar to a use of StateT, rather than being implemented by true mutable variables. Once again, functional programming is used to simulate imperative programming.

A local mutable variable is introduced with let mut instead of plain let. The definition two, which uses the identity monad Id to enable do-syntax without introducing any effects, counts to 2:

def two : Nat := Id.run do
  let mut x := 0
  x := x + 1
  x := x + 1
  return x

This code is equivalent to a definition that uses StateT to add 1 twice:

def two : Nat :=
  let block : StateT Nat Id Nat := do
    modify (· + 1)
    modify (· + 1)
    return (← get)
  let (result, _finalState) := block 0
  result

Local mutable variables work well with all the other features of do-notation that provide convenient syntax for monad transformers. The definition three counts the number of entries in a three-entry list:

def three : Nat := Id.run do
  let mut x := 0
  for _ in [1, 2, 3] do
    x := x + 1
  return x

Similarly, six adds the entries in a list:

def six : Nat := Id.run do
  let mut x := 0
  for y in [1, 2, 3] do
    x := x + y
  return x

List.count counts the number of entries in a list that satisfy some check:

def List.count (p : α → Bool) (xs : List α) : Nat := Id.run do
  let mut found := 0
  for x in xs do
    if p x then found := found + 1
  return found

Local mutable variables can be more convenient to use and easier to read than an explicit local use of StateT. However, they don't have the full power of unrestricted mutable variables from imperative languages. In particular, they can only be modified in the do-block in which they are introduced. This means, for instance, that for-loops can't be replaced by otherwise-equivalent recursive helper functions. This version of List.count:

def List.count (p : α → Bool) (xs : List α) : Nat := Id.run do
  let mut found := 0
  let rec go : List α → Id Unit
    | [] => pure ()
    | y :: ys => do
      if p y then found := found + 1
      go ys
  return found

yields the following error on the attempted mutation of found:

`found` cannot be mutated, only variables declared using `let mut` can be mutated. If you did not intent to mutate but define `found`, consider using `let found` instead

This is because the recursive function is written in the identity monad, and only the monad of the do-block in which the variable is introduced is transformed with StateT.

What counts as a do block?

Many features of do-notation apply only to a single do-block. Early return terminates the current block, and mutable variables can only be mutated in the block that they are defined in. To use them effectively, it's important to know what counts as "the same block".

Generally speaking, the indented block following the do keyword counts as a block, and the immediate sequence of statements underneath it are part of that block. Statements in independent blocks that are nonetheless contained in a block are not considered part of the block. However, the rules that govern what exactly counts as the same block are slightly subtle, so some examples are in order. The precise nature of the rules can be tested by setting up a program with a mutable variable and seeing where the mutation is allowed. This program has a mutation that is clearly in the same block as the mutable variable:

example : Id Unit := do
  let mut x := 0
  x := x + 1

When a mutation occurs in a do-block that is part of a let-statement that defines a name using :=, then it is not considered to be part of the block:

example : Id Unit := do
  let mut x := 0
  let other := do
    x := x + 1
  other
`x` cannot be mutated, only variables declared using `let mut` can be mutated. If you did not intent to mutate but define `x`, consider using `let x` instead

However, a do-block that occurs under a let-statement that defines a name using is considered part of the surrounding block. The following program is accepted:

example : Id Unit := do
  let mut x := 0
  let other ← do
    x := x + 1
  pure other

Similarly, do-blocks that occur as arguments to functions are independent of their surrounding blocks. The following program is not accepted:

example : Id Unit := do
  let mut x := 0
  let addFour (y : Id Nat) := Id.run y + 4
  addFour do
    x := 5
`x` cannot be mutated, only variables declared using `let mut` can be mutated. If you did not intent to mutate but define `x`, consider using `let x` instead

If the do keyword is completely redundant, then it does not introduce a new block. This program is accepted, and is equivalent to the first one in this section:

example : Id Unit := do
  let mut x := 0
  do x := x + 1

The contents of branches under a do (such as those introduced by match or if) are considered to be part of the surrounding block, whether or not a redundant do is added. The following programs are all accepted:

example : Id Unit := do
  let mut x := 0
  if x > 2 then
    x := x + 1

example : Id Unit := do
  let mut x := 0
  if x > 2 then do
    x := x + 1

example : Id Unit := do
  let mut x := 0
  match true with
  | true => x := x + 1
  | false => x := 17

example : Id Unit := do
  let mut x := 0
  match true with
  | true => do
    x := x + 1
  | false => do
    x := 17

Similarly, the do that occurs as part of the for and unless syntax is just part of their syntax, and does not introduce a fresh do-block. These programs are also accepted:

example : Id Unit := do
  let mut x := 0
  for y in [1:5] do
   x := x + y

example : Id Unit := do
  let mut x := 0
  unless 1 < 5 do
    x := x + 1

Imperative or Functional Programming?

The imperative features provided by Lean's do-notation allow many programs to very closely resemble their counterparts in languages like Rust, Java, or C#. This resemblance is very convenient when translating an imperative algorithm into Lean, and some tasks are just most naturally thought of imperatively. The introduction of monads and monad transformers enables imperative programs to be written in purely functional languages, and do-notation as a specialized syntax for monads (potentially locally transformed) allows functional programmers to have the best of both worlds: the strong reasoning principles afforded by immutability and a tight control over available effects through the type system are combined with syntax and libraries that allow programs that use effects to look familiar and be easy to read. Monads and monad transformers allow functional versus imperative programming to be a matter of perspective.

Exercises

  • Rewrite doug to use for instead of the doList function. Are there other opportunities to use the features introduced in this section to improve the code? If so, use them!