Indexed Families
Polymorphic inductive types take type arguments.
For instance, List
takes an argument that determines the type of the entries in the list, and Except
takes arguments that determine the types of the exceptions or values.
These type arguments, which are the same in every constructor of the datatype, are referred to as parameters.
Arguments to inductive types need not be the same in every constructor, however. Inductive types in which the arguments to the type vary based on the choice of constructor are called indexed families, and the arguments that vary are referred to as indices. The "hello world" of indexed families is a type of lists that contains the length of the list in addition to the type of entries, conventionally referred to as "vectors":
inductive Vect (α : Type u) : Nat → Type u where
| nil : Vect α 0
| cons : α → Vect α n → Vect α (n + 1)
Function declarations may take some arguments before the colon, indicating that they are available in the entire definition, and some arguments after, indicating a desire to pattern-match on them and define the function case by case.
Inductive datatypes have a similar principle: the argument α
is named at the top of the datatype declaration, prior to the colon, which indicates that it is a parameter that must be provided as the first argument in all occurrences of Vect
in the definition, while the Nat
argument occurs after the colon, indicating that it is an index that may vary.
Indeed, the three occurrences of Vect
in the nil
and cons
constructor declarations consistently provide α
as the first argument, while the second argument is different in each case.
The declaration of nil
states that it is a constructor of type Vect α 0
.
This means that using Vect.nil
in a context expecting a Vect String 3
is a type error, just as [1, 2, 3]
is a type error in a context that expects a List String
:
example : Vect String 3 := Vect.nil
type mismatch
Vect.nil
has type
Vect String 0 : Type
but is expected to have type
Vect String 3 : Type
The mismatch between 0
and 3
in this example plays exactly the same role as any other type mismatch, even though 0
and 3
are not themselves types.
Indexed families are called families of types because different index values can make different constructors available for use.
In some sense, an indexed family is not a type; rather, it is a collection of related types, and the choice of index values also chooses a type from the collection.
Choosing the index 5
for Vect
means that only the constructor cons
is available, and choosing the index 0
means that only nil
is available.
If the index is not yet known (e.g. because it is a variable), then no constructor can be used until it becomes known.
Using n
for the length allows neither Vect.nil
nor Vect.cons
, because there's no way to know whether the variable n
should stand for a Nat
that matches 0
or n + 1
:
example : Vect String n := Vect.nil
type mismatch
Vect.nil
has type
Vect String 0 : Type
but is expected to have type
Vect String n : Type
example : Vect String n := Vect.cons "Hello" (Vect.cons "world" Vect.nil)
type mismatch
Vect.cons "Hello" (Vect.cons "world" Vect.nil)
has type
Vect String (0 + 1 + 1) : Type
but is expected to have type
Vect String n : Type
Having the length of the list as part of its type means that the type becomes more informative.
For example, Vect.replicate
is a function that creates a Vect
with a number of copies of a given value.
The type that says this precisely is:
def Vect.replicate (n : Nat) (x : α) : Vect α n := _
The argument n
appears as the length of the result.
The message associated with the underscore placeholder describes the task at hand:
don't know how to synthesize placeholder
context:
α : Type u_1
n : Nat
x : α
⊢ Vect α n
When working with indexed families, constructors can only be applied when Lean can see that the constructor's index matches the index in the expected type.
However, neither constructor has an index that matches n
—nil
matches Nat.zero
, and cons
matches Nat.succ
.
Just as in the example type errors, the variable n
could stand for either, depending on which Nat
is provided to the function as an argument.
The solution is to use pattern matching to consider both of the possible cases:
def Vect.replicate (n : Nat) (x : α) : Vect α n :=
match n with
| 0 => _
| k + 1 => _
Because n
occurs in the expected type, pattern matching on n
refines the expected type in the two cases of the match.
In the first underscore, the expected type has become Vect α 0
:
don't know how to synthesize placeholder
context:
α : Type u_1
n : Nat
x : α
⊢ Vect α 0
In the second underscore, it has become Vect α (k + 1)
:
don't know how to synthesize placeholder
context:
α : Type u_1
n : Nat
x : α
k : Nat
⊢ Vect α (k + 1)
When pattern matching refines the type of a program in addition to discovering the structure of a value, it is called dependent pattern matching.
The refined type makes it possible to apply the constructors.
The first underscore matches Vect.nil
, and the second matches Vect.cons
:
def Vect.replicate (n : Nat) (x : α) : Vect α n :=
match n with
| 0 => .nil
| k + 1 => .cons _ _
The first underscore under the .cons
should have type α
.
There is an α
available, namely x
:
don't know how to synthesize placeholder
context:
α : Type u_1
n : Nat
x : α
k : Nat
⊢ α
The second underscore should be a Vect α k
, which can be produced by a recursive call to replicate
:
don't know how to synthesize placeholder
context:
α : Type u_1
n : Nat
x : α
k : Nat
⊢ Vect α k
Here is the final definition of replicate
:
def Vect.replicate (n : Nat) (x : α) : Vect α n :=
match n with
| 0 => .nil
| k + 1 => .cons x (replicate k x)
In addition to providing assistance while writing the function, the informative type of Vect.replicate
also allows client code to rule out a number of unexpected functions without having to read the source code.
A version of replicate
for lists could produce a list of the wrong length:
def List.replicate (n : Nat) (x : α) : List α :=
match n with
| 0 => []
| k + 1 => x :: x :: replicate k x
However, making this mistake with Vect.replicate
is a type error:
def Vect.replicate (n : Nat) (x : α) : Vect α n :=
match n with
| 0 => .nil
| k + 1 => .cons x (.cons x (replicate k x))
application type mismatch
cons x (cons x (replicate k x))
argument
cons x (replicate k x)
has type
Vect α (k + 1) : Type ?u.1998
but is expected to have type
Vect α k : Type ?u.1998
The function List.zip
combines two lists by pairing the first entry in the first list with the first entry in the second list, the second entry in the first list with the second entry in the second list, and so forth.
List.zip
can be used to pair the three highest peaks in the US state of Oregon with the three highest peaks in Denmark:
["Mount Hood",
"Mount Jefferson",
"South Sister"].zip ["Møllehøj", "Yding Skovhøj", "Ejer Bavnehøj"]
The result is a list of three pairs:
[("Mount Hood", "Møllehøj"),
("Mount Jefferson", "Yding Skovhøj"),
("South Sister", "Ejer Bavnehøj")]
It's somewhat unclear what should happen when the lists have different lengths. Like many languages, Lean chooses to ignore the extra entries in one of the lists. For instance, combining the heights of the five highest peaks in Oregon with those of the three highest peaks in Denmark yields three pairs. In particular,
[3428.8, 3201, 3158.5, 3075, 3064].zip [170.86, 170.77, 170.35]
evaluates to
[(3428.8, 170.86), (3201, 170.77), (3158.5, 170.35)]
While this approach is convenient because it always returns an answer, it runs the risk of throwing away data when the lists unintentionally have different lengths.
F# takes a different approach: its version of List.zip
throws an exception when the lengths don't match, as can be seen in this fsi
session:
> List.zip [3428.8; 3201.0; 3158.5; 3075.0; 3064.0] [170.86; 170.77; 170.35];;
System.ArgumentException: The lists had different lengths.
list2 is 2 elements shorter than list1 (Parameter 'list2')
at Microsoft.FSharp.Core.DetailedExceptions.invalidArgDifferentListLength[?](String arg1, String arg2, Int32 diff) in /builddir/build/BUILD/dotnet-v3.1.424-SDK/src/fsharp.3ef6f0b514198c0bfa6c2c09fefe41a740b024d5/src/fsharp/FSharp.Core/local.fs:line 24
at Microsoft.FSharp.Primitives.Basics.List.zipToFreshConsTail[a,b](FSharpList`1 cons, FSharpList`1 xs1, FSharpList`1 xs2) in /builddir/build/BUILD/dotnet-v3.1.424-SDK/src/fsharp.3ef6f0b514198c0bfa6c2c09fefe41a740b024d5/src/fsharp/FSharp.Core/local.fs:line 918
at Microsoft.FSharp.Primitives.Basics.List.zip[T1,T2](FSharpList`1 xs1, FSharpList`1 xs2) in /builddir/build/BUILD/dotnet-v3.1.424-SDK/src/fsharp.3ef6f0b514198c0bfa6c2c09fefe41a740b024d5/src/fsharp/FSharp.Core/local.fs:line 929
at Microsoft.FSharp.Collections.ListModule.Zip[T1,T2](FSharpList`1 list1, FSharpList`1 list2) in /builddir/build/BUILD/dotnet-v3.1.424-SDK/src/fsharp.3ef6f0b514198c0bfa6c2c09fefe41a740b024d5/src/fsharp/FSharp.Core/list.fs:line 466
at <StartupCode$FSI_0006>.$FSI_0006.main@()
Stopped due to error
This avoids accidentally discarding information, but crashing a program comes with its own difficulties.
The Lean equivalent, which would use the Option
or Except
monads, would introduce a burden that may not be worth the safety.
Using Vect
, however, it is possible to write a version of zip
with a type that requires that both arguments have the same length:
def Vect.zip : Vect α n → Vect β n → Vect (α × β) n
| .nil, .nil => .nil
| .cons x xs, .cons y ys => .cons (x, y) (zip xs ys)
This definition only has patterns for the cases where either both arguments are Vect.nil
or both arguments are Vect.cons
, and Lean accepts the definition without a "missing cases" error like the one that results from a similar definition for List
:
def List.zip : List α → List β → List (α × β)
| [], [] => []
| x :: xs, y :: ys => (x, y) :: zip xs ys
missing cases:
(List.cons _ _), []
[], (List.cons _ _)
This is because the constructor used in the first pattern, nil
or cons
, refines the type checker's knowledge about the length n
.
When the first pattern is nil
, the type checker can additionally determine that the length was 0
, so the only possible choice for the second pattern is nil
.
Similarly, when the first pattern is cons
, the type checker can determine that the length was k+1
for some Nat
k
, so the only possible choice for the second pattern is cons
.
Indeed, adding a case that uses nil
and cons
together is a type error, because the lengths don't match:
def Vect.zip : Vect α n → Vect β n → Vect (α × β) n
| .nil, .nil => .nil
| .nil, .cons y ys => .nil
| .cons x xs, .cons y ys => .cons (x, y) (zip xs ys)
type mismatch
Vect.cons y ys
has type
Vect β (?m.4718 + 1) : Type ?u.4530
but is expected to have type
Vect β 0 : Type ?u.4530
The refinement of the length can be observed by making n
into an explicit argument:
def Vect.zip : (n : Nat) → Vect α n → Vect β n → Vect (α × β) n
| 0, .nil, .nil => .nil
| k + 1, .cons x xs, .cons y ys => .cons (x, y) (zip k xs ys)
Exercises
Getting a feel for programming with dependent types requires experience, and the exercises in this section are very important. For each exercise, try to see which mistakes the type checker can catch, and which ones it can't, by experimenting with the code as you go. This is also a good way to develop a feel for the error messages.
-
Double-check that
Vect.zip
gives the right answer when combining the three highest peaks in Oregon with the three highest peaks in Denmark. BecauseVect
doesn't have the syntactic sugar thatList
has, it can be helpful to begin by definingoregonianPeaks : Vect String 3
anddanishPeaks : Vect String 3
. -
Define a function
Vect.map
with type(α → β) → Vect α n → Vect β n
. -
Define a function
Vect.zipWith
that combines the entries in aVect
one at a time with a function. It should have the type(α → β → γ) → Vect α n → Vect β n → Vect γ n
. -
Define a function
Vect.unzip
that splits aVect
of pairs into a pair ofVect
s. It should have the typeVect (α × β) n → Vect α n × Vect β n
. -
Define a function
Vect.snoc
that adds an entry to the end of aVect
. Its type should beVect α n → α → Vect α (n + 1)
and#eval Vect.snoc (.cons "snowy" .nil) "peaks"
should yieldVect.cons "snowy" (Vect.cons "peaks" (Vect.nil))
. The namesnoc
is a traditional functional programming pun: it iscons
backwards. -
Define a function
Vect.reverse
that reverses the order of aVect
. -
Define a function
Vect.drop
with the following type:(n : Nat) → Vect α (k + n) → Vect α k
. Verify that it works by checking that#eval danishPeaks.drop 2
yieldsVect.cons "Ejer Bavnehøj" (Vect.nil)
. -
Define a function
Vect.take
with type(n : Nat) → Vect α (k + n) → Vect α n
that returns the firstn
entries in theVect
. Check that it works on an example.