Structures
The first step in writing a program is usually to identify the problem domain's concepts, and then find suitable representations for them in code.
Sometimes, a domain concept is a collection of other, simpler, concepts.
In that case, it can be convenient to group these simpler components together into a single "package", which can then be given a meaningful name.
In Lean, this is done using structures, which are analogous to struct
s in C or Rust and record
s in C#.
Defining a structure introduces a completely new type to Lean that can't be reduced to any other type. This is useful because multiple structures might represent different concepts that nonetheless contain the same data. For instance, a point might be represented using either Cartesian or polar coordinates, each being a pair of floating-point numbers. Defining separate structures prevents API clients from confusing one for another.
Lean's floating-point number type is called Float
, and floating-point numbers are written in the usual notation.
#check 1.2
1.2 : Float
#check -454.2123215
-454.2123215 : Float
#check 0.0
0.0 : Float
When floating point numbers are written with the decimal point, Lean will infer the type Float
. If they are written without it, then a type annotation may be necessary.
#check 0
0 : Nat
#check (0 : Float)
0 : Float
A Cartesian point is a structure with two Float
fields, called x
and y
.
This is declared using the structure
keyword.
structure Point where
x : Float
y : Float
deriving Repr
After this declaration, Point
is a new structure type.
The final line, which says deriving Repr
, asks Lean to generate code to display values of type Point
.
This code is used by #eval
to render the result of evaluation for consumption by programmers, analogous to the repr
function in Python.
It is also possible to override the compiler's generated display code.
The typical way to create a value of a structure type is to provide values for all of its fields inside of curly braces.
The origin of a Cartesian plane is where x
and y
are both zero:
def origin : Point := { x := 0.0, y := 0.0 }
If the deriving Repr
line in Point
's definition were omitted, then attempting #eval origin
would yield an error similar to that which occurs when omitting a function's argument:
expression
origin
has type
Point
but instance
Lean.MetaEval Point
failed to be synthesized, this instance instructs Lean on how to display the resulting value, recall that any type implementing the `Repr` class also implements the `Lean.MetaEval` class
That message is saying that the evaluation machinery doesn't know how to communicate the result of evaluation back to the user.
Happily, with deriving Repr
, the result of #eval origin
looks very much like the definition of origin
.
{ x := 0.000000, y := 0.000000 }
Because structures exist to "bundle up" a collection of data, naming it and treating it as a single unit, it is also important to be able to extract the individual fields of a structure. This is done using dot notation, as in C, Python, or Rust.
#eval origin.x
0.000000
#eval origin.y
0.000000
This can be used to define functions that take structures as arguments.
For instance, addition of points is performed by adding the underlying coordinate values.
It should be the case that #eval addPoints { x := 1.5, y := 32 } { x := -8, y := 0.2 }
yields
{ x := -6.500000, y := 32.200000 }
The function itself takes two Points
as arguments, called p1
and p2
.
The resulting point is based on the x
and y
fields of both p1
and p2
:
def addPoints (p1 : Point) (p2 : Point) : Point :=
{ x := p1.x + p2.x, y := p1.y + p2.y }
Similarly, the distance between two points, which is the square root of the sum of the squares of the differences in their x
and y
components, can be written:
def distance (p1 : Point) (p2 : Point) : Float :=
Float.sqrt (((p2.x - p1.x) ^ 2.0) + ((p2.y - p1.y) ^ 2.0))
For example, the distance between (1, 2) and (5, -1) is 5:
#eval distance { x := 1.0, y := 2.0 } { x := 5.0, y := -1.0 }
5.000000
Multiple structures may have fields with the same names.
For instance, a three-dimensional point datatype may share the fields x
and y
, and be instantiated with the same field names:
structure Point3D where
x : Float
y : Float
z : Float
deriving Repr
def origin3D : Point3D := { x := 0.0, y := 0.0, z := 0.0 }
This means that the structure's expected type must be known in order to use the curly-brace syntax. If the type is not known, Lean will not be able to instantiate the structure. For instance,
#check { x := 0.0, y := 0.0 }
leads to the error
invalid {...} notation, expected type is not known
As usual, the situation can be remedied by providing a type annotation.
#check ({ x := 0.0, y := 0.0 } : Point)
{ x := 0.0, y := 0.0 } : Point
To make programs more concise, Lean also allows the structure type annotation inside the curly braces.
#check { x := 0.0, y := 0.0 : Point}
{ x := 0.0, y := 0.0 } : Point
Updating Structures
Imagine a function zeroX
that replaces the x
field of a Point
with 0.0
.
In most programming language communities, this sentence would mean that the memory location pointed to by x
was to be overwritten with a new value.
However, Lean does not have mutable state.
In functional programming communities, what is almost always meant by this kind of statement is that a fresh Point
is allocated with the x
field pointing to the new value, and all other fields pointing to the original values from the input.
One way to write zeroX
is to follow this description literally, filling out the new value for x
and manually transferring y
:
def zeroX (p : Point) : Point :=
{ x := 0, y := p.y }
This style of programming has drawbacks, however. First off, if a new field is added to a structure, then every site that updates any field at all must be updated, causing maintenance difficulties. Secondly, if the structure contains multiple fields with the same type, then there is a real risk of copy-paste coding leading to field contents being duplicated or switched. Finally, the program becomes long and bureaucratic.
Lean provides a convenient syntax for replacing some fields in a structure while leaving the others alone.
This is done by using the with
keyword in a structure initialization.
The source of unchanged fields occurs before the with
, and the new fields occur after.
For instance, zeroX
can be written with only the new x
value:
def zeroX (p : Point) : Point :=
{ p with x := 0 }
Remember that this structure update syntax does not modify existing values—it creates new values that share some fields with old values.
For instance, given the point fourAndThree
:
def fourAndThree : Point :=
{ x := 4.3, y := 3.4 }
evaluating it, then evaluating an update of it using zeroX
, then evaluating it again yields the original value:
#eval fourAndThree
{ x := 4.300000, y := 3.400000 }
#eval zeroX fourAndThree
{ x := 0.000000, y := 3.400000 }
#eval fourAndThree
{ x := 4.300000, y := 3.400000 }
One consequence of the fact that structure updates do not modify the original structure is that it becomes easier to reason about cases where the new value is computed from the old one. All references to the old structure continue to refer to the same field values in all of the new values provided.
Behind the Scenes
Every structure has a constructor. Here, the term "constructor" may be a source of confusion. Unlike constructors in languages such as Java or Python, constructors in Lean are not arbitrary code to be run when a datatype is initialized. Instead, constructors simply gather the data to be stored in the newly-allocated data structure. It is not possible to provide a custom constructor that pre-processes data or rejects invalid arguments. This is really a case of the word "constructor" having different, but related, meanings in the two contexts.
By default, the constructor for a structure named S
is named S.mk
.
Here, S
is a namespace qualifier, and mk
is the name of the constructor itself.
Instead of using curly-brace initialization syntax, the constructor can also be applied directly.
#check Point.mk 1.5 2.8
However, this is not generally considered to be good Lean style, and Lean even returns its feedback using the standard structure initializer syntax.
{ x := 1.5, y := 2.8 } : Point
Constructors have function types, which means they can be used anywhere that a function is expected.
For instance, Point.mk
is a function that accepts two Float
s (respectively x
and y
) and returns a new Point
.
#check (Point.mk)
Point.mk : Float → Float → Point
To override a structure's constructor name, write it with two colons at the beginning.
For instance, to use Point.point
instead of Point.mk
, write:
structure Point where
point ::
x : Float
y : Float
deriving Repr
In addition to the constructor, an accessor function is defined for each field of a structure.
These have the same name as the field, in the structure's namespace.
For Point
, accessor functions Point.x
and Point.y
are generated.
#check (Point.x)
Point.x : Point → Float
#check (Point.y)
Point.y : Point → Float
In fact, just as the curly-braced structure construction syntax is converted to a call to the structure's constructor behind the scenes, the syntax p1.x
in the prior definition of addPoints
is converted into a call to the Point.x
accessor.
That is, #eval origin.x
and #eval Point.x origin
both yield
0.000000
Accessor dot notation is usable with more than just structure fields.
It can also be used for functions that take any number of arguments.
More generally, accessor notation has the form TARGET.f ARG1 ARG2 ...
.
If TARGET
has type T
, the function named T.f
is called.
TARGET
becomes its leftmost argument of type T
, which is often but not always the first one, and ARG1 ARG2 ...
are provided in order as the remaining arguments.
For instance, String.append
can be invoked from a string with accessor notation, even though String
is not a structure with an append
field.
#eval "one string".append " and another"
"one string and another"
In that example, TARGET
represents "one string"
and ARG1
represents " and another"
.
The function Point.modifyBoth
(that is, modifyBoth
defined in the Point
namespace) applies a function to both fields in a Point
:
def Point.modifyBoth (f : Float → Float) (p : Point) : Point :=
{ x := f p.x, y := f p.y }
Even though the Point
argument comes after the function argument, it can be used with dot notation as well:
#eval fourAndThree.modifyBoth Float.floor
{ x := 4.000000, y := 3.000000 }
In this case, TARGET
represents fourAndThree
, while ARG1
is Float.floor
.
This is because the target of the accessor notation is used as the first argument in which the type matches, not necessarily the first argument.
Exercises
- Define a structure named
RectangularPrism
that contains the height, width, and depth of a rectangular prism, each as aFloat
. - Define a function named
volume : RectangularPrism → Float
that computes the volume of a rectangular prism. - Define a structure named
Segment
that represents a line segment by its endpoints, and define a functionlength : Segment → Float
that computes the length of a line segment.Segment
should have at most two fields. - Which names are introduced by the declaration of
RectangularPrism
? - Which names are introduced by the following declarations of
Hamster
andBook
? What are their types?
structure Hamster where
name : String
fluffy : Bool
structure Book where
makeBook ::
title : String
author : String
price : Float