2. Lexical Structure¶
This section describes the detailed lexical structure of the Lean language. Many readers will want to skip this section on a first reading.
Lean input is processed into a stream of tokens by its scanner, using the UTF-8 encoding. The next token is the longest matching prefix of the remaining input.
token ::=symbol|command|ident|string|char|numeral|decimal|quoted_symbol|doc_comment|mod_doc_comment|field_notation
Tokens can be separated by the whitespace characters space, tab, line feed, and carriage return, as well as comments. Single-line comments start with --, whereas multi-line comments are enclosed by /- and -/ and can be nested.
2.1. Symbols and Commands¶
Symbols are static tokens that are used in term notations and commands. They can be both keyword-like (e.g. the have keyword) or use arbitrary Unicode characters.
Command tokens are static tokens that prefix any top-level declaration or action. They are usually keyword-like, with transitory commands like #print prefixed by an additional #. The set of built-in commands is listed in the Chapter 5 section.
Users can dynamically extend the sets of both symbols (via the commands listed in Section 2.6) and command tokens (via the [user_command] attribute).
2.2. Identifiers¶
An atomic identifier, or atomic name, is (roughly) an alphanumeric string that does not begin with a numeral. A (hierarchical) identifier, or name, consists of one or more atomic names separated by periods.
Parts of atomic names can be escaped by enclosing them in pairs of French double quotes «».
def foo.«bar.baz» := 0 -- name parts ["foo", "bar.baz"]
ident ::=atomic_ident|ident"."atomic_identatomic_ident ::=atomic_ident_startatomic_ident_rest* atomic_ident_start ::=letterlike| "_" |escaped_ident_partletterlike ::= [a-zA-Z] |greek|coptic|letterlike_symbolsgreek ::= <[α-ωΑ-Ωἀ-῾] except for [λΠΣ]> coptic ::= [ϊ-ϻ] letterlike_symbols ::= [℀-⅏] escaped_ident_part ::= "«" [^«»\r\n\t]* "»" atomic_ident_rest ::=atomic_ident_start| [0-9'ⁿ] |subscriptsubscript ::= [₀-₉ₐ-ₜᵢ-ᵪ]
2.3. String Literals¶
String literals are enclosed by double quotes ("). They may contain line breaks, which are conserved in the string value.
string ::= '"'string_item'"' string_item ::=string_char|string_escapestring_char ::= [^\\] string_escape ::= "\" ("\" | '"' | "'" | "n" | "t" | "x"hex_charhex_char) hex_char ::= [0-9a-fA-F]
2.5. Numeric Literals¶
Numeric literals can be specified in various bases.
numeral ::=numeral10|numeral2|numeral8|numeral16numeral10 ::= [0-9]+ numeral2 ::= "0" [bB] [0-1]+ numeral8 ::= "0" [oO] [0-7]+ numeral16 ::= "0" [xX]hex_char+
Decimal literals are currently only being used for some set_option values.
decimal ::= [0-9]+ "." [0-9]+
2.6. Quoted Symbols¶
In a fixed set of commands (notation, local notation, and reserve), symbols (known or unknown) can be quoted by enclosing them in backticks (`). Quoted symbols are used by these commands for registering new notations and symbols.
quoted_symbol ::= "`" " "*quoted_symbol_startquoted_symbol_rest* " "* "`" quoted_symbol_start ::= [^0-9"\n\t `] quoted_symbol_rest ::= [^"\n\t `]
A quoted symbol may contain surrounding whitespace, which is customarily used for pretty printing the symbol and ignored while scanning.
While backticks are not allowed in a user-defined symbol, they are used in some built-in symbols (see Quotations), which are accessible outside of the set of commands noted above.
2.7. Doc Comments¶
A special form of comments, doc comments are used to document modules and declarations.
doc_comment ::= "/--" ([^-] | "-" [^/])* "-/" mod_doc_comment ::= "/-!" ([^-] | "-" [^/])* "-/"
2.8. Field Notation¶
Trailing field notation tokens are used in expressions such as (1+1).to_string. Note that a.to_string is a single identifier, but may be interpreted as a field notation expression by the parser.
field_notation ::= "." ([0-9]+ | atomic_ident)