Commit b4b2b0d

authored Jan 31, 2018
Merge pull request rust-lang#26 from mark-i-m/macros
Start macro expansion chapter
2 parents ed1e1f2 + 82da67a commit b4b2b0d

1 file changed

+160
-0
lines changed

‎src/macro-expansion.md

+160
@@ -1 +1,161 @@
# Macro expansion

Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
before name resolution, macros are expanded using these portions of the code.
The macro parser, in turn, may call the normal Rust parser when it needs to
bind a metavariable (e.g. `$my_expr`) while parsing the contents of a macro
invocation. The code for macro expansion is in
[`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to explain how macro
expansion works.

### Example

It's helpful to have an example to refer to. For the remainder of this chapter,
whenever we refer to the "example _definition_", we mean the following:

```rust
macro_rules! printer {
    // Arms of a `macro_rules!` definition are separated by `;`.
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}
```

`$mvar` is called a _metavariable_. Unlike normal variables, rather than
binding to a value in a computation, a metavariable binds _at compile time_ to
a tree of _tokens_. A _token_ is a single "unit" of the grammar, such as an
identifier (e.g., `foo`) or punctuation (e.g., `=>`). There are also other
special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees result from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`): each tree includes the open and close delimiters and
all the tokens in between (we do require that parentheses-like characters be
balanced). Having macro expansion operate on token streams rather than the raw
bytes of a source file abstracts away a lot of complexity. The macro expander
(and much of the rest of the compiler) doesn't really care that much about the
exact line and column of some syntactic construct in the code; it cares about
what constructs are used in the code. Using tokens allows us to care about
_what_ without worrying about _where_. For more information about tokens, see
the [Parsing][parsing] chapter of this book.

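To make the idea of a token tree concrete, here is a minimal sketch. It is not
`rustc`'s actual representation: the `TokenTree` enum and the `group` function
are hypothetical names introduced for illustration, and tokens are modeled as
plain strings. The sketch groups a flat token list into trees and rejects
unbalanced delimiters.

```rust
/// Hypothetical, simplified token tree: either a single token or a
/// delimited group holding nested token trees.
#[derive(Debug, PartialEq)]
enum TokenTree {
    Token(String),
    Delimited(char, Vec<TokenTree>),
}

/// Groups a flat token list into token trees, returning `None` if the
/// delimiters are unbalanced or mismatched.
fn group(tokens: &[&str]) -> Option<Vec<TokenTree>> {
    // Stack of (opening delimiter, trees collected so far).
    // The bottom entry is the top level and uses ' ' as a placeholder.
    let mut stack: Vec<(char, Vec<TokenTree>)> = vec![(' ', Vec::new())];
    for &tok in tokens {
        match tok {
            "(" | "[" | "{" => stack.push((tok.chars().next()?, Vec::new())),
            ")" | "]" | "}" => {
                let (open, trees) = stack.pop()?;
                let expected = match open {
                    '(' => ")",
                    '[' => "]",
                    '{' => "}",
                    // A closer at the top level has no matching opener.
                    _ => return None,
                };
                if tok != expected {
                    return None;
                }
                stack.last_mut()?.1.push(TokenTree::Delimited(open, trees));
            }
            _ => stack.last_mut()?.1.push(TokenTree::Token(tok.to_string())),
        }
    }
    // Only the top-level frame may remain; anything else is an unclosed group.
    if stack.len() == 1 {
        stack.pop().map(|(_, trees)| trees)
    } else {
        None
    }
}
```

Under this sketch, `group(&["(", "print", "foo", ")"])` yields a single
`Delimited` tree containing the tokens `print` and `foo`.
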
Whenever we refer to the "example _invocation_", we mean the following snippet:

```rust
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```

The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, and it is the topic of this chapter.

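The effect of expansion can be observed from ordinary code. The sketch below
uses a hypothetical variant of the example definition, `printer_string!`, whose
arms build a `String` with `format!` instead of printing, so the result of each
expansion can be inspected. The helper functions `print_once` and `print_twice`
are also illustrative names.

```rust
// Hypothetical variant of the example definition: each arm builds a `String`
// with `format!` instead of printing, so the expansion's result is observable.
macro_rules! printer_string {
    (print $mvar:ident) => {
        format!("{}\n", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}\n{}\n", $mvar, $mvar)
    };
}

/// Uses the first arm; after expansion the body is roughly
/// `format!("{}\n", foo)`.
fn print_once(foo: i32) -> String {
    printer_string!(print foo)
}

/// Uses the second arm, which repeats the metavariable twice.
fn print_twice(foo: i32) -> String {
    printer_string!(print twice foo)
}
```

Calling `print_once(7)` gives `"7\n"` and `print_twice(7)` gives `"7\n7\n"`,
which reflects the arm that each invocation matched.
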
### The macro parser

There are two parts to macro expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the macro parser.

The macro parser works much like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].

The interface of the macro parser is as follows (this is slightly simplified):

```rust
fn parse(
    sess: ParserSession,
    tts: TokenStream,
    ms: &[TokenTree]
) -> NamedParseResult
```

In this interface:

- `sess` is a "parsing session", which keeps track of some metadata. Most
  notably, this is used to keep track of errors that are generated so they can
  be reported to the user.
- `tts` is a stream of tokens. The macro parser's job is to consume the raw
  stream of tokens and output a binding of metavariables to corresponding token
  trees.
- `ms` is a _matcher_. This is a sequence of token trees that we want to match
  `tts` against.

In the analogy of a regex parser, `tts` is the input and we are matching it
against the pattern `ms`. Using our examples, `tts` could be the stream of
tokens containing the inside of the example invocation `print foo`, while `ms`
might be the sequence of token trees `print $mvar:ident`.

The output of the parser is a `NamedParseResult`, which indicates which of
three cases has occurred:

- Success: `tts` matches the given matcher `ms`, and we have produced a binding
  from metavariables to the corresponding token trees.
- Failure: `tts` does not match `ms`. This results in an error message such as
  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this happens
  if there is more than one pattern match, since that indicates the macro is
  ambiguous.

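The relationship between `tts`, `ms`, and the three outcomes can be sketched
with a toy matcher. Everything here is an assumption made for illustration:
the `Matcher` and `Outcome` types, the `is_ident` helper, and this `parse`
function are hypothetical and much simpler than the real code. In particular,
the sketch walks the matcher linearly, handles no repetitions, and is not the
NFA-style algorithm that the macro parser uses.

```rust
use std::collections::HashMap;

/// One element of a matcher (`ms`), simplified: a literal token or a
/// metavariable with a fragment kind.
enum Matcher {
    Literal(&'static str),
    MetaVar { name: &'static str, kind: &'static str },
}

/// Hypothetical stand-in for the three cases of the result type.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success(HashMap<String, String>), // metavariable name -> bound token
    Failure(String),                  // the token no rule expected
    Error(String),                    // fatal problem in the parser itself
}

/// Stand-in for the callback into the normal Rust parser: accepts a single
/// identifier-like token.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_')
}

/// Matches the input tokens `tts` against the matcher `ms`.
fn parse(tts: &[&str], ms: &[Matcher]) -> Outcome {
    let mut bindings = HashMap::new();
    let mut input = tts.iter();
    for m in ms {
        let tok = match input.next() {
            Some(tok) => *tok,
            None => return Outcome::Failure("EOF".to_string()),
        };
        match m {
            Matcher::Literal(lit) if *lit == tok => {}
            Matcher::Literal(_) => return Outcome::Failure(tok.to_string()),
            Matcher::MetaVar { name, kind } if *kind == "ident" => {
                if !is_ident(tok) {
                    return Outcome::Failure(tok.to_string());
                }
                bindings.insert(name.to_string(), tok.to_string());
            }
            // This toy only knows how to parse `ident` fragments.
            Matcher::MetaVar { kind, .. } => {
                return Outcome::Error(format!("unsupported fragment `{}`", kind));
            }
        }
    }
    // Leftover input means the matcher did not cover the whole stream.
    match input.next() {
        Some(extra) => Outcome::Failure(extra.to_string()),
        None => Outcome::Success(bindings),
    }
}
```

With `ms` standing for `print $mvar:ident`, the input `print foo` produces
`Success` with `mvar` bound to `foo`, while `print twice foo` produces
`Failure` on the leftover token `foo`.
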
The full interface is defined [here][code_parse_int].

The macro parser works almost exactly like a normal regex parser, with one
exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
normal Rust parser.

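One observable consequence of this callback can be sketched with two
hypothetical macros (`count_tts!` and `double!` are illustrative names, not
part of `rustc`). The same input `1 + 2` is three token trees when matched
with `tt`, but a single expression when matched with `expr`, because the Rust
parser is asked to parse it as a whole.

```rust
// Counts the token trees in its input; each `tt` is matched without
// parsing it as any particular kind of syntax.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// `$e:expr` is parsed as one whole expression before the arm's body is
// expanded, so `$e * 2` multiplies the entire expression.
macro_rules! double {
    ($e:expr) => { $e * 2 };
}

/// Number of token trees in `1 + 2`.
fn token_tree_count() -> usize {
    count_tts!(1 + 2)
}

/// Doubles the expression `1 + 2` as a unit.
fn doubled() -> i32 {
    double!(1 + 2)
}
```

Here `token_tree_count()` evaluates to 3, while `doubled()` evaluates to 6
rather than 5: `$e` is substituted as one parsed expression, not as loose
tokens.
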
As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is non-intuitive and self-referential. The code
to parse macro _definitions_ is in
[`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for
matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the macro _using the macro parser itself_.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.

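The same pattern can be tried out in user code. The following sketch defines a
hypothetical macro, `split_arms!`, that uses `$( $lhs:tt => $rhs:tt );+` as its
own matcher and returns each side as a string via `stringify!`. The macro and
the `example_arms` function are illustrative names, and this is only an
analogy for what the compiler does internally, not its implementation.

```rust
// Hypothetical macro reusing the pattern from the text: one or more
// `lhs => rhs` pairs separated by `;`, where each side is a single token tree.
macro_rules! split_arms {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

/// Feeds the two arms of the example definition through `split_arms!` and
/// returns (matcher, body) pairs as strings.
fn example_arms() -> Vec<(&'static str, &'static str)> {
    split_arms!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    )
}
```

`example_arms()` returns two pairs, one per arm. The exact spacing in the
strings produced by `stringify!` is not guaranteed, so the sketch should only
be relied on for how many arms it finds and which tokens each side contains.
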
When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the macro
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
continues parsing as normal. Also, note that exactly one of the matchers from
the various arms should match the invocation (otherwise, the macro is
ambiguous).

For more information about the macro parser's implementation, see the comments
in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].

### Hygiene

TODO

### Procedural Macros

TODO

### Custom Derive

TODO

[code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt
[code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_parser.rs
[code_mr]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_rules.rs
[code_parse_int]: https://github.com/rust-lang/rust/blob/a97cd17f5d71fb4ec362f4fbd79373a6e7ed7b82/src/libsyntax/ext/tt/macro_parser.rs#L421
[parsing]: ./the-parser.md
