Commit 85f16fb

Rollup merge of #83329 - camelid:debuginfo-doc-cleanup, r=davidtwco
Cleanup LLVM debuginfo module docs - Move debuginfo docs from `doc.rs` module to `doc.md` file - Cleanup LLVM debuginfo module docs
2 parents 34285de + dc240fa commit 85f16fb

4 files changed: +182 -181 lines changed
@@ -0,0 +1,180 @@
# Debug Info Module

This module generates debug symbols. We use LLVM's
[source level debugging](https://llvm.org/docs/SourceLevelDebugging.html)
features for generating the debug information. The general principle is
this:

Given the right metadata in the LLVM IR, the LLVM code generator is able to
create DWARF debug symbols for the given code. The
[metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured
much like DWARF *debugging information entries* (DIE), representing type
information such as datatype layout, function signatures, block layout,
variable location and scope information, etc. It is the purpose of this
module to generate correct metadata and insert it into the LLVM IR.

As the exact format of metadata trees may change between different LLVM
versions, we now use LLVM
[DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html)
to create metadata where possible. This will hopefully ease the adaptation of
this module to future LLVM versions.

The public API of the module is a set of functions that will insert the
correct metadata into the LLVM IR when called with the right parameters.
The module is thus driven from an outside client with functions like
`debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`.

Internally the module will try to reuse already created metadata by
utilizing a cache. The way to get a shared metadata node when needed is
thus to just call the corresponding function in this module:

    let file_metadata = file_metadata(cx, file);

The function will take care of probing the cache for an existing node for
that exact file path.
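
The probe-then-create pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the module's actual code:
`MetadataId`, `DebugContext`, and its fields are hypothetical stand-ins, and
a real implementation would create the node through LLVM rather than by
bumping a counter.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a handle to an LLVM metadata node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MetadataId(pub usize);

/// Hypothetical, simplified debug context that owns the cache.
#[derive(Default)]
pub struct DebugContext {
    /// File path -> already created metadata node.
    pub created_files: HashMap<String, MetadataId>,
    /// Number of nodes actually created (cache misses).
    pub nodes_created: usize,
}

/// Returns the metadata node for `path`, creating it only on a cache miss.
pub fn file_metadata(cx: &mut DebugContext, path: &str) -> MetadataId {
    if let Some(&id) = cx.created_files.get(path) {
        return id; // cache hit: reuse the shared node
    }
    // Cache miss: "create" a node. The real module would call into LLVM here.
    let id = MetadataId(cx.nodes_created);
    cx.nodes_created += 1;
    cx.created_files.insert(path.to_owned(), id);
    id
}
```

Callers never touch the cache directly: asking twice for the same path yields
the same node, and only the first call creates anything.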

All private state used by the module is stored within either the
`CrateDebugContext` struct (owned by the `CodegenCx`) or the
`FunctionDebugContext` (owned by the `FunctionCx`).

This file consists of three conceptual sections:
1. The public interface of the module
2. Module-internal metadata creation functions
3. Minor utility functions


## Recursive Types

Some kinds of types, such as structs and enums, can be recursive. That means
that the type definition of some type X refers to some other type which in
turn (transitively) refers to X. This introduces cycles into the type
referral graph. A naive algorithm doing an on-demand, depth-first traversal
of this graph when describing types can get trapped in an endless loop
when it reaches such a cycle.

For example, the following simple type for a singly-linked list...

```
struct List {
    value: i32,
    tail: Option<Box<List>>,
}
```

will generate the following callstack with a naive DFS algorithm:

```
describe(t = List)
  describe(t = i32)
  describe(t = Option<Box<List>>)
    describe(t = Box<List>)
      describe(t = List) // at the beginning again...
      ...
```

To break cycles like these, we use "forward declarations". That is, when
the algorithm encounters a possibly recursive type (any struct or enum), it
immediately creates a type description node and inserts it into the cache
*before* describing the members of the type. This type description is just
a stub (as type members are not described and added to it yet) but it
allows the algorithm to already refer to the type. After the stub is
inserted into the cache, the algorithm continues as before. If it now
encounters a recursive reference, it hits the cache and does not try to
describe the type anew.
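
The stub-first idea can be sketched as below. This is only an illustration
under simplifying assumptions: `Node`, `TypeCache`, and this `describe`
function are hypothetical, types are identified by plain name strings, and a
stub is registered for every type rather than only for structs and enums.

```rust
use std::collections::HashMap;

/// Hypothetical description node; it is a stub until `complete` is set.
#[derive(Debug, Default)]
pub struct Node {
    pub members: Vec<String>,
    pub complete: bool,
}

/// Hypothetical cache of type descriptions, keyed by type name.
#[derive(Default)]
pub struct TypeCache {
    pub nodes: HashMap<String, Node>,
    pub describe_calls: usize,
}

/// Describes `name` and, transitively, its members. `defs` maps a type name
/// to the names of its member types; unknown names are treated as leaves.
pub fn describe(
    name: &str,
    defs: &HashMap<&'static str, Vec<&'static str>>,
    cache: &mut TypeCache,
) {
    cache.describe_calls += 1;
    if cache.nodes.contains_key(name) {
        return; // cache hit (possibly still a stub): do not recurse again
    }
    // Register the stub *before* visiting the members.
    cache.nodes.insert(name.to_owned(), Node::default());
    let members = defs.get(name).cloned().unwrap_or_default();
    for member in &members {
        describe(member, defs, cache);
    }
    // All members are described now; fill in the stub.
    let node = cache.nodes.get_mut(name).expect("stub was inserted above");
    node.members = members.iter().map(|m| m.to_string()).collect();
    node.complete = true;
}
```

With the `List` example, the second visit to `List` finds the stub in the
cache and returns immediately, so the traversal terminates.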

This behavior is encapsulated in the `RecursiveTypeDescription` enum,
which represents a kind of continuation, storing all state needed to
continue traversal at the type members after the type has been registered
with the cache. (This implementation approach might be a tad
over-engineered and may change in the future.)


## Source Locations and Line Information

In addition to data type descriptions, the debugging information must also
allow mapping machine code locations back to source code locations in order
to be useful. This functionality is also handled in this module. The
following functions control source mappings:

+ `set_source_location()`
+ `clear_source_location()`
+ `start_emitting_source_locations()`

`set_source_location()` sets the current source location. All IR
instructions created after a call to this function will be linked to the
given source location, until another location is specified with
`set_source_location()` or the source location is cleared with
`clear_source_location()`. In the latter case, subsequent IR instructions
will not be linked to any source location. As you can see, this is a
stateful API (mimicking the one in LLVM), so be careful with source
locations set by previous calls. It's probably best to not rely on any
specific state being present at a given point in code.

One topic that deserves some extra attention is *function prologues*. At
the beginning of a function's machine code there are typically a few
instructions for loading argument values into allocas and checking if
there's enough stack space for the function to execute. This *prologue* is
not visible in the source code and LLVM puts a special PROLOGUE END marker
into the line table at the first non-prologue instruction of the function.
In order to find out where the prologue ends, LLVM looks for the first
instruction in the function body that is linked to a source location. So,
when generating prologue instructions we have to make sure that we don't
emit source location information until the 'real' function body begins. For
this reason, source location emission is disabled by default for any new
function being codegened and is only activated after a call to the third
function from the list above, `start_emitting_source_locations()`. This
function should be called right before regularly starting to codegen the
top-level block of the given function.
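
The interplay of the three functions can be sketched as a small state
machine. This is a hedged illustration only: `Loc`, `FnDebugState`, and
`location_for_new_instruction` are hypothetical names, and ignoring
`set_source_location()` while emission is disabled is an assumption about
the behavior rather than a description of the real implementation.

```rust
/// Hypothetical source position.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Loc {
    pub line: u32,
    pub col: u32,
}

/// Hypothetical per-function state (not the real `FunctionDebugContext`).
#[derive(Default)]
pub struct FnDebugState {
    /// Emission is disabled by default for every new function.
    pub emitting: bool,
    /// Location that newly created instructions are linked to, if any.
    pub current: Option<Loc>,
}

impl FnDebugState {
    pub fn set_source_location(&mut self, loc: Loc) {
        // Assumption: while emission is disabled (prologue), requests are ignored.
        if self.emitting {
            self.current = Some(loc);
        }
    }

    pub fn clear_source_location(&mut self) {
        self.current = None;
    }

    pub fn start_emitting_source_locations(&mut self) {
        self.emitting = true;
    }

    /// The location a newly created instruction would be linked to.
    pub fn location_for_new_instruction(&self) -> Option<Loc> {
        self.current
    }
}
```

In this model, prologue instructions never receive a location, so the first
located instruction is the first one created after
`start_emitting_source_locations()` and a subsequent `set_source_location()`.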

There is one exception to the above rule: `llvm.dbg.declare` instructions
must be linked to the source location of the variable being declared. For
function parameters these `llvm.dbg.declare` instructions typically occur
in the middle of the prologue; however, they are ignored by LLVM's prologue
detection. The `create_argument_metadata()` and related functions take care
of linking the `llvm.dbg.declare` instructions to the correct source
locations even while source location emission is still disabled, so there
is no need to do anything special with source location handling here.

## Unique Type Identification

In order for link-time optimization to work properly, LLVM needs a unique
type identifier that tells it across compilation units which types are the
same as others. This type identifier is created by
`TypeMap::get_unique_type_id_of_type()` using the following algorithm:

1. Primitive types have their name as ID

2. Structs, enums and traits have a multipart identifier

   1. The first part is the SVH (strict version hash) of the crate they
      were originally defined in

   2. The second part is the `ast::NodeId` of the definition in their
      original crate

   3. The final part is a concatenation of the type IDs of their concrete
      type arguments if they are generic types.

3. Tuple, pointer, and function types are structurally identified, which
   means that they are equivalent if their component types are equivalent
   (e.g., `(i32, i32)` is the same regardless of which crate it is used in).
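
The three rules above can be sketched as a recursive function over a toy
type representation. Everything here is an assumption made for
illustration: the `Ty` enum, the `unique_type_id` function, and the string
format of the IDs are hypothetical and do not reflect what
`TypeMap::get_unique_type_id_of_type()` actually produces.

```rust
/// Hypothetical, heavily simplified type representation.
pub enum Ty {
    /// Rule 1: identified by name.
    Primitive(&'static str),
    /// Rule 2: crate hash (SVH), definition ID, and generic type arguments.
    Adt { svh: u64, node_id: u32, args: Vec<Ty> },
    /// Rule 3: structurally identified.
    Tuple(Vec<Ty>),
    /// Rule 3: structurally identified.
    Pointer(Box<Ty>),
}

/// Builds an illustrative unique ID string for `ty`.
pub fn unique_type_id(ty: &Ty) -> String {
    match ty {
        Ty::Primitive(name) => name.to_string(),
        Ty::Adt { svh, node_id, args } => {
            // Concatenate the IDs of the concrete type arguments.
            let args: Vec<String> = args.iter().map(unique_type_id).collect();
            format!("{:x}/{}<{}>", svh, node_id, args.join(","))
        }
        Ty::Tuple(elems) => {
            let elems: Vec<String> = elems.iter().map(unique_type_id).collect();
            format!("({})", elems.join(","))
        }
        Ty::Pointer(inner) => format!("*{}", unique_type_id(inner)),
    }
}
```

Two structurally equal tuples get the same ID no matter where they are
built, while two nominal types with the same definition ID but different
crate hashes stay distinct.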

This algorithm also provides a stable ID for types that are defined in one
crate but instantiated from metadata within another crate. We just have to
take care to always map crate and `NodeId`s back to the original crate
context.

As a side effect, these unique type IDs also help to solve a problem arising
from lifetime parameters. Since lifetime parameters are completely omitted
in debuginfo, more than one `Ty` instance may map to the same debuginfo
type metadata. That is, some struct `Struct<'a>` may have N instantiations
with different concrete substitutions for `'a`, and thus there will be N
`Ty` instances for the type `Struct<'a>` even though it is not generic
otherwise. Unfortunately this means that we cannot use `ty::type_id()` as
a cheap identifier for type metadata -- we have done this in the past, but it
led to unnecessary metadata duplication in the best case and LLVM
assertions in the worst. However, the unique type ID as described above
*can* be used as an identifier. Since it is comparatively expensive to
construct, though, `ty::type_id()` is still used additionally as an
optimization for cases where the exact same type has been seen before
(which is most of the time).
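
The two-level lookup described above might look roughly like this. It is a
sketch under stated assumptions: this `TypeMap`, its fields, and
`metadata_for` are hypothetical, a `u64` stands in for the cheap
`ty::type_id()` key, and a `usize` stands in for a metadata node.

```rust
use std::collections::HashMap;

/// Hypothetical two-level type map (not the real `TypeMap`).
#[derive(Default)]
pub struct TypeMap {
    /// Fast path: cheap per-`Ty` key -> unique type ID.
    pub unique_id_by_cheap_key: HashMap<u64, String>,
    /// Authoritative map: unique type ID -> metadata node.
    pub metadata_by_unique_id: HashMap<String, usize>,
    /// How often the expensive unique ID had to be constructed.
    pub unique_id_computations: usize,
}

impl TypeMap {
    /// Returns the metadata node for a type. `compute_unique_id` is only
    /// invoked when `cheap_key` has not been seen before.
    pub fn metadata_for(
        &mut self,
        cheap_key: u64,
        compute_unique_id: impl FnOnce() -> String,
    ) -> usize {
        let cached = self.unique_id_by_cheap_key.get(&cheap_key).cloned();
        let unique_id = match cached {
            Some(id) => id, // exact same `Ty` seen before: skip the expensive step
            None => {
                self.unique_id_computations += 1;
                let id = compute_unique_id();
                self.unique_id_by_cheap_key.insert(cheap_key, id.clone());
                id
            }
        };
        // Different cheap keys with the same unique ID share one node.
        let next = self.metadata_by_unique_id.len();
        *self.metadata_by_unique_id.entry(unique_id).or_insert(next)
    }
}
```

Two `Ty` instances that differ only in lifetimes would have different cheap
keys but the same unique ID, so they end up sharing a single metadata node.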

compiler/rustc_codegen_llvm/src/debuginfo/doc.rs

-179
This file was deleted.

compiler/rustc_codegen_llvm/src/debuginfo/mod.rs

+1-2
@@ -1,5 +1,4 @@
-// See doc.rs for documentation.
-mod doc;
+#![doc = include_str!("doc.md")]
 
 use rustc_codegen_ssa::mir::debuginfo::VariableKind::*;

compiler/rustc_codegen_llvm/src/lib.rs

+1
@@ -8,6 +8,7 @@
 #![feature(bool_to_option)]
 #![feature(const_cstr_unchecked)]
 #![feature(crate_visibility_modifier)]
+#![feature(extended_key_value_attributes)]
 #![feature(extern_types)]
 #![feature(in_band_lifetimes)]
 #![feature(nll)]
