This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 0fcac9c

NHDaly and vchuravy authored and RAI CI (GitHub Action Automation) committed May 26, 2024
Allocation Profiler: Types for all allocations (JuliaLang#50337)
Pass the types to the allocator functions.

-------

Before this PR, we were missing the types for allocations in two cases:

1. allocations from codegen
2. allocations in `gc_managed_realloc_`

The second one is easy: those are always used for buffers, right?

For the first one: we extend the allocation functions called from codegen, to take the type as a parameter, and set the tag there.

I kept the old interfaces around, since I think that they cannot be removed due to supporting legacy code?

------

An example of the generated code:

```julia
%ptls_field6 = getelementptr inbounds {}**, {}*** %4, i64 2
%13 = bitcast {}*** %ptls_field6 to i8**
%ptls_load78 = load i8*, i8** %13, align 8
%box = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc_typed(i8* %ptls_load78, i32 1184, i32 32, i64 4366152144) #7
```

Fixes JuliaLang#43688.
Fixes JuliaLang#45268.

Co-authored-by: Valentin Churavy <[email protected]>
1 parent aeaaaae commit 0fcac9c

12 files changed

+199
-59
lines changed
 

‎doc/src/manual/profile.md

+83-7
Original file line numberDiff line numberDiff line change
@@ -338,15 +338,91 @@ argument can be passed to speed it up by making it skip some allocations.
338338
Passing `sample_rate=1.0` will make it record everything (which is slow);
339339
`sample_rate=0.1` will record only 10% of the allocations (faster), etc.
340340

341-
!!! note
341+
!!! compat "Julia 1.11"
342+
343+
Older versions of Julia could not capture types in all cases. In older versions of
344+
Julia, if you see an allocation of type `Profile.Allocs.UnknownType`, it means that
345+
the profiler doesn't know what type of object was allocated. This mainly happened when
346+
the allocation was coming from generated code produced by the compiler. See
347+
[issue #43688](https://github.com/JuliaLang/julia/issues/43688) for more info.
348+
349+
Since Julia 1.11, all allocations should have a type reported.
350+
351+
For more details on how to use this tool, please see the following talk from JuliaCon 2022:
352+
https://www.youtube.com/watch?v=BFvpwC8hEWQ
353+
354+
##### Allocation Profiler Example
342355

343-
The current implementation of the Allocations Profiler _does not
344-
capture types for all allocations._ Allocations for which the profiler
345-
could not capture the type are represented as having type
346-
`Profile.Allocs.UnknownType`.
356+
In this simple example, we use PProf to visualize the alloc profile. You could use another
357+
visualization tool instead. We collect the profile (specifying a sample rate), then we visualize it.
358+
```julia
359+
using Profile, PProf
360+
Profile.Allocs.clear()
361+
Profile.Allocs.@profile sample_rate=0.0001 my_function()
362+
PProf.Allocs.pprof()
363+
```
364+
365+
Here is a more in-depth example, showing how we can tune the sample rate. A
366+
good number of samples to aim for is around 1 - 10 thousand. Too many, and the
367+
profile visualizer can get overwhelmed, and profiling will be slow. Too few,
368+
and you don't have a representative sample.
347369

348-
You can read more about the missing types and the plan to improve this, here:
349-
[issue #43688](https://github.com/JuliaLang/julia/issues/43688).
370+
371+
```julia-repl
372+
julia> import Profile
373+
374+
julia> @time my_function() # Estimate allocations from a (second-run) of the function
375+
0.110018 seconds (1.50 M allocations: 58.725 MiB, 17.17% gc time)
376+
500000
377+
378+
julia> Profile.Allocs.clear()
379+
380+
julia> Profile.Allocs.@profile sample_rate=0.001 begin # 1.5 M * 0.001 = ~1.5K allocs.
381+
my_function()
382+
end
383+
500000
384+
385+
julia> prof = Profile.Allocs.fetch(); # If you want, you can also manually inspect the results.
386+
387+
julia> length(prof.allocs) # Confirm we have expected number of allocations.
388+
1515
389+
390+
julia> using PProf # Now, visualize with an external tool, like PProf or ProfileCanvas.
391+
392+
julia> PProf.Allocs.pprof(prof; from_c=false) # You can optionally pass in a previously fetched profile result.
393+
Analyzing 1515 allocation samples... 100%|████████████████████████████████| Time: 0:00:00
394+
Main binary filename not available.
395+
Serving web UI on http://localhost:62261
396+
"alloc-profile.pb.gz"
397+
```
398+
Then you can view the profile by navigating to http://localhost:62261, and the profile is saved to disk.
399+
See PProf package for more options.
400+
401+
##### Allocation Profiling Tips
402+
403+
As stated above, aim for around 1-10 thousand samples in your profile.
404+
405+
Note that we are uniformly sampling in the space of _all allocations_, and are not weighting
406+
our samples by the size of the allocation. So a given allocation profile may not give a
407+
representative profile of where most bytes are allocated in your program, unless you had set
408+
`sample_rate=1`.
409+
410+
Allocations can come from users directly constructing objects, but can also come from inside
411+
the runtime or be inserted into compiled code to handle type instability. Looking at the
412+
"source code" view can be helpful to isolate them, and then other external tools such as
413+
[`Cthulhu.jl`](https://github.com/JuliaDebug/Cthulhu.jl) can be useful for identifying the
414+
cause of the allocation.
415+
416+
##### Allocation Profile Visualization Tools
417+
418+
There are several profiling visualization tools now that can all display Allocation
419+
Profiles. Here is a small list of some of the main ones we know about:
420+
- [PProf.jl](https://github.com/JuliaPerf/PProf.jl)
421+
- [ProfileCanvas.jl](https://github.com/pfitzseb/ProfileCanvas.jl)
422+
- VSCode's built-in profile visualizer (`@profview_allocs`) [docs needed]
423+
- Viewing the results directly in the REPL
424+
- You can inspect the results in the REPL via [`Profile.Allocs.fetch()`](@ref), to view
425+
the stacktrace and type of each allocation.
350426

351427
#### Line-by-Line Allocation Tracking
352428

‎src/gc-alloc-profiler.h

+1
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ void _maybe_record_alloc_to_profile(jl_value_t *val, size_t size, jl_datatype_t
3535

3636
extern int g_alloc_profile_enabled;
3737

38+
// This should only be used from _deprecated_ code paths. We shouldn't see UNKNOWN anymore.
3839
#define jl_gc_unknown_type_tag ((jl_datatype_t*)0xdeadaa03)
3940

4041
static inline void maybe_record_alloc_to_profile(jl_value_t *val, size_t size, jl_datatype_t *typ) JL_NOTSAFEPOINT {

‎src/gc.c

+17-2
Original file line numberDiff line numberDiff line change
@@ -1030,13 +1030,20 @@ STATIC_INLINE jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz)
10301030
return jl_valueof(&v->header);
10311031
}
10321032

1033-
// Instrumented version of jl_gc_big_alloc_inner, called into by LLVM-generated code.
1033+
// Deprecated version, supported for legacy code.
10341034
JL_DLLEXPORT jl_value_t *jl_gc_big_alloc(jl_ptls_t ptls, size_t sz)
10351035
{
10361036
jl_value_t *val = jl_gc_big_alloc_inner(ptls, sz);
10371037
maybe_record_alloc_to_profile(val, sz, jl_gc_unknown_type_tag);
10381038
return val;
10391039
}
1040+
// Instrumented version of jl_gc_big_alloc_inner, called into by LLVM-generated code.
1041+
JL_DLLEXPORT jl_value_t *jl_gc_big_alloc_instrumented(jl_ptls_t ptls, size_t sz, jl_value_t *type)
1042+
{
1043+
jl_value_t *val = jl_gc_big_alloc_inner(ptls, sz);
1044+
maybe_record_alloc_to_profile(val, sz, (jl_datatype_t*)type);
1045+
return val;
1046+
}
10401047

10411048
// This wrapper exists only to prevent `jl_gc_big_alloc_inner` from being inlined into
10421049
// its callers. We provide an external-facing interface for callers, and inline `jl_gc_big_alloc_inner`
@@ -1341,14 +1348,22 @@ STATIC_INLINE jl_value_t *jl_gc_pool_alloc_inner(jl_ptls_t ptls, int pool_offset
13411348
return jl_valueof(v);
13421349
}
13431350

1344-
// Instrumented version of jl_gc_pool_alloc_inner, called into by LLVM-generated code.
1351+
// Deprecated version, supported for legacy code.
13451352
JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset,
13461353
int osize)
13471354
{
13481355
jl_value_t *val = jl_gc_pool_alloc_inner(ptls, pool_offset, osize);
13491356
maybe_record_alloc_to_profile(val, osize, jl_gc_unknown_type_tag);
13501357
return val;
13511358
}
1359+
// Instrumented version of jl_gc_pool_alloc_inner, called into by LLVM-generated code.
1360+
JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc_instrumented(jl_ptls_t ptls, int pool_offset,
1361+
int osize, jl_value_t* type)
1362+
{
1363+
jl_value_t *val = jl_gc_pool_alloc_inner(ptls, pool_offset, osize);
1364+
maybe_record_alloc_to_profile(val, osize, (jl_datatype_t*)type);
1365+
return val;
1366+
}
13521367

13531368
// This wrapper exists only to prevent `jl_gc_pool_alloc_inner` from being inlined into
13541369
// its callers. We provide an external-facing interface for callers, and inline `jl_gc_pool_alloc_inner`

‎src/jl_exported_funcs.inc

+2
Original file line numberDiff line numberDiff line change
@@ -158,6 +158,7 @@
158158
XX(jl_gc_alloc_3w) \
159159
XX(jl_gc_alloc_typed) \
160160
XX(jl_gc_big_alloc) \
161+
XX(jl_gc_big_alloc_instrumented) \
161162
XX(jl_gc_collect) \
162163
XX(jl_gc_conservative_gc_support_enabled) \
163164
XX(jl_gc_counted_calloc) \
@@ -185,6 +186,7 @@
185186
XX(jl_gc_new_weakref_th) \
186187
XX(jl_gc_num) \
187188
XX(jl_gc_pool_alloc) \
189+
XX(jl_gc_pool_alloc_instrumented) \
188190
XX(jl_gc_queue_multiroot) \
189191
XX(jl_gc_queue_root) \
190192
XX(jl_gc_safepoint) \

‎src/llvm-final-gc-lowering.cpp

+5-4
Original file line numberDiff line numberDiff line change
@@ -187,12 +187,13 @@ Value *FinalLowerGC::lowerSafepoint(CallInst *target, Function &F)
187187
Value *FinalLowerGC::lowerGCAllocBytes(CallInst *target, Function &F)
188188
{
189189
++GCAllocBytesCount;
190-
assert(target->arg_size() == 2);
190+
assert(target->arg_size() == 3);
191191
CallInst *newI;
192192

193193
IRBuilder<> builder(target);
194194
builder.SetCurrentDebugLocation(target->getDebugLoc());
195195
auto ptls = target->getArgOperand(0);
196+
auto type = target->getArgOperand(2);
196197
Attribute derefAttr;
197198

198199
if (auto CI = dyn_cast<ConstantInt>(target->getArgOperand(1))) {
@@ -203,19 +204,19 @@ Value *FinalLowerGC::lowerGCAllocBytes(CallInst *target, Function &F)
203204
if (offset < 0) {
204205
newI = builder.CreateCall(
205206
bigAllocFunc,
206-
{ ptls, ConstantInt::get(T_size, sz + sizeof(void*)) });
207+
{ ptls, ConstantInt::get(T_size, sz + sizeof(void*)), type });
207208
derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), sz + sizeof(void*));
208209
}
209210
else {
210211
auto pool_offs = ConstantInt::get(Type::getInt32Ty(F.getContext()), offset);
211212
auto pool_osize = ConstantInt::get(Type::getInt32Ty(F.getContext()), osize);
212-
newI = builder.CreateCall(poolAllocFunc, { ptls, pool_offs, pool_osize });
213+
newI = builder.CreateCall(poolAllocFunc, { ptls, pool_offs, pool_osize, type });
213214
derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), osize);
214215
}
215216
} else {
216217
auto size = builder.CreateZExtOrTrunc(target->getArgOperand(1), T_size);
217218
size = builder.CreateAdd(size, ConstantInt::get(T_size, sizeof(void*)));
218-
newI = builder.CreateCall(allocTypedFunc, { ptls, size, ConstantPointerNull::get(Type::getInt8PtrTy(F.getContext())) });
219+
newI = builder.CreateCall(allocTypedFunc, { ptls, size, type });
219220
derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), sizeof(void*));
220221
}
221222
newI->setAttributes(newI->getCalledFunction()->getAttributes());

‎src/llvm-late-gc-lowering.cpp

+29-17
Original file line numberDiff line numberDiff line change
@@ -2348,22 +2348,6 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) {
23482348
IRBuilder<> builder(CI);
23492349
builder.SetCurrentDebugLocation(CI->getDebugLoc());
23502350

2351-
// Create a call to the `julia.gc_alloc_bytes` intrinsic, which is like
2352-
// `julia.gc_alloc_obj` except it doesn't set the tag.
2353-
auto allocBytesIntrinsic = getOrDeclare(jl_intrinsics::GCAllocBytes);
2354-
auto ptlsLoad = get_current_ptls_from_task(builder, T_size, CI->getArgOperand(0), tbaa_gcframe);
2355-
auto ptls = builder.CreateBitCast(ptlsLoad, Type::getInt8PtrTy(builder.getContext()));
2356-
auto newI = builder.CreateCall(
2357-
allocBytesIntrinsic,
2358-
{
2359-
ptls,
2360-
builder.CreateIntCast(
2361-
CI->getArgOperand(1),
2362-
allocBytesIntrinsic->getFunctionType()->getParamType(1),
2363-
false)
2364-
});
2365-
newI->takeName(CI);
2366-
23672351
// LLVM alignment/bit check is not happy about addrspacecast and refuse
23682352
// to remove write barrier because of it.
23692353
// We pretty much only load using `T_size` so try our best to strip
@@ -2401,7 +2385,35 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) {
24012385
builder.CreateAlignmentAssumption(DL, tag, 16);
24022386
}
24032387
}
2404-
// Set the tag.
2388+
2389+
// Create a call to the `julia.gc_alloc_bytes` intrinsic, which is like
2390+
// `julia.gc_alloc_obj` except it specializes the call based on the constant
2391+
// size of the object to allocate, to save one indirection, and doesn't set
2392+
// the type tag. (Note that if the size is not a constant, it will call
2393+
// gc_alloc_obj, and will redundantly set the tag.)
2394+
auto allocBytesIntrinsic = getOrDeclare(jl_intrinsics::GCAllocBytes);
2395+
auto ptlsLoad = get_current_ptls_from_task(builder, T_size, CI->getArgOperand(0), tbaa_gcframe);
2396+
auto ptls = builder.CreateBitCast(ptlsLoad, Type::getInt8PtrTy(builder.getContext()));
2397+
auto newI = builder.CreateCall(
2398+
allocBytesIntrinsic,
2399+
{
2400+
ptls,
2401+
builder.CreateIntCast(
2402+
CI->getArgOperand(1),
2403+
allocBytesIntrinsic->getFunctionType()->getParamType(1),
2404+
false),
2405+
builder.CreatePtrToInt(tag, T_size),
2406+
});
2407+
newI->takeName(CI);
2408+
2409+
// Now, finally, set the tag. We do this in IR instead of in the C alloc
2410+
// function, to provide possible optimization opportunities. (I think? TBH
2411+
// the most recent editor of this code is not entirely clear on why we
2412+
// prefer to set the tag in the generated code. Providing optimization
2413+
// opportunities is the most likely reason; the tradeoff is slightly
2414+
// larger code size and increased compilation time, compiling this
2415+
// instruction at every allocation site, rather than once in the C alloc
2416+
// function.)
24052417
auto &M = *builder.GetInsertBlock()->getModule();
24062418
StoreInst *store = builder.CreateAlignedStore(
24072419
tag, EmitTagPtr(builder, tag_type, T_size, newI), M.getDataLayout().getPointerABIAlignment(0));

‎src/llvm-pass-helpers.cpp

+8-6
Original file line numberDiff line numberDiff line change
@@ -151,7 +151,9 @@ namespace jl_intrinsics {
151151
auto intrinsic = Function::Create(
152152
FunctionType::get(
153153
T_prjlvalue,
154-
{ Type::getInt8PtrTy(ctx), T_size },
154+
{ Type::getInt8PtrTy(ctx),
155+
T_size,
156+
T_size }, // type
155157
false),
156158
Function::ExternalLinkage,
157159
GC_ALLOC_BYTES_NAME);
@@ -236,8 +238,8 @@ namespace jl_intrinsics {
236238
}
237239

238240
namespace jl_well_known {
239-
static const char *GC_BIG_ALLOC_NAME = XSTR(jl_gc_big_alloc);
240-
static const char *GC_POOL_ALLOC_NAME = XSTR(jl_gc_pool_alloc);
241+
static const char *GC_BIG_ALLOC_NAME = XSTR(jl_gc_big_alloc_instrumented);
242+
static const char *GC_POOL_ALLOC_NAME = XSTR(jl_gc_pool_alloc_instrumented);
241243
static const char *GC_QUEUE_ROOT_NAME = XSTR(jl_gc_queue_root);
242244
static const char *GC_ALLOC_TYPED_NAME = XSTR(jl_gc_alloc_typed);
243245

@@ -251,7 +253,7 @@ namespace jl_well_known {
251253
auto bigAllocFunc = Function::Create(
252254
FunctionType::get(
253255
T_prjlvalue,
254-
{ Type::getInt8PtrTy(ctx), T_size },
256+
{ Type::getInt8PtrTy(ctx), T_size , T_size},
255257
false),
256258
Function::ExternalLinkage,
257259
GC_BIG_ALLOC_NAME);
@@ -267,7 +269,7 @@ namespace jl_well_known {
267269
auto poolAllocFunc = Function::Create(
268270
FunctionType::get(
269271
T_prjlvalue,
270-
{ Type::getInt8PtrTy(ctx), Type::getInt32Ty(ctx), Type::getInt32Ty(ctx) },
272+
{ Type::getInt8PtrTy(ctx), Type::getInt32Ty(ctx), Type::getInt32Ty(ctx), T_size },
271273
false),
272274
Function::ExternalLinkage,
273275
GC_POOL_ALLOC_NAME);
@@ -301,7 +303,7 @@ namespace jl_well_known {
301303
T_prjlvalue,
302304
{ Type::getInt8PtrTy(ctx),
303305
T_size,
304-
Type::getInt8PtrTy(ctx) },
306+
T_size }, // type
305307
false),
306308
Function::ExternalLinkage,
307309
GC_ALLOC_TYPED_NAME);

‎stdlib/Profile/test/allocs.jl

+31
Original file line numberDiff line numberDiff line change
@@ -121,3 +121,34 @@ end
121121
@test length(prof.allocs) >= 1
122122
@test length([a for a in prof.allocs if a.type == String]) >= 1
123123
end
124+
125+
@testset "alloc profiler catches allocs from codegen" begin
126+
@eval begin
127+
struct MyType x::Int; y::Int end
128+
Base.:(+)(n::Number, x::MyType) = n + x.x + x.y
129+
foo(a, x) = a[1] + x
130+
wrapper(a) = foo(a, MyType(0,1))
131+
end
132+
a = Any[1,2,3]
133+
# warmup
134+
wrapper(a)
135+
136+
@eval Allocs.@profile sample_rate=1 wrapper($a)
137+
138+
prof = Allocs.fetch()
139+
Allocs.clear()
140+
141+
@test length(prof.allocs) >= 1
142+
@test length([a for a in prof.allocs if a.type == MyType]) >= 1
143+
end
144+
145+
@testset "alloc profiler catches allocs from buffer resize" begin
146+
a = Int[]
147+
Allocs.@profile sample_rate=1 for _ in 1:100; push!(a, 1); end
148+
149+
prof = Allocs.fetch()
150+
Allocs.clear()
151+
152+
@test length(prof.allocs) >= 1
153+
@test length([a for a in prof.allocs if a.type == Profile.Allocs.BufferType]) >= 1
154+
end

‎test/llvmpasses/alloc-opt-gcframe.ll

+8-8
Original file line numberDiff line numberDiff line change
@@ -14,17 +14,17 @@ target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
1414
; CHECK-NOT: @julia.gc_alloc_obj
1515

1616
; TYPED: %current_task = getelementptr inbounds {}*, {}** %gcstack, i64 -12
17-
; TYPED-NEXT: [[ptls_field:%.*]] = getelementptr inbounds {}*, {}** %current_task, i64 16
17+
; TYPED: [[ptls_field:%.*]] = getelementptr inbounds {}*, {}** %current_task, i64 16
1818
; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0
1919
; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}**
2020
; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8*
21-
; TYPED-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc(i8* [[ptls_i8]], i32 [[SIZE_T:[0-9]+]], i32 16)
21+
; TYPED-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc_instrumented(i8* [[ptls_i8]], i32 [[SIZE_T:[0-9]+]], i32 16, i64 {{.*}} @tag {{.*}})
2222
; TYPED: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* {{.*}} unordered, align 8, !tbaa !4
2323

2424
; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %gcstack, i64 -12
25-
; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16
25+
; OPAQUE: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16
2626
; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0
27-
; OPAQUE-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc(ptr [[ptls_load]], i32 [[SIZE_T:[0-9]+]], i32 16)
27+
; OPAQUE-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc_instrumented(ptr [[ptls_load]], i32 [[SIZE_T:[0-9]+]], i32 16, i64 {{.*}} @tag {{.*}})
2828
; OPAQUE: store atomic ptr addrspace(10) @tag, ptr addrspace(10) {{.*}} unordered, align 8, !tbaa !4
2929

3030
define {} addrspace(10)* @return_obj() {
@@ -270,11 +270,11 @@ L3:
270270
}
271271
; CHECK-LABEL: }{{$}}
272272

273-
; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc(i8*,
274-
; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc(i8*,
273+
; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc_instrumented(i8*,
274+
; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc_instrumented(i8*,
275275

276-
; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_pool_alloc(ptr,
277-
; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_big_alloc(ptr,
276+
; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_pool_alloc_instrumented(ptr,
277+
; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_big_alloc_instrumented(ptr,
278278
declare void @external_function()
279279
declare {}*** @julia.get_pgcstack()
280280
declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}**, i64, {} addrspace(10)*)

‎test/llvmpasses/final-lower-gc.ll

+7-7
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ declare noalias nonnull {} addrspace(10)** @julia.new_gc_frame(i32)
1818
declare void @julia.push_gc_frame({} addrspace(10)**, i32)
1919
declare {} addrspace(10)** @julia.get_gc_frame_slot({} addrspace(10)**, i32)
2020
declare void @julia.pop_gc_frame({} addrspace(10)**)
21-
declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_bytes(i8*, i64) #0
21+
declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_bytes(i8*, i64, i64) #0
2222

2323
attributes #0 = { allocsize(1) }
2424

@@ -80,9 +80,9 @@ top:
8080
%pgcstack = call {}*** @julia.get_pgcstack()
8181
%ptls = call {}*** @julia.ptls_states()
8282
%ptls_i8 = bitcast {}*** %ptls to i8*
83-
; TYPED: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc
84-
; OPAQUE: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc
85-
%v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 8)
83+
; TYPED: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc_instrumented
84+
; OPAQUE: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc_instrumented
85+
%v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 8, i64 12341234)
8686
%0 = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)*
8787
%1 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %0, i64 -1
8888
store {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* %1, align 8, !tbaa !0
@@ -96,9 +96,9 @@ top:
9696
%ptls = call {}*** @julia.ptls_states()
9797
%ptls_i8 = bitcast {}*** %ptls to i8*
9898
; CHECK: %0 = add i64 %size, 8
99-
; TYPED: %v = call noalias nonnull dereferenceable(8) {} addrspace(10)* @ijl_gc_alloc_typed(i8* %ptls_i8, i64 %0, i8* null)
100-
; OPAQUE: %v = call noalias nonnull dereferenceable(8) ptr addrspace(10) @ijl_gc_alloc_typed(ptr %ptls_i8, i64 %0, ptr null)
101-
%v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 %size)
99+
; TYPED: %v = call noalias nonnull dereferenceable(8) {} addrspace(10)* @ijl_gc_alloc_typed(i8* %ptls_i8, i64 %0, i64 12341234)
100+
; OPAQUE: %v = call noalias nonnull dereferenceable(8) ptr addrspace(10) @ijl_gc_alloc_typed(ptr %ptls_i8, i64 %0, i64 12341234)
101+
%v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 %size, i64 12341234)
102102
%0 = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)*
103103
%1 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %0, i64 -1
104104
store {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* %1, align 8, !tbaa !0

‎test/llvmpasses/late-lower-gc-addrspaces.ll

+4-4
Original file line numberDiff line numberDiff line change
@@ -69,15 +69,15 @@ top:
6969
; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0
7070
; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}**
7171
; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8*
72-
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8)
72+
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
7373
; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)*
7474
; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1
7575
; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4
7676

7777
; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12
7878
; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16
7979
; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0
80-
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8)
80+
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
8181
; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1
8282
; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4
8383
%v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag)
@@ -102,15 +102,15 @@ top:
102102
; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0
103103
; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}**
104104
; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8*
105-
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8)
105+
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
106106
; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)*
107107
; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1
108108
; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4
109109

110110
; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12
111111
; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16
112112
; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0
113-
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8)
113+
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
114114
; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1
115115
; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4
116116
%v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag)

‎test/llvmpasses/late-lower-gc.ll

+4-4
Original file line numberDiff line numberDiff line change
@@ -66,15 +66,15 @@ top:
6666
; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0
6767
; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}**
6868
; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8*
69-
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8)
69+
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
7070
; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)*
7171
; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1
7272
; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4
7373

7474
; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12
7575
; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16
7676
; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0
77-
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8)
77+
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
7878
; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1
7979
; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4
8080
%v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag)
@@ -99,15 +99,15 @@ top:
9999
; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0
100100
; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}**
101101
; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8*
102-
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8)
102+
; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
103103
; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)*
104104
; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1
105105
; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4
106106

107107
; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12
108108
; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16
109109
; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0
110-
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8)
110+
; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}})
111111
; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1
112112
; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4
113113
%v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag)
