Commit de94cda

Author: Quentin Colombet
[LiveInterval] Allow updating subranges with slightly out-dated IR
During register coalescing, we update the live-intervals on-the-fly. To do that, we operate in a strange mode where the live-intervals can be slightly out-of-sync (more precisely, they are forward looking) compared to what the IR actually represents. This happens because the register coalescer only updates the IR once it is done updating the live-intervals. It has to do it this way because updating the IR on-the-fly would clobber some of the information about what the live-ranges being updated look like.

This is problematic for updates that rely on the IR to accurately represent the state of the live-ranges. Right now, we have only one of those: stripValuesNotDefiningMask.

To reconcile this with the out-of-sync IR, this patch introduces a new argument to LiveInterval::refineSubRanges that allows the code doing the live-range updates to reason about what the code will look like after the coalescer has rewritten the registers. Essentially, this captures how a subregister index will be offset to match its position in a new register class.

E.g., let's say we want to merge:
    V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>

We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32> overlap, i.e., by choosing a class where we can find "offset + 1 == 3". Put differently, we align V2's sub3 with V1's sub1:
    V2: sub0 sub1 sub2 sub3
    V1: <offset>  sub0 sub1

This offset will look like a composed subregidx in the class:
     V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=>   V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>

Now, if we have not yet rewritten the uses and defs of V1, all the checks for V1 need to account for this offset to match what the live intervals intend to capture.

Prior to this patch, we would fail to recognize the uses and defs of V1 and would end up with machine verifier errors: No live segment at def. This could lead to miscompiles, as we would drop some live-ranges and thus miss some interferences.

For this problem to trigger, we need to reach stripValuesNotDefiningMask while having a mismatch between the IR and the live-ranges (i.e., we have to apply a subreg offset to the IR). This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with tuple registers with a possibility to coalesce the subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.

#1 is required to trigger the splitting of subranges, which implies looking at the IR to decide what is alive and what is not, i.e., calling stripValuesNotDefiningMask.
#2 is what causes the IR to be out-of-sync with what the register coalescer maintains for the live-range information.

None of the targets that currently use subreg liveness (i.e., the targets that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ, IIRC) expose #1 and #2, so this patch also artificially enables subreg liveness for ARM, so that a nice test case can be attached.
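The offset reasoning in the message can be illustrated with a toy model. This is only a sketch, not the LLVM API: `ToyLaneMask`, `laneOfSub`, `composeWithOffset`, and `defTouchesLanes` are hypothetical names, and a lane mask is assumed to be a plain bitset with one bit per 32-bit lane of the wider class.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: one bit per 32-bit lane of the widest register class.
// None of these names exist in LLVM; they are illustrative assumptions.
using ToyLaneMask = std::uint32_t;

// Lane mask of a single "subN" index, expressed in its own register class.
constexpr ToyLaneMask laneOfSub(unsigned N) { return ToyLaneMask{1} << N; }

// Stand-in for composing a subreg offset with a lane mask: shifting by
// OffsetLanes moves the lanes to their position in the wider class.
// An offset of 0 leaves the mask unchanged.
constexpr ToyLaneMask composeWithOffset(unsigned OffsetLanes, ToyLaneMask M) {
  return M << OffsetLanes;
}

// True when a def, whose operand in the not-yet-rewritten IR covers DefMask,
// defines at least one of QueriedLanes once the pending offset is applied.
constexpr bool defTouchesLanes(ToyLaneMask DefMask, unsigned OffsetLanes,
                               ToyLaneMask QueriedLanes) {
  return (composeWithOffset(OffsetLanes, DefMask) & QueriedLanes) != 0;
}
```

With V1.sub1 aligned on V2.sub3 (an offset of two lanes), `defTouchesLanes(laneOfSub(1), 2, laneOfSub(3))` holds, while the same query with an offset of 0 does not. The second case corresponds to the situation described above, where the def of V1 is not recognized.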
1 parent 2bf9b9a commit de94cda

6 files changed, +136 -11 lines changed

llvm/include/llvm/CodeGen/LiveInterval.h

+26 -1
@@ -836,10 +836,35 @@ namespace llvm {
     /// don't define the related lane masks after they get shrunk. E.g.,
     /// when L000F gets split into L0007 and L0008 maybe only a subset
     /// of the VNIs that defined L000F defines L0007.
+    ///
+    /// The cleanup of the VNIs needs to look at the actual instructions
+    /// to decide what is or is not live at a definition point. If the
+    /// update of the subranges occurs while the IR does not reflect these
+    /// changes, \p ComposeSubRegIdx can be used to specify how the
+    /// definitions are going to be rewritten.
+    /// E.g., let's say we want to merge:
+    ///     V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
+    /// We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
+    /// overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
+    /// Put differently we align V2's sub3 with V1's sub1:
+    /// V2: sub0 sub1 sub2 sub3
+    /// V1: <offset>  sub0 sub1
+    ///
+    /// This offset will look like a composed subregidx in the class:
+    ///     V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
+    /// =>  V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
+    ///
+    /// Now if we didn't rewrite the uses and def of V1, all the checks for V1
+    /// need to account for this offset.
+    /// This happens during coalescing where we update the live-ranges while
+    /// still having the old IR around because updating the IR on-the-fly
+    /// would actually clobber some information on what the live-ranges that
+    /// are being updated look like.
     void refineSubRanges(BumpPtrAllocator &Allocator, LaneBitmask LaneMask,
                          std::function<void(LiveInterval::SubRange &)> Apply,
                          const SlotIndexes &Indexes,
-                         const TargetRegisterInfo &TRI);
+                         const TargetRegisterInfo &TRI,
+                         unsigned ComposeSubRegIdx = 0);

     bool operator<(const LiveInterval& other) const {
       const SlotIndex &thisIndex = beginIndex();

llvm/lib/CodeGen/LiveInterval.cpp

+14 -5
@@ -883,7 +883,8 @@ void LiveInterval::clearSubRanges() {
 static void stripValuesNotDefiningMask(unsigned Reg, LiveInterval::SubRange &SR,
                                        LaneBitmask LaneMask,
                                        const SlotIndexes &Indexes,
-                                       const TargetRegisterInfo &TRI) {
+                                       const TargetRegisterInfo &TRI,
+                                       unsigned ComposeSubRegIdx) {
   // Phys reg should not be tracked at subreg level.
   // Same for noreg (Reg == 0).
   if (!Register::isVirtualRegister(Reg) || !Reg)
@@ -905,7 +906,12 @@ static void stripValuesNotDefiningMask(unsigned Reg, LiveInterval::SubRange &SR,
         continue;
       if (MOI->getReg() != Reg)
         continue;
-      if ((TRI.getSubRegIndexLaneMask(MOI->getSubReg()) & LaneMask).none())
+      LaneBitmask OrigMask = TRI.getSubRegIndexLaneMask(MOI->getSubReg());
+      LaneBitmask ExpectedDefMask =
+          ComposeSubRegIdx
+              ? TRI.composeSubRegIndexLaneMask(ComposeSubRegIdx, OrigMask)
+              : OrigMask;
+      if ((ExpectedDefMask & LaneMask).none())
         continue;
       hasDef = true;
       break;
@@ -924,7 +930,8 @@ static void stripValuesNotDefiningMask(unsigned Reg, LiveInterval::SubRange &SR,
 void LiveInterval::refineSubRanges(
     BumpPtrAllocator &Allocator, LaneBitmask LaneMask,
     std::function<void(LiveInterval::SubRange &)> Apply,
-    const SlotIndexes &Indexes, const TargetRegisterInfo &TRI) {
+    const SlotIndexes &Indexes, const TargetRegisterInfo &TRI,
+    unsigned ComposeSubRegIdx) {
   LaneBitmask ToApply = LaneMask;
   for (SubRange &SR : subranges()) {
     LaneBitmask SRMask = SR.LaneMask;
@@ -944,8 +951,10 @@ void LiveInterval::refineSubRanges(
       MatchingRange = createSubRangeFrom(Allocator, Matching, SR);
       // Now that the subrange is split in half, make sure we
       // only keep in the subranges the VNIs that touch the related half.
-      stripValuesNotDefiningMask(reg, *MatchingRange, Matching, Indexes, TRI);
-      stripValuesNotDefiningMask(reg, SR, SR.LaneMask, Indexes, TRI);
+      stripValuesNotDefiningMask(reg, *MatchingRange, Matching, Indexes, TRI,
+                                 ComposeSubRegIdx);
+      stripValuesNotDefiningMask(reg, SR, SR.LaneMask, Indexes, TRI,
+                                 ComposeSubRegIdx);
     }
     Apply(*MatchingRange);
     ToApply &= ~Matching;
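
The effect of the new `ExpectedDefMask` check on which values survive the stripping can be sketched outside of LLVM. The following is a simplified stand-in, not the real implementation: `ToyDef`, `keepDefsTouching`, and `ComposeShift` are hypothetical, lane masks are plain integers, and composition is assumed to be a left shift.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a value definition: Id names the value, IRMask is
// the lane mask of the subreg index still written in the (stale) IR.
struct ToyDef {
  unsigned Id;
  std::uint32_t IRMask;
};

// Keep only the defs that define at least one lane of LaneMask.
// ComposeShift plays the role of ComposeSubRegIdx in this toy model: it is
// applied to the IR mask before the intersection test, and 0 means
// "the IR is already up to date".
inline std::vector<ToyDef> keepDefsTouching(std::vector<ToyDef> Defs,
                                            std::uint32_t LaneMask,
                                            unsigned ComposeShift = 0) {
  auto DoesNotTouch = [&](const ToyDef &D) {
    std::uint32_t ExpectedDefMask = D.IRMask << ComposeShift;
    return (ExpectedDefMask & LaneMask) == 0;
  };
  Defs.erase(std::remove_if(Defs.begin(), Defs.end(), DoesNotTouch),
             Defs.end());
  return Defs;
}
```

For two defs with IR masks `0b01` and `0b10` and a queried lane mask of `0b1000`, a shift of 2 keeps the second def, whereas a shift of 0 strips both. The latter mirrors the pre-patch behavior in which a live value was dropped because the stale IR mask was used as-is.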

llvm/lib/CodeGen/RegisterCoalescer.cpp

+7 -5
@@ -225,7 +225,8 @@ namespace {
     /// @p ToMerge will occupy in the coalescer register. @p LI has its subrange
     /// lanemasks already adjusted to the coalesced register.
     void mergeSubRangeInto(LiveInterval &LI, const LiveRange &ToMerge,
-                           LaneBitmask LaneMask, CoalescerPair &CP);
+                           LaneBitmask LaneMask, CoalescerPair &CP,
+                           unsigned DstIdx);

     /// Join the liveranges of two subregisters. Joins @p RRange into
     /// @p LRange, @p RRange may be invalid afterwards.
@@ -3271,7 +3272,8 @@ void RegisterCoalescer::joinSubRegRanges(LiveRange &LRange, LiveRange &RRange,
 void RegisterCoalescer::mergeSubRangeInto(LiveInterval &LI,
                                           const LiveRange &ToMerge,
                                           LaneBitmask LaneMask,
-                                          CoalescerPair &CP) {
+                                          CoalescerPair &CP,
+                                          unsigned ComposeSubRegIdx) {
   BumpPtrAllocator &Allocator = LIS->getVNInfoAllocator();
   LI.refineSubRanges(
       Allocator, LaneMask,
@@ -3284,7 +3286,7 @@ void RegisterCoalescer::mergeSubRangeInto(LiveInterval &LI,
           joinSubRegRanges(SR, RangeCopy, SR.LaneMask, CP);
         }
       },
-      *LIS->getSlotIndexes(), *TRI);
+      *LIS->getSlotIndexes(), *TRI, ComposeSubRegIdx);
 }

 bool RegisterCoalescer::isHighCostLiveInterval(LiveInterval &LI) {
@@ -3350,12 +3352,12 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
     if (!RHS.hasSubRanges()) {
       LaneBitmask Mask = SrcIdx == 0 ? CP.getNewRC()->getLaneMask()
                                      : TRI->getSubRegIndexLaneMask(SrcIdx);
-      mergeSubRangeInto(LHS, RHS, Mask, CP);
+      mergeSubRangeInto(LHS, RHS, Mask, CP, DstIdx);
     } else {
       // Pair up subranges and merge.
       for (LiveInterval::SubRange &R : RHS.subranges()) {
         LaneBitmask Mask = TRI->composeSubRegIndexLaneMask(SrcIdx, R.LaneMask);
-        mergeSubRangeInto(LHS, R, Mask, CP);
+        mergeSubRangeInto(LHS, R, Mask, CP, DstIdx);
       }
     }
     LLVM_DEBUG(dbgs() << "\tJoined SubRanges " << LHS << "\n");

llvm/lib/Target/ARM/ARMSubtarget.cpp

+5 -0
@@ -72,6 +72,9 @@ static cl::opt<bool>
 ForceFastISel("arm-force-fast-isel",
                cl::init(false), cl::Hidden);

+static cl::opt<bool> EnableSubRegLiveness("arm-enable-subreg-liveness",
+                                          cl::init(false), cl::Hidden);
+
 /// initializeSubtargetDependencies - Initializes using a CPU and feature string
 /// so that we can use initializer lists for subtarget initialization.
 ARMSubtarget &ARMSubtarget::initializeSubtargetDependencies(StringRef CPU,
@@ -379,6 +382,8 @@ bool ARMSubtarget::enableMachineScheduler() const {
   return useMachineScheduler();
 }

+bool ARMSubtarget::enableSubRegLiveness() const { return EnableSubRegLiveness; }
+
 // This overrides the PostRAScheduler bit in the SchedModel for any CPU.
 bool ARMSubtarget::enablePostRAScheduler() const {
   if (enableMachineScheduler())

llvm/lib/Target/ARM/ARMSubtarget.h

+3 -0
@@ -806,6 +806,9 @@ class ARMSubtarget : public ARMGenSubtargetInfo {
   /// True for some subtargets at > -O0.
   bool enablePostRAMachineScheduler() const override;

+  /// Check whether this subtarget wants to use subregister liveness.
+  bool enableSubRegLiveness() const override;
+
   /// Enable use of alias analysis during code generation (during MI
   /// scheduling, DAGCombine, etc.).
   bool useAA() const override { return true; }
@@ -0,0 +1,81 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc %s -start-before simple-register-coalescing -mtriple=arm-apple-ios -stop-after machine-scheduler -o - -arm-enable-subreg-liveness -verify-machineinstrs | FileCheck %s
+
+# Check that when we merge live-ranges that imply offsetting
+# the definition of a subregister by some other subreg index,
+# we take that new index into account while updating the subrange.
+#
+# For this specific test case, the coalescer is going to get rid
+# of `%5.dsub_1:dtriple = COPY %4.dsub_3` by aligning
+# %5.dsub_1:<3 x s64> with %4.dsub_3:<4 x s64>.
+# This is done by moving to a bigger register class <5 x s64>
+# and offsetting %5 definitions with a new subregidx:
+#   NewVar: <5 x s64> dsub_0 dsub_1 dsub_2 dsub_3 dsub_4
+#   %4:     <4 x s64> dsub_0 dsub_1 dsub_2 dsub_3
+#   %5:     <3 x s64> <==offset===> dsub_0 dsub_1 dsub_2
+#
+# In other words, %5.dsub_0 needs to be mapped to NewVar.dsub_2, %5.dsub_1
+# to NewVar.dsub_3 and so on. So essentially we are offsetting %5 by
+# dsub_2.
+#
+# When updating the live-ranges, the register coalescer actually
+# has not rewritten the original code, so we need to fake the
+# rewrite to do that update.
+# This used to be wrong and this test was failing with a machine
+# verifier error: No live segment at def.
+#
+# The test case runs through the coalescer *and* the scheduler, just
+# to force the live intervals to be carried around so that the verifier
+# gets a chance to verify those. If we were to just run the coalescer,
+# the live intervals would be dropped before running the verifier since
+# no other pass would need that analysis around.
+#
+# Note: The test case looks slightly more complicated than just the
+# offsetting part. That's because the bug needs three things to
+# trigger:
+# 1. Overlapping subreg lanes: here, dsub0 == <ssub0, ssub1>
+# 2. Tuple registers with a possibility to coalesce the subreg index:
+#    here, what we explain with %5.dsub_1 == %4.dsub_3
+# 3. Subreg liveness enabled.
+# #1 is required to trigger the splitting of subranges that implies
+# looking at the IR to decide what is alive and what is not.
+# #2 is what causes the IR to be out-of-sync with what the reg coalescer
+# maintains for the live-ranges information.
+# #3 is, well, the problem has to do with subrange updates!
+#
+# In the end, the expected result is to have all the variables
+# being coalesced in one big (qqqq) variable.
+---
+name:            main
+alignment:       1
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    1
+machineFunctionInfo: {}
+body:             |
+  bb.0:
+    liveins: $d2, $s1, $d4
+
+
+    ; CHECK-LABEL: name: main
+    ; CHECK: liveins: $d2, $s1, $d4
+    ; CHECK: undef %4.dsub_0:qqqqpr_with_ssub_4 = COPY $d4
+    ; CHECK: %4.ssub_4:qqqqpr_with_ssub_4 = COPY $s1
+    ; CHECK: %4.dsub_1:qqqqpr_with_ssub_4 = COPY $d2
+    ; CHECK: %4.dsub_3:qqqqpr_with_ssub_4 = COPY %4.dsub_1
+    ; CHECK: KILL implicit-def %4.dsub_2, implicit %4.qqsub_0
+    ; CHECK: %4.dsub_4:qqqqpr_with_ssub_4 = COPY %4.dsub_1
+    ; CHECK: tBX_RET 14, $noreg, implicit %4.ssub_4_ssub_5_ssub_6_ssub_7_ssub_8_ssub_9
+    %3:dpr_vfp2 = COPY $d4
+    undef %0.ssub_0:dpr_vfp2 = COPY $s1
+    %1:dpr_vfp2 = COPY $d2
+    undef %4.dsub_0:dquad = COPY %3
+    %4.dsub_1:dquad = COPY %1
+    %4.dsub_2:dquad = COPY %0
+    %4.dsub_3:dquad = COPY %1
+    KILL implicit-def undef %5.dsub_0:dtriple, implicit %4
+    %5.dsub_1:dtriple = COPY %4.dsub_3
+    %5.dsub_2:dtriple = COPY %1
+    tBX_RET 14, $noreg, implicit %5
+
+...
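
The index arithmetic described in the test comments (aligning %5.dsub_1 with %4.dsub_3, then remapping every %5 index) can be checked with a small sketch. The helpers below are hypothetical and treat subregister indices as plain lane positions, which is a simplification of how LLVM represents them. The sketch also assumes the destination is the register being shifted, as in this test.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of lanes the destination must be shifted by so
// that its DstSub position lines up with the source's SrcSub position.
// Assumes SrcSub >= DstSub (the destination is the one being offset).
inline unsigned alignmentOffset(unsigned DstSub, unsigned SrcSub) {
  assert(SrcSub >= DstSub && "toy model only shifts the destination");
  return SrcSub - DstSub;
}

// Position of a destination subregister in the merged (wider) register.
inline unsigned remapSubIdx(unsigned Sub, unsigned Offset) {
  return Sub + Offset;
}

// Number of lanes the merged register needs to hold both operands.
inline unsigned lanesNeeded(unsigned SrcLanes, unsigned DstLanes,
                            unsigned Offset) {
  return std::max(SrcLanes, DstLanes + Offset);
}
```

For the copy in this test, `alignmentOffset(1, 3)` gives 2, so dsub_0, dsub_1, and dsub_2 of %5 land on positions 2, 3, and 4, and `lanesNeeded(4, 3, 2)` gives the five 64-bit lanes of the NewVar row in the diagram.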
