-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: expand qmarks earlier #86778
JIT: expand qmarks earlier #86778
Conversation
Move qmark expansion to happen at the end of the importation phases. That way we don't have to model the messy semantics of qmarks in analysis and optimization phases. For now the downstream bits that dealt with qmarks are left intact. They will be removed later.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsMove qmark expansion to happen at the end of the importation phases. That way we don't have to model the messy semantics of qmarks in analysis and optimization phases. For now the downstream bits that dealt with qmarks are left intact. They will be removed later.
|
@jakobbotsch @EgorBo FYI. Local runs show quite mixed diffs. Looks like quite a few cases where we don't handle the expanded code as well as we could. Cleaning this up might turn this into a reliable code size improvement. Going to flight this in CI to see if there are any obvious correctness issues and to gauge TP impact. |
Not obvious to me yet why the number of missed contexts is so much higher. Some of the diffs happen because we're now expanding qmarks before forward sub so we often will pessimize code if the qmark result is single use and say controls a branch. But we don't want to expand qmarks after forward sub because forward sub now relies on liveness and we'd prefer not to have to model qmarks in liveness. So an alternate thought is to allow qmarks to be anywhere in trees and then split trees as part of qmark expansion if they don't end up at top level. Or perhaps a simple qmark only forward sub (but we would not have RCS Early counts either, so not clear how we'd know when this would be ok). Or perhaps defer expansion but produce a functional form for the qmark tree, then split when we want to expand. There may be some small as of yet unrealized TP win by cleaning up QMARK checks in morph/liveness, but I don't expect it to amount to a whole lot. But I suppose I should try it and see. |
Another source of regressions can be what we discussed in #76439... local assertion prop can be more precise when qmarks aren't expanded yet. Not sure how important this is, though. |
Wonder what happened here: ; System.Tests.Perf_Array:ArrayCreate1D():System.Array:this
@@ -31,6 +32,8 @@ G_M36762_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
jl SHORT G_M36762_IG06
mov rcx, 0xD1FFAB1E
; gcrRegs +[rcx]
+ test rcx, rcx
+ je SHORT G_M36762_IG07
lea r8, [rsp+20H]
mov edx, 1
xor r9d, r9d
@@ -38,7 +41,7 @@ G_M36762_IG03: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
; gcrRegs -[rcx] +[rax]
; gcr arg pop 0
nop
- ;; size=48 bbWeight=1 PerfScore 8.50
+ ;; size=53 bbWeight=1 PerfScore 9.75
G_M36762_IG04: ; bbWeight=1, epilog, nogc, extend
add rsp, 40
ret
@@ -56,8 +59,16 @@ G_M36762_IG06: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcr arg pop 0
int3
;; size=7 bbWeight=0 PerfScore 0.00
+G_M36762_IG07: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+ mov ecx, 57
+ mov edx, 84
+ mov rax, 0xD1FFAB1E ; code for System.ThrowHelper:ThrowArgumentException(int,int)
+ call [rax]System.ThrowHelper:ThrowArgumentException(int,int)
+ ; gcr arg pop 0
+ int3
+ ;; size=23 bbWeight=0 PerfScore 0.00
-; Total bytes of code 95, prolog size 4, PerfScore 23.50, instruction count 21, allocated bytes for code 95 (MethodHash=b60d7065) for method System.Tests.Perf_Array:ArrayCreate1D():System.Array:this
+; Total bytes of code 123, prolog size 4, PerfScore 27.55, instruction count 28, allocated bytes for code 123 (MethodHash=b60d7065) for method System.Tests.Perf_Array:ArrayCreate1D():System.Array:this
; ============================================================ I assmue C# for this is: private static readonly int s_DIM_1 = 4096;
public Array ArrayCreate1D() => Array.CreateInstance(typeof(int), s_DIM_1); |
Looks like we end up with a side effect in a block we should be able to jump thread, and so RBO is less effective. |
Latest diffs. TP is looking good, but still some work to do on CQ issues. |
// Skip liveness updates/marking for defs; they may be conditionally executed. | ||
if ((cur->gtFlags & GTF_VAR_DEF) == 0) | ||
{ | ||
FillInLiveness(life, volatileVars, cur->AsLclVarCommon()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it was updated to the wrong side of the if, this was the code for when there is a QMARK.
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
Move qmark expansion to happen at the end of the importation phases. That way we don't have to model the messy semantics of qmarks in analysis and optimization phases.
For now the downstream bits that dealt with qmarks are left intact. They will be removed later.