skip inferring calls that lead to throw #35982

Merged Jul 10, 2020 (1 commit)
Conversation

@JeffBezanson (Member) commented May 22, 2020

This adds a simple linear pass that finds all IR statements obviously followed by a call to throw, and then skips inferring generic calls in those slots. Some numbers:

| test | master | PR | PR (fixed) |
|---|---|---|---|
| Base | 38.4 | 35.4 | 35.8 |
| Stdlibs | 53.8 | 52.8 | 53.3 |
| precompile | 148.3 | 145.9 | 146.3 |
| sys.so | 149047528 | 143333400 | 145332832 |
| TTFP | 13.4 | 12.7 | 12.8 |
| plot after SIMD | 10.9 | 9.9 | 10.1 |
| plot after DataFrames | 13.6 | 11.5 | 11.9 |
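As a rough illustration of the pass (a toy model, not the actual compiler code), a single reverse scan can mark every statement that falls straight through into a `throw`:

```julia
# Toy model of the linear pass (not the actual compiler code): statements
# are plain symbols here; :throw raises, and :branch stands in for a
# control-flow edge that breaks straight-line fallthrough into the throw.
function throw_block_mask(stmts::Vector{Symbol})
    mask = falses(length(stmts))
    for i in length(stmts):-1:1
        if stmts[i] === :throw
            mask[i] = true
        elseif i < length(stmts) && mask[i + 1] && stmts[i] !== :branch
            # straight-line predecessor of a throw-bound statement,
            # so it is also on the error path
            mask[i] = true
        end
    end
    return mask
end

throw_block_mask([:call, :branch, :string, :call, :throw])
# → Bool[0, 0, 1, 1, 1]: the error-message construction and the final
#   call are on the error path, so inference can skip generic calls there.
```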

@JeffBezanson added the `latency` label on May 22, 2020
@vchuravy (Member)

cc: @maleadt for possible GPU consequences.

Is this only triggered if one of the types in the signature is not concrete? E.g., if we would have added an `apply_generic` call anyway?

@JeffBezanson (Member, Author)

No, it skips them unconditionally. I can try that, but I suspect anything less will not give much benefit (since we need to remove backedges to printing code). This is somewhat experimental, because we haven't done many things like this before (flagrantly not inferring some code), but I believe it will be necessary to make a dent in the latency situation.

The best way forward for GPUs etc. might be to run inference with this change disabled. @Keno et al. have been working on the compiler refactoring needed for that.

@JeffBezanson (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson (Member, Author)

Ah, of course, the un-inferred callsites in many cases were blocking inlining. I decided to give those statements a flat cost. Added a column to the table above for the new numbers.

@JeffBezanson (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@timholy (Member) commented May 23, 2020

This looks like it would replace #30222, which I never got around to finishing. I think a fixed cost of 20 is pretty reasonable; it will still affect the inlining decision to some extent, but it will put an end to the current disincentive for crafting a careful error message that reports helpful state information back to the user. That's a win even without the very nice improvements in latency and system image size.

#30222 looks like it had a couple of other changes to the cost model; these may have long since been rendered unnecessary, but it might be worth a glance before you close it.
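The effect of a flat cost on the inlining decision can be illustrated with a toy budget model (the threshold, penalty value, and helper names here are hypothetical, not the compiler's actual parameters):

```julia
# Hypothetical budget model: a method body is inlined when its summed
# statement costs stay under a budget. An un-inferred call on the error
# path used to incur the full nonleaf penalty; a flat cost of 20 keeps a
# carefully crafted error message from disqualifying the whole body.
const INLINE_BUDGET = 100        # hypothetical inlining threshold
const NONLEAF_PENALTY = 1000     # hypothetical penalty for un-inferred calls

should_inline(costs) = sum(costs) <= INLINE_BUDGET

hot_path = [1, 4, 1]                         # cheap, concrete statements
old_cost = vcat(hot_path, NONLEAF_PENALTY)   # error-path call, old model
new_cost = vcat(hot_path, 20)                # error-path call, flat cost

should_inline(old_cost)  # false: the error branch blocks inlining
should_inline(new_cost)  # true: the hot path still inlines
```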

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson (Member, Author)

Thanks for reminding me about that PR. You were right that this is really more of a reverse-flow analysis; I could use that to add some extra precision for branches. Adding more accurate costs for certain known C functions is also a good idea.

@timholy (Member) commented May 23, 2020

Feel free to steal wantonly from that PR, I got stymied and didn't know what to do.

@JeffBezanson (Member, Author)

Ok I think this is in good shape now. The analysis should definitely be scrutinized first; it's simple but subtle.

```diff
@@ -319,7 +319,7 @@ function statement_cost(ex::Expr, line::Int, src::CodeInfo, sptypes::Vector{Any}
         return 0
     elseif (f === Main.Core.arrayref || f === Main.Core.const_arrayref) && length(ex.args) >= 3
         atyp = argextype(ex.args[3], src, sptypes, slottypes)
-        return isknowntype(atyp) ? 4 : params.inline_nonleaf_penalty
+        return isknowntype(atyp) ? 4 : error_path ? 20 : params.inline_nonleaf_penalty
```
Review comment (Member):

Maybe this 20 could be a `params.error_path_penalty`?

@timholy (Member) left a review comment:

Looks great!

@JeffBezanson force-pushed the jb/throwinfer branch 2 times, most recently from fbcbaec to 2c33c18 on June 12, 2020 20:44
@JeffBezanson (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson JeffBezanson merged commit 8320fcc into master Jul 10, 2020
@JeffBezanson JeffBezanson deleted the jb/throwinfer branch July 10, 2020 23:12
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
aviatesk added a commit that referenced this pull request Apr 3, 2023
The deoptimization can sometimes destroy the effects analysis and
disable [semi-]concrete evaluation that is otherwise possible.
This is because the deoptimization was designed with the type domain
profitability in mind (#35982), and it has not been aware of the effects
domain very well.

This commit makes the deoptimization more aware of the effects domain
and enables the `throw` block deoptimization only when the effects
are already known to be ineligible for concrete evaluation.

In our current effect system, `ALWAYS_FALSE`/`false` means that the
effect cannot be refined to `ALWAYS_TRUE`/`true` anymore (unless a
user annotation is given later). Therefore we can enable the `throw` block
deoptimization without hindering the chance of concrete evaluation when
any of the following conditions are met:
- `effects.consistent === ALWAYS_FALSE`
- `effects.effect_free === ALWAYS_FALSE`
- `effects.terminates`
- `effects.nonoverlayed`

Here are some numbers:

| Metric                  | master    | this commit | #35982 reverted (set `unoptimize_throw_blocks=false`) |
|-------------------------|-----------|-------------|--------------------------------------------|
| Base (seconds)          | 15.579300 | 15.206645   | 15.296319                                  |
| Stdlibs (seconds)       | 17.919013 | 17.667094   | 17.738128                                  |
| Total (seconds)         | 33.499279 | 32.874737   | 33.035448                                  |
| Precompilation (seconds) | 49.967516 | 49.421121  | 49.999998                                  |
| First time `plot(rand(10,3))` [^1] | `2.476678 seconds (11.74 M allocations)` | `2.430355 seconds (11.77 M allocations)` | `2.514874 seconds (11.64 M allocations)` |

[^1]: I got these numbers by disabling all the `@precompile_all_calls` statements in Plots.jl.

These numbers made me question if we are getting any actual benefit from
the `throw` block deoptimization anymore. Since it is sometimes harmful
for the effects analysis, we probably want to either merge this commit
or remove the `throw` block deoptimization completely.
aviatesk added a commit that referenced this pull request Sep 26, 2023
…49235)

The deoptimization can sometimes destroy the effects analysis and
disable [semi-]concrete evaluation that is otherwise possible. This is
because the deoptimization was designed with type-domain
profitability in mind (#35982) and has not adequately considered
the effects domain.

This commit makes the deoptimization more aware of the effects domain
and enables the `throw` block deoptimization only when the effects
are already known to be ineligible for concrete evaluation.

In our current effect system, `ALWAYS_FALSE`/`false` means that the
effect cannot be refined to `ALWAYS_TRUE`/`true` anymore (unless a
user annotation is given later). Therefore we can enable the `throw` block
deoptimization without hindering the chance of concrete evaluation when
any of the following conditions are met:
- `effects.consistent === ALWAYS_FALSE`
- `effects.effect_free === ALWAYS_FALSE`
- `effects.terminates === false`
- `effects.nonoverlayed === false`

Here are some numbers:

| Metric | master | this commit | #35982 reverted (set `unoptimize_throw_blocks=false`) |
|-------------------------|-----------|-------------|--------------------------------------------|
| Base (seconds) | 15.579300 | 15.206645 | 15.296319 |
| Stdlibs (seconds) | 17.919013 | 17.667094 | 17.738128 |
| Total (seconds) | 33.499279 | 32.874737 | 33.035448 |
| Precompilation (seconds) | 49.967516 | 49.421121 | 49.999998 |
| First time `plot(rand(10,3))` [^1] | `2.476678 seconds (11.74 M allocations)` | `2.430355 seconds (11.77 M allocations)` | `2.514874 seconds (11.64 M allocations)` |
| First time `solve(prob, QNDF())(5.0)` [^2] | `4.469492 seconds (15.32 M allocations)` | `4.499217 seconds (15.41 M allocations)` | `4.470772 seconds (15.38 M allocations)` |

[^1]: With precompilation of Plots.jl disabled.
[^2]: With precompilation of OrdinaryDiffEq disabled.

These numbers made me question if we are getting any actual benefit from
the `throw` block deoptimization anymore. Since it is sometimes harmful
for the effects analysis, we probably want to either merge this commit
or remove the `throw` block deoptimization completely.
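The eligibility condition listed in the commit message above reduces to a simple disjunction. A minimal sketch, with field names following the commit message but a simplified struct (plain booleans standing in for the real tri-state lattice values):

```julia
# Simplified model of the check: the throw-block deoptimization is enabled
# only when the effects are already known to be ineligible for concrete
# evaluation. `true` stands in for ALWAYS_TRUE, `false` for ALWAYS_FALSE.
struct Effects
    consistent::Bool
    effect_free::Bool
    terminates::Bool
    nonoverlayed::Bool
end

concrete_eval_ineligible(e::Effects) =
    !e.consistent || !e.effect_free || !e.terminates || !e.nonoverlayed

# A fully clean method must keep its throw blocks inferred:
concrete_eval_ineligible(Effects(true, true, true, true))   # false
# Any already-tainted effect makes the deoptimization safe to apply:
concrete_eval_ineligible(Effects(true, false, true, true))  # true
```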