wasm-opt -Oz takes an inordinate amount of time #7319
Comments
Ah, I could gzip it:
Interesting... this is slow mainly on There is no single function with this slowdown, but many that each take a second or so. I looked at
(local.set $9
 (block (result (ref eq))
  (local.set $scratch_1567
   (ref.as_non_null
    (table.get $2
     (i32.add
      (local.get $2)
      (i32.const 5)
     )
    )
   )
  )
  (local.set $10
   (block (result (ref eq))
    (local.set $scratch_1566
     (ref.as_non_null
      (table.get $2
       (i32.add
        (local.get $2)
        (i32.const 6)
       )
      )
     )
    )
    (local.set $11
     (block (result (ref eq))
      (local.set $scratch_1565
       (ref.as_non_null
        (table.get $2
         (i32.add
          (local.get $2)
          (i32.const 7)
         )
        )
       )
      )
      [...]
      )
      (local.get $scratch_1565)
     )
    )
    (local.get $scratch_1566)
   )
  )
  (local.get $scratch_1567)
 )
So it is doing these table gets, casting them, and setting them in scratch locals, as the code gets more and more nested. Then it sets the scratch locals to the originals. After that, there is some reasonable-looking code with calls and The optimizer can't remove a (perhaps this code is what you mean by "I make no claim that it is good code ;)" 😄 ) We do have a pass that un-nests such code,
makes that pass 4x faster than plain However, this does not help with
(func $374 (param $0 i32) (result funcref)
 (block $block12186
  (block $block12185
   ..
   br_table
   ..
   (return
    (ref.func $804)
   )
  )
  (return
   (ref.func $804)
  )
 )
12,186 nested blocks with a Could this function be replaced with a
Oh, and you can skip
Hey thanks for taking a look! This comment is just for info, no actionable items here.
Hmm, I think the block and scratch locals are probably an artifact of stack desugaring; but yes, there can be a ridiculous number of locals in these functions. Basically the hoot compiler backend ends up emitting operations against named SSA-flavored locals, which each get a wasm local. We should re-stackify most of them away. What's more, when there is a non-tail call, we save all live (needed) locals to an explicit stack, and restore them to locals after the call returns. Things could be better on our side ;)
This would probably be saving the live vars before another call... which, we need to improve the code we generate here. It's part of the CPS conversion: you have a function F that makes nested calls to G and H. Say F has 1000 variables, and 800 are live at G. So you save 800 values, and restore 800 values after G returns. Then you call H immediately, and assume for simplicity that the same 800 values are live after H returns. So then you have to save 800 values again, for no reason! In practice this is quadratic, because the number of live variables depends on the size of your function, and the number of function calls also depends on the size of your function. We only realized this recently and will need to take a different tack :)
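The quadratic growth described above can be made concrete with a small cost model. This is only a sketch: the function names and the ratios (one non-tail call per 10 variables, 80% of variables live at each call) are assumptions chosen for illustration, not measurements of Hoot's output.

```python
def spill_ops(num_calls: int, live_per_call: int) -> int:
    """Count save + restore operations when every non-tail call
    saves all live locals before the call and restores them after."""
    return num_calls * 2 * live_per_call


def spill_ops_for_function(size: int) -> int:
    """Estimate spill traffic for a function with `size` variables.

    Assumed (hypothetical) ratios: one non-tail call per 10 variables,
    and 80% of the variables live across each call.
    """
    num_calls = size // 10
    live_per_call = (size * 8) // 10
    return spill_ops(num_calls, live_per_call)


# The F/G/H example from the comment: two calls, 800 live values each.
two_call_cost = spill_ops(2, 800)

# Doubling the function size quadruples the spill traffic under this model.
cost_1000 = spill_ops_for_function(1000)
cost_2000 = spill_ops_for_function(2000)
```

Under these assumptions, both the number of calls and the number of live values scale with function size, so their product scales with the square of it.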
:-)
I think they are not actually the same? There is only one
Hmm, well, sorta. You could use a table and an element section. We don't actually call these functions, though; this function is actually a side table, only meant to be invoked if there is an error: it is exported and allows the embedder to enumerate the functions in the compilation unit, so that it can build a funcref -> i32 id map. (I would really love to be able to do this from wasm itself but alas, funcrefs cannot be compared or hashed, and you can't attach any other side-band data to a funcref!) Once you have the id, you can look up a function's name, its source location, etc. This is used to print Scheme-level backtraces. But, perhaps a lazily instantiated table would work just as well. But then you have the opposite case: mapping an i32 ID to, say, a source location. There you don't want to instantiate a table of all the answers and then do
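The embedder-side enumeration described in this comment might look roughly like the following. It is a hedged model, not Hoot's actual embedder code: `build_id_map` and `lookup` are hypothetical names, function references are stood in for by hashable Python values, and the sketch assumes ids are dense from 0 with a null result (modelled as `None`) marking the end.

```python
from typing import Callable, Dict, Hashable, Optional


def build_id_map(lookup: Callable[[int], Optional[Hashable]]) -> Dict[Hashable, int]:
    """Build a function -> id map by calling an exported lookup function.

    `lookup` stands in for the exported wasm function that maps an i32 id
    to a funcref. Assumption: ids are dense starting at 0, and the first
    null result (None here) means there are no more functions.
    """
    ids: Dict[Hashable, int] = {}
    index = 0
    while True:
        func = lookup(index)
        if func is None:  # models a null funcref result
            break
        # Keep the first id if the same function shows up more than once.
        ids.setdefault(func, index)
        index += 1
    return ids


# Usage with a stand-in lookup over three placeholder functions.
functions = ["f0", "f1", "f2"]
id_map = build_id_map(lambda i: functions[i] if 0 <= i < len(functions) else None)
```

Because the map is built on the embedder side, it only needs to be constructed when a backtrace is actually printed.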
Thanks for the details about the codegen here, very interesting!
Oh, sorry, I was reading the partially-optimized code - basically the code as the pass
(return
(ref.func $621)
)
)
(return
(ref.func $619)
)
)
(return
(ref.func $621)
)
)
(return
(ref.func $619)
)
)
(return
(ref.func $621)
)
)
(return
(ref.func $829)
)
)
(return
(ref.func $829)
)
)
(return
(ref.func $829)
)
)
(ref.null nofunc)
)
(func $377 (type $0) (param $0 i32) (param $1 (ref eq)) (param $2 (ref eq)) (param $3 (ref eq))
Oh, I see, yes, this returns the But I don't follow this:
What do you mean by "instantiate"? When I compare the existing code to this:
(table $lookup $10 $20 $30 $40 ...)
(func $374 (param $0 i32) (result funcref)
 (table.get $lookup
  (local.get $0)
 )
)
then the
Yeah, I agree, that pass seems to end up doing quadratic comparisons. I'll look into it more when I have time.
Firstly: yay, thank you for fixing the control-flow values issue in the parser! Binaryen can now work on Hoot's binaries. Thank you thank you!
I noticed a performance bug that you may be interested in, for -Oz.
[Timing table for -O0, -O1, -O2, -Os, -Oz; the measured values were not preserved.]
This is a 32-logical-cpu system. As you can see, -Os/-Oz don't parallelize very well, and take a bit too long to get useful results.
I can provide the test file, should that be of interest, though GitHub doesn't seem to want to attach it. Enabled features are --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling.
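A rough way to repeat the comparison across optimization levels is a small timing harness. This is a sketch under stated assumptions: it expects `wasm-opt` on the PATH, the helper names and file names are hypothetical, and it records wall-clock time only, so on its own it does not show how well each level parallelizes.

```python
import subprocess
import time
from typing import Dict, List

# Feature flags as listed in the report.
FEATURES = [
    "--enable-bulk-memory",
    "--enable-multivalue",
    "--enable-reference-types",
    "--enable-gc",
    "--enable-tail-call",
    "--enable-exception-handling",
]

# Optimization levels compared in the report.
LEVELS = ["-O0", "-O1", "-O2", "-Os", "-Oz"]


def build_command(level: str, input_path: str, output_path: str) -> List[str]:
    """Build the wasm-opt command line for one optimization level."""
    return ["wasm-opt", input_path, level, *FEATURES, "-o", output_path]


def time_levels(input_path: str, output_path: str = "out.wasm") -> Dict[str, float]:
    """Run wasm-opt once per level and return wall-clock seconds per level."""
    timings: Dict[str, float] = {}
    for level in LEVELS:
        start = time.perf_counter()
        subprocess.run(build_command(level, input_path, output_path), check=True)
        timings[level] = time.perf_counter() - start
    return timings
```

Calling `time_levels("module.wasm")` (file name hypothetical) returns a dictionary keyed by level. Comparing user CPU time against wall-clock time, for example with the shell's `time`, would be needed to judge parallelism.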