-
Notifications
You must be signed in to change notification settings - Fork 776
Replace the old topological sort everywhere #6902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
94e4beb
to
7c11d25
Compare
8082440
to
dee59f2
Compare
7c11d25
to
9ee868a
Compare
dee59f2
to
7c5f31b
Compare
test/lit/passes/unsubtyping.wast
Outdated
;; CHECK: (type $Y (sub $X (struct))) | ||
(type $Y (sub $X (struct))) | ||
|
||
;; CHECK: (type $A (sub (struct (field (ref null $X))))) | ||
(type $A (sub (struct (ref null $X)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test becomes less readable with this change, as it moves the check for $A
away from $A
. We've worked around this manually in the past, by moving the definition to where the output is, but on a change this large obviously that isn't practical. But just applying this change means making potentially many tests less readable, effectively undoing all the manual effort we've put into these tweaks.
Maybe now is a good time to automate those tweaks, before this PR? I mean that the auto-updater of lit tests could move definitions so that they appear together automatically. Or, it could move the checks if the auto-updater would also accept that change in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After talking offline, we decided to add an option that propagates the order of types from the input through to the output, similarly to how we propagate type names. Once we can use that option in some of these large tests with many types where the optimized output order doesn't matter, the diff here will become much smaller.
Unlike other module elements, types are not stored on the `Module`. Instead, they are collected by traversing the IR before printing and binary writing. The code that collects the types tries to optimize the order of rec groups based on the number of times each type is used. As a result, the output order of types generally has no relation to the input order of types. In addition, most type optimizations rewrite the types into a single large rec group, and the order of types in that group is essentially arbitrary. Changes to the code for counting type uses, sorting types, or sorting rec groups can yield very large changes in the output order of types, producing test diffs that are hard to review and potentially harming the readability of tests by moving output types away from the corresponding input types. To help make test output more stable and readable, introduce a wasm-opt option that causes the order of output types to match the order of input types as closely as possible. It is implemented by having the parsers record the indices of the input types on the `Module` just like they already record the type names. The `GlobalTypeRewriter` infrastructure used by type optimizations associates the new types with the old indices just like it already does for names and also respects the input order when rewriting types into a large recursion group. By default, wasm-opt clears the recorded type indices after parsing the module, in which case its behavior is not modified by this change. Other tools do not clear the recorded type indices, so their output types now match the order of their input types. While full fidelity round-tripping is not a goal of any Binaryen tool, there's no downside to making the round trip more exact for non-optimizing tools. Follow-on PRs will use the new flag in more tests, which will generate large diffs but leave the tests in stable, more readable states that will no longer change due to other changes to the optimizing type sorting logic.
These are the tests that would otherwise have the largest diffs when changing the topological sort used to sort types. signature-refining_gto.wat also cannot be automatically updated, so there is extra benefit to making sure it has stable output.
To avoid having two separate topological sort utilities in the code base, replace remaining uses of the old DFS-based, CRTP topological sort with the newer Kahn's algorithm implementation. This would be NFC, except that the new topological sort produces a different order than the old topological sort, so the output of some passes is reordered.
7c5f31b
to
6d4f529
Compare
I've now rebased this on top of #6917, so the test diff is much smaller. I'd be happy to apply --preserve-type-order more broadly if you would prefer, but I've erred on the side of not applying it for now. Unfortunately the changes to the ctor-eval tests are unrelated to type ordering, so there's nothing we can do to make those smaller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still many changes in this file despite it using --preserve-type-order, but they're because the names used for the merged types changed.
@@ -18,7 +18,7 @@ | |||
#define wasm_ir_subtypes_h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github informs me that something changed in this file since last I read it. Looks like there were forced-pushes, so I can't read the commit, and I tried the "show diff" on the force-push, which @brendandahl recommended to me a while ago. But the diffs there is huge and dominated by unrelated changes, merges from main, I assume.
Am I missing a way that github marks the changes in the PR from unrelated changes? Or is there another good way to see incremental updates in a large PR like this with force-pushes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, sorry, this was a more destructive force push than usual because there were many merge conflicts when rebasing on main. The only way to avoid that would be to use merge commits instead of rebasing, but that comes with its own headaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...what are the headaches of merge commits? I would humbly suggest that we consider using them more 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To navigate stacked PRs effectively, you want to maintain a linear history from the tip of the stack back down to main
. As such, my stack management script current rebases when syncing new changes, which keeps the stack organization the same as the underlying commit organization. I could experiment with maintaining the linear stack of PRs but allowing the underlying commits to become arbitrary merge spaghetti, though. Maybe it wouldn't actually affect my workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much trouble with stacked PRs myself, and I never rebase. Though, I rarely open multiple parts of the stack on github at once? (I use branches in my personal repo, which would make later parts PRs on my fork, not upstream. So I just wait to open them.)
Each time I make a change in a PR in the middle of the stack, I need to update the ones "downstream" in the chain. I just do merge commits for those. I have had no issues when doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated my script to use merges instead of rebases. Let's see what happens!
@@ -11,19 +11,20 @@ | |||
(field $vtable (ref $Object.vtable)) | |||
(field $itable (ref $Object.itable))))) | |||
|
|||
;; CHECK: (type $Object.vtable (sub (struct (field structref)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This motion looks like it worsened the readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add --preserve-type-order to this file.
@@ -237,10 +237,11 @@ | |||
(module | |||
(rec | |||
;; CHECK: (rec | |||
;; CHECK-NEXT: (type $unrelated (sub (func))) | |||
;; CHECK-NEXT: (type $top (sub (func))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Might be worth adding the flag to any that regress here.
Update the remaining tests whose readability will be affected by the removal of the old topological sort in #6902, no matter how small their diffs would have been.
Update the remaining tests whose readability will be affected by the removal of the old topological sort in #6902, no matter how small their diffs would have been.
test/lit/passes/type-merging.wast
Outdated
@@ -42,7 +42,7 @@ | |||
;; CHECK-NEXT: ) | |||
;; CHECK-NEXT: ) | |||
(func $foo | |||
;; $A will remain the same. | |||
;; $A will remain the sam^e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;; $A will remain the sam^e. | |
;; $A will remain the same. |
Unless this has a meaning I am missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, nope, I just fat fingered it.
To avoid having two separate topological sort utilities in the code base,
replace remaining uses of the old DFS-based, CRTP topological sort with the
newer Kahn's algorithm implementation.
This would be NFC, except that the new topological sort produces a different
order than the old topological sort, so the output of some passes is reordered.