Skip to content

Replace the old topological sort everywhere #6902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Sep 10, 2024
Merged

Conversation

tlively
Copy link
Member

@tlively tlively commented Sep 4, 2024

To avoid having two separate topological sort utilities in the code base,
replace remaining uses of the old DFS-based, CRTP topological sort with the
newer Kahn's algorithm implementation.

This would be NFC, except that the new topological sort produces a different
order than the old topological sort, so the output of some passes is reordered.

@tlively tlively requested a review from kripken September 4, 2024 21:02
@tlively tlively force-pushed the min-topo-sort-recgroups branch from 94e4beb to 7c11d25 Compare September 4, 2024 21:09
@tlively tlively force-pushed the replace-old-topo-sort branch from 8082440 to dee59f2 Compare September 4, 2024 21:09
@tlively tlively force-pushed the min-topo-sort-recgroups branch from 7c11d25 to 9ee868a Compare September 4, 2024 22:44
@tlively tlively force-pushed the replace-old-topo-sort branch from dee59f2 to 7c5f31b Compare September 4, 2024 22:44
;; CHECK: (type $Y (sub $X (struct)))
(type $Y (sub $X (struct)))

;; CHECK: (type $A (sub (struct (field (ref null $X)))))
(type $A (sub (struct (ref null $X))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test becomes less readable with this change, as it moves the check for $A away from $A. We've worked around this manually in the past, by moving the definition to where the output is, but on a change this large obviously that isn't practical. But just applying this change means making potentially many tests less readable, effectively undoing all the manual effort we've put into these tweaks.

Maybe now is a good time to automate those tweaks, before this PR? I mean that the auto-updater of lit tests could move definitions so that they appear together automatically. Or, it could move the checks if the auto-updater would also accept that change in order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking offline, we decided to add an option that propagates the order of types from the input through to the output, similarly to how we propagate type names. Once we can use that option in some of these large tests with many types where the optimized output order doesn't matter, the diff here will become much smaller.

Base automatically changed from min-topo-sort-recgroups to main September 5, 2024 00:16
Unlike other module elements, types are not stored on the `Module`.
Instead, they are collected by traversing the IR before printing and
binary writing. The code that collects the types tries to optimize the
order of rec groups based on the number of times each type is used. As a
result, the output order of types generally has no relation to the input
order of types. In addition, most type optimizations rewrite the types
into a single large rec group, and the order of types in that group is
essentially arbitrary. Changes to the code for counting type uses,
sorting types, or sorting rec groups can yield very large changes in the
output order of types, producing test diffs that are hard to review and
potentially harming the readability of tests by moving output types away
from the corresponding input types.

To help make test output more stable and readable, introduce a wasm-opt
option that causes the order of output types to match the order of input
types as closely as possible. It is implemented by having the parsers
record the indices of the input types on the `Module` just like they
already record the type names. The `GlobalTypeRewriter` infrastructure
used by type optimizations associates the new types with the old indices
just like it already does for names and also respects the input order
when rewriting types into a large recursion group.

By default, wasm-opt clears the recorded type indices after parsing the
module, in which case its behavior is not modified by this change. Other
tools do not clear the recorded type indices, so their output types now
match the order of their input types. While full fidelity round-tripping
is not a goal of any Binaryen tool, there's no downside to making the
round trip more exact for non-optimizing tools.

Follow-on PRs will use the new flag in more tests, which will generate
large diffs but leave the tests in stable, more readable states that
will no longer change due to other changes to the optimizing type
sorting logic.
These are the tests that would otherwise have the largest diffs when
changing the topological sort used to sort types.
signature-refining_gto.wat also cannot be automatically updated, so
there is extra benefit to making sure it has stable output.
To avoid having two separate topological sort utilities in the code base,
replace remaining uses of the old DFS-based, CRTP topological sort with the
newer Kahn's algorithm implementation.

This would be NFC, except that the new topological sort produces a different
order than the old topological sort, so the output of some passes is reordered.
@tlively tlively force-pushed the replace-old-topo-sort branch from 7c5f31b to 6d4f529 Compare September 7, 2024 03:03
@tlively tlively changed the base branch from main to use-preserve-type-order September 7, 2024 03:03
@tlively
Copy link
Member Author

tlively commented Sep 7, 2024

I've now rebased this on top of #6917, so the test diff is much smaller. I'd be happy to apply --preserve-type-order more broadly if you would prefer, but I've erred on the side of not applying it for now. Unfortunately the changes to the ctor-eval tests are unrelated to type ordering, so there's nothing we can do to make those smaller.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still many changes in this file despite it using --preserve-type-order, but they're because the names used for the merged types changed.

@@ -18,7 +18,7 @@
#define wasm_ir_subtypes_h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github informs me that something changed in this file since last I read it. Looks like there were forced-pushes, so I can't read the commit, and I tried the "show diff" on the force-push, which @brendandahl recommended to me a while ago. But the diffs there is huge and dominated by unrelated changes, merges from main, I assume.

Am I missing a way that github marks the changes in the PR from unrelated changes? Or is there another good way to see incremental updates in a large PR like this with force-pushes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, this was a more destructive force push than usual because there were many merge conflicts when rebasing on main. The only way to avoid that would be to use merge commits instead of rebasing, but that comes with its own headaches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...what are the headaches of merge commits? I would humbly suggest that we consider using them more 🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To navigate stacked PRs effectively, you want to maintain a linear history from the tip of the stack back down to main. As such, my stack management script current rebases when syncing new changes, which keeps the stack organization the same as the underlying commit organization. I could experiment with maintaining the linear stack of PRs but allowing the underlying commits to become arbitrary merge spaghetti, though. Maybe it wouldn't actually affect my workflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much trouble with stacked PRs myself, and I never rebase. Though, I rarely open multiple parts of the stack on github at once? (I use branches in my personal repo, which would make later parts PRs on my fork, not upstream. So I just wait to open them.)

Each time I make a change in a PR in the middle of the stack, I need to update the ones "downstream" in the chain. I just do merge commits for those. I have had no issues when doing so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated my script to use merges instead of rebases. Let's see what happens!

@@ -11,19 +11,20 @@
(field $vtable (ref $Object.vtable))
(field $itable (ref $Object.itable)))))

;; CHECK: (type $Object.vtable (sub (struct (field structref))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This motion looks like it worsened the readability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add --preserve-type-order to this file.

@@ -237,10 +237,11 @@
(module
(rec
;; CHECK: (rec
;; CHECK-NEXT: (type $unrelated (sub (func)))
;; CHECK-NEXT: (type $top (sub (func)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Might be worth adding the flag to any that regress here.

@tlively tlively changed the base branch from use-preserve-type-order to more-preserve-type-order September 10, 2024 03:16
tlively added a commit that referenced this pull request Sep 10, 2024
Update the remaining tests whose readability will be affected by the
removal of the old topological sort in #6902, no matter how small their
diffs would have been.
Base automatically changed from more-preserve-type-order to main September 10, 2024 19:01
@tlively tlively requested a review from kripken September 10, 2024 20:32
@@ -42,7 +42,7 @@
;; CHECK-NEXT: )
;; CHECK-NEXT: )
(func $foo
;; $A will remain the same.
;; $A will remain the sam^e.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
;; $A will remain the sam^e.
;; $A will remain the same.

Unless this has a meaning I am missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, nope, I just fat fingered it.

@tlively tlively enabled auto-merge (squash) September 10, 2024 21:54
@tlively tlively merged commit 1a2d26f into main Sep 10, 2024
13 checks passed
@tlively tlively deleted the replace-old-topo-sort branch September 10, 2024 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants