Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move exactness to heap types #18

Merged
merged 1 commit into from
Mar 19, 2025
Merged

Move exactness to heap types #18

merged 1 commit into from
Mar 19, 2025

Conversation

tlively
Copy link
Member

@tlively tlively commented Mar 18, 2025

Rather than having exactness as a new attribute of reference types, make
it a new attribute of heap types. By restricting exact heap types to
using type indices, we make it impossible to construct an exact abstract
heap type and thereby avoid introducing new uninhabitable types. This
change also dramatically simplifies the handling of exact types in
existing instructions; existing heap type immediates can now serve to
encode exactness rather than needing new encodings for the base
instructions.

Rather than having exactness as a new attribute of reference types, make
it a new attribute of heap types. By restricting exact heap types to
using type indices, we make it impossible to construct an exact abstract
heap type and thereby avoid introducing new uninhabitable types. This
change also dramatically simplifies the handling of exact types in
existing instructions; existing heap type immediates can now serve to
encode exactness rather than needing new encodings for the base
instructions.
@tlively tlively mentioned this pull request Mar 18, 2025
Copy link

@sjrd sjrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very neat!


Similarly, we allow combining `exact` with the established shorthands in the text format.
For example `(exact anyref)` is a shorthand for `(ref null exact any)`.
Note that the type index being encoded as a `u32` instead of a `u33`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note that the type index being encoded as a `u32` instead of a `u33`
Note that the type index being encoded as a `u32` instead of an `s33`

reftype :: ...
| 0x62 0x64 ht:heaptype => ref exact ht
| 0x62 0x63 ht:heaptype => ref null exact ht
heaptype :: ... | 0x62 x:u32 => exact x
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x62 made sense as the next opcode in the reftype production. In the heaptype production the next available opcode is 0x69. But also we might want to leave the 0x6x range entirely available for future abstract heap types? Perhaps 0x5F would then start a new category for heap types prefixes?

Copy link
Member

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jakobkummerow jakobkummerow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@titzer
Copy link
Contributor

titzer commented Mar 18, 2025

Just to be clear, the implication of this is that due to canonicalization, there can be two different, non-equivalent declarations of an exact type for a given heap type H, because those declarations were in different recursion groups.

Does that mean that we just require some exact type for the input to a custom-RTT allocation?

@rossberg
Copy link
Member

@titzer, I'm not sure I understand your question, can you make it more concrete?

Perhaps this answers it(?): Even with this refactoring, exactness remains an attribute at the use site of type definitions. Two exact types are equivalent if and only if both refer to the same type index, or to type indices for equivalent types, regardless of the context in which they occur. So wrt to RTT operands, yes, (ref (exact $t)) and (ref (exact $u)) are interchangeable as long as $t and $u are.

@jakobkummerow
Copy link
Contributor

FWIW, while this may seem elegant and minimal from a spec perspective, my initial assessment is that the implementation impact will be huge: in V8 (and presumably most other implementations too), we have a lot of code that assumes that, essentially, HeapType == int ("s33", but with tighter limits per spec), so things like buffer->write_i32v(heap_type) just work; and to reflect this change, they'll all have to be refactored to use some kind of struct/class type instead. That's algorithmically easy, of course, but it looks like many hours of busy-work.

I hope y'all are sure that this is the right change to make.

@rossberg
Copy link
Member

rossberg commented Mar 18, 2025

Given the implementation limit of 1M types per module in a JS embedding, don't you have plenty of spare bits in a 32-bit word that the implementation can store that flag in?

@tlively
Copy link
Member Author

tlively commented Mar 18, 2025

FWIW I'm somewhat ambivalent about this change. It's certainly cleaner in the spec, but I do think there is an advantage to handling exactness and nullability in the same place rather than splitting them between reference types and heap types. (And of course I never minded the extra uninhabitable types that came with that.)

I'm tempted to land this change for the spec, but continue treating exactness as a property of the reftype in the Binaryen implementation. I suspect deviating from the spec structure like that is asking for trouble down the road, though.

@jakobkummerow
Copy link
Contributor

don't you have plenty of spare bits in a 32-bit word that the implementation can store that flag in?

Yes, thank you, packing the struct into 32 bits is exactly what we do. But that doesn't help for code like this.

Clearly it's possible to rewrite all that. I just hope it's worth it. (Perhaps we'll just let that legacy code rot and write new tests based on different infrastructure.)

@titzer
Copy link
Contributor

titzer commented Mar 19, 2025

@rossberg Yes I think we're on the same page--I was just making it explicit if it wasn't obvious enough.

I'm also ambivalent towards this change, but haven't thought much about the implications of uninhabited types.

@rossberg
Copy link
Member

@tlively, I think conceptually this is the semantics we want, given that the desired rules for bottom and null fall out naturally in exactly the right way without any hacky special cases. Of course, that does not prevent implementations from shifting the representation around in some equivalent way, if that's more convenient for them. Their ASTs will likely contain more redundancy and "junk" (unused representations) in that case, but that may be a reasonable trade-off with other considerations.

@tlively
Copy link
Member Author

tlively commented Mar 19, 2025

After thinking about this more, I'm cautiously optimistic that it will end up being simpler in the Binaryen implementation as well. I'll go ahead and merge this.

@tlively
Copy link
Member Author

tlively commented Mar 22, 2025

It turns out that we don't even need to use a new bit to encode exactness in Binaryen's heap type representation. Since we only store sharedness using a bit for abstract heap types, and because abstract heap types are never exact, we can reuse that same bit to encode exactness for non-abstract types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants