Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Better synthesis of Hashable._hash(into:) #14935

Closed

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Mar 2, 2018

This is an unfinished update to #14913 that removes the glaring cheat of force-synthesizing _hash(into:) whenever we derive hashValue.

_hash(into:) is now automatically derived whenever it's not explicitly implemented, including inside classes and extensions of (possibly imported or synthesized) structs, enums or classes.

This is not entirely free of magic, though. To determine which implementation to generate, the body synthesizer takes a peek into the parent decl context -- if it finds an implicit hashValue there, the generated body implements hashing from scratch, feeding components into the hasher. Otherwise, we assume hashValue has an explicit implementation, and we generate the body for the (previously) default implementation of _hash(into:).

To make _CFObject's Hashable conformance apply correctly on CF types, I needed to add an explicit implementation of _hash(into:) to the CoreFoundation overlay. Without this change, the compiler would try to derive _hash(into:) directly into the imported CF types, which (understandably(?)) doesn't work. The problem seems to be that there is no other suitable DeclContext for the Hashable conformance, which in this case is implemented in a seemingly unrelated protocol extension. (This problem seems to be specific to CF types; I couldn't reproduce it with regular imported classes.)

This change also introduces some diagnostic regressions that are yet to be fixed.

@lorentey
Copy link
Member Author

lorentey commented Mar 2, 2018

Cc @jrose-apple -- With the last two commits, Hashable derivation relies on the compiler's normal mechanism to derive _hash(into:) instead of the ugly piggybacking on hashValue.

Does this look reasonable? I'm a bit worried that the CoreFoundation problem may indicate a deeper underlying issue.

lorentey added 15 commits March 2, 2018 19:17
Empty structs can be instantiated, which doesn’t make sense for these namespacing declarations.
…orderings

We shouldn't need to update tests each time we modify the hash function.
Introduce _UnsafeHasher, representing an opaque hash compression function with a faux-pure interface.

Add the method _hash(into:) as a Hashable requirement, along with a default implementation that simply feeds hashValue to the hasher.

Add _hash(into:) implementations for the default integer types, and replace hashValue. Note that Int.hashValue does not return self anymore.

Add struct _LegacyHasher, emulating Swift 4.1 hashes in the new interface.
This switches the primary hashing interface from hashValue to _hash(into:).
Beyond switching hashing algorithms, this also enables per-execution hash seeds, fulfilling a long-standing prophecy in Hashable’s documentation.

To reduce the possibility of random test failures, StdlibUnittest’s TestSuite overrides the random hash seed on initialization.

rdar://problem/24109692
rdar://problem/35052153
…erface

Force-derive Hashable._hash(into:) whenever we need to derive hashValue. Do all the actual work there.

_hash(into:) has a default implementation, so it wouldn’t normally be considered a requirement in need of synthesized conformance. However, for types that synthesize their hashability, we want to replace the default with a custom implementation.

A better option would be to remove the default implementation and synthesize it (and/or hashValue) as needed:

1. If a witness hashValue cannot be resolved, synthesize one that simply returns _hashValue(for: self). (I.e., express hashValue in terms of _hash(into:).)
2. If a witness for _hash(into:) cannot be resolved:
  a) if there is an explicit witness for hashValue, synthesize _hash(into:) such that it simply feeds the hashValue to the hasher.
  b) otherwise if the parent context is not a struct or enum decl, then fail and produce a diagnostic. We can’t fully synthesize Hashable for classes or in extensions.
  c) if the components of the type aren’t all Hashable, then fail and produce a diagnostic about a missing hashValue implementation.
  c) otherwise synthesize _hash(into:) by feeding the type’s components into the hasher.

However, the DerivedConformance is not yet a great fit for such complicated derivation rules, so we’d need to pretend Hashable is always derivable, and manually provide diagnostics in cases 2/b and 2/c. Providing high-quality, non-duplicated diagnostics is a challenge.

Even worse, case 2/a would require us to support synthesizing _hash(into:) into extensions, classes, synthesized structs and imported types — and unfortunately, compiler support for doing this is currently limited.
_hash(into:) needs to be included in expectations; tests looking at synthesized Hashable implementation bodies need to be updated for resilient hashing.
This removes the glaring cheat of force-synthesizing _hash(into:) whenever we derive hashValue.

_hash(into:) is now automatically derived whenever it's not explicitly implemented, including in classes and extensions of (possibly imported or synthesized) structs, enums or classes.

This is not entirely free of magic, though. To determine which implementation to generate, the body synthesizer takes a peek into the parent decl context -- if it finds an implicit hashValue there, the generated body implements hashing from scratch, feeding components into the hasher. Otherwise, we assume hashValue has been resolved to an explicit implementation, and we generate the body for the (previously) default implementation of _hash(into:).

This change introduces some diagnostic regressions, and also requires a followup change in the CoreFoundation overlay.
…into:)

Without this change, the Hashable synthesizer attempts to add _hash(into:) methods directly on CF types, which (somewhat unsurprisingly) fails with the following assert:

Assertion failed: (!decl->isForeign() && "Use getForeignMetadataLayout()"), function getClassMetadataLayout, file /Users/lorentey/Swift/swift/lib/IRGen/MetadataLayout.cpp, line 86.
0  swift                    0x0000000108e28b08 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  swift                    0x0000000108e29216 SignalHandler(int) + 694
2  libsystem_platform.dylib 0x00007fff587aef5a _sigtramp + 26
3  libsystem_platform.dylib 0x0000000000003628 _sigtramp + 2810529512
4  libsystem_c.dylib        0x00007fff585571ae abort + 127
5  libsystem_c.dylib        0x00007fff5851f1ac basename_r + 0
6  swift                    0x0000000105a2a460 swift::irgen::IRGenModule::getClassMetadataLayout(swift::ClassDecl*) + 96
7  swift                    0x0000000105970dc8 swift::irgen::emitVirtualMethodValue(swift::irgen::IRGenFunction&, llvm::Value*, swift::SILDeclRef, swift::CanTypeWrapper<swift::SILFunctionType>) + 184
8  swift                    0x000000010597121d swift::irgen::emitVirtualMethodValue(swift::irgen::IRGenFunction&, llvm::Value*, swift::SILType, swift::SILDeclRef, swift::CanTypeWrapper<swift::SILFunctionType>, bool) + 685
9  swift                    0x0000000105a01482 swift::SILInstructionVisitor<(anonymous namespace)::IRGenSILFunction, void>::visit(swift::SILInstruction*) + 36562
10 swift                    0x00000001059f4e4f (anonymous namespace)::IRGenSILFunction::emitSILFunction() + 6863
11 swift                    0x00000001059f2e1b swift::irgen::IRGenModule::emitSILFunction(swift::SILFunction*) + 1371
12 swift                    0x00000001058fccab swift::irgen::IRGenerator::emitLazyDefinitions() + 1051
13 swift                    0x00000001059cf676 performIRGeneration(swift::IRGenOptions&, swift::ModuleDecl*, std::__1::unique_ptr<swift::SILModule, std::__1::default_delete<swift::SILModule> >, llvm::StringRef, swift::PrimarySpecificPaths const&, llvm::LLVMContext&, swift::SourceFile*, llvm::GlobalVariable**, unsigned int) + 1366
14 swift                    0x00000001059cfc5e swift::performIRGeneration(swift::IRGenOptions&, swift::SourceFile&, std::__1::unique_ptr<swift::SILModule, std::__1::default_delete<swift::SILModule> >, llvm::StringRef, swift::PrimarySpecificPaths const&, llvm::LLVMContext&, unsigned int, llvm::GlobalVariable**) + 94
15 swift                    0x000000010586d2a9 performCompile(swift::CompilerInstance&, swift::CompilerInvocation&, llvm::ArrayRef<char const*>, int&, swift::FrontendObserver*, swift::UnifiedStatsReporter*) + 15337
16 swift                    0x0000000105868715 swift::performFrontend(llvm::ArrayRef<char const*>, char const*, void*, swift::FrontendObserver*) + 2869
17 swift                    0x000000010581f300 main + 2832
18 libdyld.dylib            0x00007fff584ab015 start + 1
@lorentey lorentey force-pushed the resilient-hashing-better-synthesis branch from dcadac4 to bb64446 Compare March 2, 2018 19:18
@lorentey
Copy link
Member Author

lorentey commented Mar 2, 2018

@swift-ci please smoke test

@jrose-apple
Copy link
Contributor

Without looking at the rest of the code, the CF issue sounds like it'll come into play whenever a protocol other than Hashable has a default implementation of hashValue. _SwiftNewtypeWrapper, for example (used for NS_TYPED_ENUM / NS_STRING_ENUM et al).

@jrose-apple
Copy link
Contributor

Sorry, I guess that's specific to imported types. Maybe the way we do all other imported types is safe enough, since they're structs or enums and we can add things to them safely.

@lorentey
Copy link
Member Author

lorentey commented Mar 4, 2018

Looking into this in a little more detail, CF types seem special because they gain the _CFObject conformance magically: the importer adds it to the opaque CF type as if it was directly on an imported class definition.

We can't put new derived methods into that imported class, but adding the _hash(into:) definition into the CoreFoundation overlay looks like an acceptable workaround -- as long as there is no way for user-defined types to gain implicit conformances like that.

For non-magical imported types, the Hashable conformance must be added in some explicit extension somewhere, and the extension provides a suitable DeclContext for the derived _hash(into:) method -- so users would never run into the same situation.

…present

This makes it possible to implement Hashable by defining either hashValue or _hash(into:).
@lorentey
Copy link
Member Author

lorentey commented Mar 5, 2018

I (manually) merged this back to #14913, including the last commit that enables users to provide only one of hashValue and _hash(into:), with the other member synthesized automatically by the compiler.

Some of the diagnostic regressions remain, and a test for an old compiler crasher exposed a new crash; I'll fix these on the primary PR soon.

@lorentey lorentey closed this Mar 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants