Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Fix AnyHashable's Equatable/Hashable conformance #17396

Merged
merged 3 commits into from
Jun 29, 2018

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Jun 21, 2018

AnyHashable has a few corner cases where it's not correctly conforming to Equatable and Hashable. This breaks Set and Dictionary invariants and can cause unexpected behavior and/or traps.

This change modifies AnyHashable to fix these cases. It is relatively important to fix this now, because to complete SE-0206, we need to switch integers to type-dependent hashing, and that exacerbates the issue. (The hashing change is also included in this PR.)

  • Fix transitivity of ==. Previously, comparisons involving AnyHashable values with Objective-C provenance were handled specially, breaking Equatable:

    let a = (42 as Int as AnyHashable)
    let b = (42 as NSNumber as AnyHashable)
    let c = (42 as Double as AnyHashable)
    a == b // true
    b == c // true
    a == c // was false(!), now true
    
    let d = ("foo" as AnyHashable)
    let e = ("foo" as NSString as AnyHashable)
    let f = ("foo" as NSString as NSAttributedStringKey as AnyHashable)
    d == e // true
    e == f // true
    d == f // was false(!), now true
  • Fix Hashable conformance for numeric types boxed into AnyHashable:

    b == c // true
    b.hashValue == c.hashValue // was false(!), now true

SE-0170 changed bridging of numeric types so that conversion to NSNumber loses information about the original type. Therefore, to ensure AnyHashable correctly conforms to Equatable, it must follow/emulate NSNumber's rules for equality. The same goes for _SwiftNewtypeWrapper types and NSString. To fix these, this PR defines custom AnyHashable boxes for all numeric types and newtypes. The custom box canonicalizes values that bridge to the same Objective-C representation so that they compare equal and hash the same way, independent of the original Swift type.

SE-0143 added conditional Hashable conformances for arrays, sets and dictionaries, so values of these collection types may themselves be converted to AnyHashable. These collections are also bridged to NSArray, NSSet and NSDictionary, respectively. Therefore, their AnyHashable forms must follow the rules of equality set out by Foundation. This added a number of new cases where Equatable/Hashable conformance is currently broken. For example:

let x: AnyHashable = [1, 2, 3] as [UInt8]
let y: AnyHashable = [1, 2, 3] as NSArray
let z: AnyHashable = [1, 2, 3] as [Double]

x == y // true
y == z // true
x == z // was false(!), now true

x.hashValue == y.hashValue // true
y.hashValue == z.hashValue // was false(!), now true
x.hashValue == z.hashValue // was false(!), now true

This PR fixes this by adding additional AnyHashable boxes for these three collection types. The boxes for sets and dictionaries aren't particularly efficient. However, they're internal to the stdlib, so we'll be able to change them later in a future release without breaking the ABI.

Additional changes:

  • Remove AnyHashable._usedCustomRepresentation. The provenance of a value should not affect its behavior.
  • Allow AnyHashable values to be downcasted into compatible types more often. Since equality implies that (1 as UInt8 as AnyHashable) must be substitutable with (1 as Double as AnyHashable), it follows that we should be able to successfully downcast either value to both types. (This is currently done by bridging to Objective-C and back, so it only works when the Objective-C runtime is available and Foundation is loaded.)
  • Forward _rawHashValue(seed:) to AnyHashable box. This fixes AnyHashable hashing for types that customize single-shot hashing.
  • Enable integer values of different types to produce different hashes, so that e.g. (1 as UInt8).hashValue will typically differ from (1 as UInt64).hashValue. This completes SE-0206 by feeding hasher only as many bytes as necessary to represent each type.
    (42 as UInt8).hashValue == (42 as Int).hashValue // was true, now false

https://bugs.swift.org/browse/SR-7496
rdar://problem/39648819

@lorentey
Copy link
Member Author

@swift-ci please test

@lorentey
Copy link
Member Author

@swift-ci please test source compatibility

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - e081c32654efba762bcac3bb5bcb218de995188d

// AnyHashable
//===----------------------------------------------------------------------===//

internal struct _${Self}AnyHashableBox: _AnyHashableBox {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since every float can be represented as a double, this seems like overkill. Just having a _DoubleAnyHashableBox would be fine, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still vacillating on whether AnyHashable should preserve the original type for numeric values. This version preserves them by using separate box types. Double doesn't have a generic initializer that takes a BinaryFloatingPoint value, so making the box generic would lead to ugly switches on the concrete type.

I think it would be nice to ensure that for all stdlib numeric types, (foo as AnyHashable).base always returns the exact value we started with, of the original type. An alternative that is more consistent with SE-0170 is to allow (42.0 as Double as AnyHashable).base to return an integer, but I think I'd find that surprising.

I think that SE-0170 implies that we should allow downcasting a numeric AnyHashable to any standard integer/float type that can exactly represent the boxed value. I'm still working out how to best do that without going through NSNumber, but meanwhile, I want to make sure we can always downcast AnyHashable back to the original type. (Even if the Objective-C runtime isn't available, or if Foundation isn't loaded.)

let value: Num = 42 // where Num is any standard integer or floating point type
let boxed = value as AnyHashable
precondition(boxed.base is Num)
precondition(boxed as? Num != nil)
precondition(boxed as? Num == value)

Also, Float80 is a special case. It isn't bridged to NSNumber, which makes it the only standard numeric type that's unaffected by SE-0170. The current version of this PR rounds Float80 values to Double for AnyHashable comparisons, but that really should only be done if a Double can exactly represent the value.

% end
}
}

extension ${Self} : _HasCustomAnyHashableRepresentation {
// Not @inlinable - blacklisted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? (Also, we try to avoid the word "blacklist" both for being unspecific and for potentially being culturally charged.)

@lorentey lorentey changed the title [stdlib] Fix AnyHashable's Equatable/Hashable conformance 🛑[stdlib] Fix AnyHashable's Equatable/Hashable conformance Jun 21, 2018
@lorentey
Copy link
Member Author

lorentey commented Jun 21, 2018

Some open issues:

  • AnyHashable downcasting must not involve bridging, because even if the ObjC runtime is available, Foundation may not have been imported. (The original code only tried round-trip bridging if the box came from Foundation, so it did not have this issue.)
  • Ensure that the critical NSDictionary -> [AnyHashable: Any] -> [String: Any] bridging+casting path does not rehash the dictionary more than once.
    • The second step need not do a full rehash, because "foo" as AnyHashable hashes exactly the same as "foo".
    • This is also true for newtypes wrapping Strings, so [AnyHashable: Any] -> [NSAttributedStringKey: Any] should not do a full rehash.
  • Investigate if we can guarantee the same for NSDictionary -> [AnyHashable: Any] -> [Int: Any].
    • Sadly, this seems unlikely, because hashing is different in all three steps.
  • We should allow downcasting from any AnyHashable holding a numeric value to any numeric type that can hold that value exactly, without requiring Foundation to be loaded
    • Ugly solution: We could manually switch on the target type
  • Verify that the AnyHashable representations of Array/Set/Dictionary values holding numeric types & newtypes (1) work correctly (e.g. AnyHashable-boxed sets holding equal numbers compare equal, even if the original types were different), and (2) bridge correctly with NSArray/NSSet/NSDictionary.
  • Implement downcasting from an AnyHashable holding some type to an "extensible enum" (newtype) that wraps that same type (e.g., NSAttributedStringKey) (Works through Foundation now.)
    • These newtypes get bridged to their raw value (e.g. NSAttributedStringKey becomes NSString), so their AnyHashable representation should be interchangeable with their raw value. I.e., ideally, this should return non-nil: "hello" as AnyHashable as? NSAttributedStringKey.
    • For extra fun, the raw value can be a numeric type, with its own complications
    • Find out how to do this without bridging back & forth
  • Fix Float80's AnyHashable semantics. Float80 values should not compare equal to Double values unless they represent the exact same number.

@lorentey
Copy link
Member Author

@swift-ci test

@swiftlang swiftlang deleted a comment from swift-ci Jun 22, 2018
@swiftlang swiftlang deleted a comment from swift-ci Jun 22, 2018
@lorentey
Copy link
Member Author

@swift-ci test

1 similar comment
@lorentey
Copy link
Member Author

@swift-ci test

@lorentey
Copy link
Member Author

@swift-ci test source compatibility

@swiftlang swiftlang deleted a comment from swift-ci Jun 25, 2018
@swiftlang swiftlang deleted a comment from swift-ci Jun 25, 2018
lorentey added 2 commits June 25, 2018 20:14
AnyHashable has numerous edge cases where two AnyHashable values compare equal but produce different hashes. This breaks Set and Dictionary invariants and can cause unexpected behavior and/or traps. This change overhauls AnyHashable's implementation to fix these edge cases, hopefully without introducing new issues.

- Fix transitivity of ==. Previously, comparisons involving AnyHashable values with Objective-C provenance were handled specially, breaking Equatable:

    let a = (42 as Int as AnyHashable)
    let b = (42 as NSNumber as AnyHashable)
    let c = (42 as Double as AnyHashable)
    a == b // true
    b == c // true
    a == c // was false(!), now true

    let d = ("foo" as AnyHashable)
    let e = ("foo" as NSString as AnyHashable)
    let f = ("foo" as NSString as NSAttributedStringKey as AnyHashable)
    d == e // true
    e == f // true
    d == f // was false(!), now true

- Fix Hashable conformance for numeric types boxed into AnyHashable:

    b == c // true
    b.hashValue == c.hashValue // was false(!), now true

  Fixing this required adding a custom AnyHashable box for all standard integer and floating point types. The custom box was needed to ensure that two AnyHashables containing the same number compare equal and hash the same way, no matter what their original type was. (This behavior is required to ensure consistency with NSNumber, which has not been preserving types since SE-0170.

- Add custom AnyHashable representations for Arrays, Sets and Dictionaries, so that when they contain numeric types, they hash correctly under the new rules above.

- Remove AnyHashable._usedCustomRepresentation. The provenance of a value should not affect its behavior.

- Allow AnyHashable values to be downcasted into compatible types more often.

- Forward _rawHashValue(seed:) to AnyHashable box. This fixes AnyHashable hashing for types that customize single-shot hashing.

https://bugs.swift.org/browse/SR-7496
rdar://problem/39648819
Fix Hashable conformance of standard integer types so that the number of bits they feed into hasher is exactly Self.bitWidth.

This was intended to be part of SE-0206. However, it would have introduced additional issues with AnyHashable. The custom AnyHashable representations introduced in the previous commit unify hashing for numeric types, eliminating the problem.
@lorentey lorentey force-pushed the anyhashable-is-not-hashable branch from 1169783 to bf872ec Compare June 26, 2018 15:15
@lorentey
Copy link
Member Author

@swift-ci please test

@lorentey lorentey changed the title 🛑[stdlib] Fix AnyHashable's Equatable/Hashable conformance [stdlib] Fix AnyHashable's Equatable/Hashable conformance Jun 26, 2018
@lorentey
Copy link
Member Author

@swift-ci please test source compatibility

@swiftlang swiftlang deleted a comment from swift-ci Jun 26, 2018
@swiftlang swiftlang deleted a comment from swift-ci Jun 26, 2018
@lorentey lorentey requested a review from DougGregor June 26, 2018 21:24
@lorentey
Copy link
Member Author

@DougGregor If you have a bit of time, could you take a quick look at this? I think you may be able to spot unintended consequences that I missed during testing.

After this lands, I intend to cherry-pick it to 4.2, too.

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed explanation! This solution is quite a bit cleaner than what it replaces, especially since we no longer need to track where the AnyHashable came from. Thanks!

func _downCastConditional<T>(into result: UnsafeMutablePointer<T>) -> Bool

func _asSet() -> Set<AnyHashable>?
func _asDictionary() -> Dictionary<AnyHashable, AnyHashable>?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little strange to have to enumerate the Set and Dictionary types here, but it seems like you're doing it as a way to get right at the canonical Set or Dictionary. It's a small thing, but a name like _asCanonicalSet might make it more obvious why callers wouldn't use _downCastConditional instead. (I expand on this comment a bit below).

I briefly got concerned about compatibility going forward, if we were to add another special type like Set or Dictionary, but we could add the corresponding _asFoo in the same way (with a default implementation) at the same time. So, I'm not concerned about this.

Copy link
Member Author

@lorentey lorentey Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intend to remove @usableFromInline from this protocol, so we should be able to change it however we like! (Welp, maybe not -- I need to run some benchmarks)

I think we could eliminate these two unsightly requirements by adapting _ArrayAnyHashableProtocol to sets/dictionaries:

internal protocol _SetAnyHashableProtocol: _AnyHashableBox {
  var count: Int { get }
  func contains(_ member: AnyHashable) -> Bool
}

internal struct _SetAnyHashableBox<Element: Hashable>: _SetAnyHashableProtocol {
  let _base: Set<Element>

  func contains(_ member: AnyHashable) -> Bool {
    guard let member = member as? Element else { return false }
    return _base.contains(member)
  }
  func _isEqual(to other: _AnyHashableBox) -> Bool? {
    guard let other = other as? _SetAnyHashableProtocol else { return nil }
    guard let _base.count == other.count else { return false }
    for member in _base {
      guess other.contains(member as AnyHashable) else { return false }
    }
    return true
  }
  // etc.
}

This would clean up the interface and also eliminate the need for storing a full copy of the set/dictionary in _canonical. However, it would require downcasting from AnyHashable to any ==-compatible type, which isn't rock solid yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that we can use _canonicalBox to force all sets/dictionaries into a single known box type, so we can get rid of _asSet()/_asDictionary() without relying on downcasts. 🎉

Implemented in 2f4ad79

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, that's a good solution!

internal func _downCastConditional<T>(
into result: UnsafeMutablePointer<T>
) -> Bool {
guard let value = _value as? T else { return false }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. If there were a fast path for type(of: T) == Dictionary<AnyHashable, AnyHashable>.self, would it obviate the need for _asDictionary, because _isEqual would be able to efficiently use _downCastConditional? (Sorry, same question as before).

Copy link
Member Author

@lorentey lorentey Jun 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this, but FWIW, it sure would be helpful if we could analyze the structure of T at runtime and detect that it's a Dictionary, extract its Key & Value types and perform an elementwise downcast/conversion from _value:

if case Dictionary<let K, let V> = T { ... }

There is a similar issue with _SwiftNewtypeWrapper, which uses its rawValue's AnyHashable representation, but can't easily be downcast from it. ("hello" as AnyHashable as? NSAttributedString.Key should be non-nil, but (as far as I know), _ConcreteAnyHashableBox<String>._downCastConditional<T> can't detect that the target type conforms to _SwiftNewtypeWrapper where RawValue == String, so we have to go through Foundation.

_canonicalBox can perform essentially the same job, so there is no reason to have these requirements.
@lorentey
Copy link
Member Author

@swift-ci please test

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this much better, but I have a question about the _isEqual check. It seems like it should be down casting the canonical box of other, not other itself.

guard let other = other._asDictionary() else { return nil }
return _canonical == other
guard
let other = other as? _DictionaryAnyHashableBox<AnyHashable, AnyHashable>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, shouldn't you be downcasting other._canonicalBox to _DictionaryAnyHashableBox<AnyHashable, AnyHashable> here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_isEqual is guaranteed to be only called on canonical boxes.

(I remember adding this guarantee as an optimization, but now I can't see how this is better compared to canonicalizing self and other right here. Hm...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I added the guarantee because canonicalization sometimes results in one of several unrelated types (e.g., a canonicalized Double box could be either _DoubleAnyHashableBox or _IntegerAnyHashableBox<Int64/UInt64>), and it was unwieldy to manage those directly in _isEqual.

internal func _isEqual(to other: _AnyHashableBox) -> Bool? {
guard let other = other._asSet() else { return nil }
return _canonical == other
guard let other = other as? _SetAnyHashableBox<AnyHashable> else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here.

@lorentey
Copy link
Member Author

@swift-ci please test source compatibility

@lorentey lorentey merged commit a4e9109 into swiftlang:master Jun 29, 2018
@lorentey lorentey deleted the anyhashable-is-not-hashable branch June 29, 2018 16:38
drodriguez added a commit to drodriguez/swift that referenced this pull request Mar 18, 2019
Since a4e9109 (swiftlang#17396), both the hashes and the equality of numeric
types inside of AnyHashable do not follow the rules that this part of
the comment was talking about.

I couldn't find an easy example that shows the same behaviour, so I
decided to remove the comment completely.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants