Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce adjacentPairs #119

Merged
merged 4 commits into from
May 4, 2021
Merged

Conversation

mpangburn
Copy link
Contributor

Implements an adjacentPairs algorithm on Sequence, per forum discussion: https://forums.swift.org/t/add-an-adjacentpairs-algorithm-to-sequence/14817.

Description

Lazily iterates over tuples of adjacent elements.

This operation is available for any sequence by calling the adjacentPairs() method.

let numbers = (1...5)
let pairs = numbers.adjacentPairs()
// Array(pairs) == [(1, 2), (2, 3), (3, 4), (4, 5)]

Detailed Design

The adjacentPairs() method is declared as a Sequence extension returning AdjacentPairs.

extension Sequence {
  public func adjacentPairs() -> AdjacentPairs<Self>
}

The resulting AdjacentPairs type is a sequence, with conditional conformance to Collection, BidirectionalCollection, and RandomAccessCollection when the underlying sequence conforms.

The spelling zip(s, s.dropFirst()) for a sequence s is an equivalent operation on collection types; however, this implementation is undefined behavior on single-pass sequences, and Zip2Sequence does not conditionally conform to the Collection family of protocols.

Documentation Plan

  • API introduced in AdjacentPairs.swift is documented.
  • AdjacentPairs.md is included in Guides.
  • README is updated.
  • CHANGELOG is updated.

Test Plan

Unit tests capture expected index-related invariants about Collection and verify expected behavior for sequences of varying lengths, including the empty and single-element sequence cases.

Source Impact

This change is strictly additive.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

@ollieatkinson
Copy link
Contributor

ollieatkinson commented Mar 29, 2021

We have something that can do this already - I wonder if we need a new collection for something we can easily compose:

import Algorithms

let numbers = (1...5)
let pairs = numbers.windows(ofCount: 2)
// [[1, 2], [2, 3], [3, 4], [4, 5]]

If you wanted to get your exact API for tuples you can write an extension re-using the windows(ofCount:) implementation:

extension Collection {
    
    func adjacentPairs() -> AnyCollection<(Element, Element)> {
        AnyCollection(windows(ofCount: 2).map {
            ($0[$0.startIndex], $0[$0.index(after: $0.startIndex)])
        })
    }
}

@natecook1000
Copy link
Member

natecook1000 commented Mar 29, 2021

@ollieatkinson While there's overlap between this and windows(ofCount:), adjacentPairs provides enough benefit that it can stand as its own, separate adapter. Primarily, the fact that it vends tuples as its element type simplifies a variety of pairwise operations — for example, you can test that a sequence is sorted by writing seq.adjacentPairs().allSatisfy(<=).

@natecook1000
Copy link
Member

natecook1000 commented Mar 29, 2021

One of the things we're finding as we build the wrappers in this package is that a collection's startIndex, endIndex, and the subscript really do need to be O(1), as documented (even though we haven't always adhered to that in the stdlib itself). When that isn't the case, combined operations can end up with unpredictable performance.

For this collection wrapper, that means that we'll need to compute startIndex when calling adjacentPairs() and store it as an instance property, instead of calculating the first pair of indices each time. That, in turn, will mean we need separate AdjacentPairsSequence and AdjacentPairsCollection types, and that the index for the collection type should wrap both indices instead of just the first one. With those changes, you'll be able to access the start/end index and use the subscript without needing to advance indices on the underlying collection.

p.s. Thanks so much for bringing this addition to the Algorithms package!

@ollieatkinson
Copy link
Contributor

One of the things we're finding as we build the wrappers in this package is that a collection's startIndex, endIndex, and the subscript really do need to be O(1), as documented (even though we haven't always adhered to that in the stdlib itself). When that isn't the case, combined operations can end up with unpredictable performance.

This is a good call, and something which is worthy to be wary of - when implementing the sliding window algorithm it was one of the primary concerns that access to both startIndex and endIndex is O(1).

It's true that the current implementation of the sliding window algorithm stores the starting offset, I would be interested to know if the impact of that storage is causing trouble for many algorithms given the fantastic capacity of our devices today.

I am a little unsure to when we stop writing new wrappers vs using composition of existing types, and improving those to unlock new capabilities - for example, improving the sliding windows algorithms to address any of the concerns we would have to re-use in a implementation of adjacentPairs.

Having said all of this, for what it's worth, the implementation of adjacentPairs here looks good!

@timvermeulen
Copy link
Contributor

It's true that the current implementation of the sliding window algorithm stores the starting offset, I would be interested to know if the impact of that storage is causing trouble for many algorithms given the fantastic capacity of our devices today.

The end user likely won't notice the difference in most scenarios, but in extreme cases it can make a meaningful difference (as described here).

I am a little unsure to when we stop writing new wrappers vs using composition of existing types, and improving those to unlock new capabilities - for example, improving the sliding windows algorithms to address any of the concerns we would have to re-use in a implementation of adjacentPairs.

We should certainly compose existing abstractions to make new ones whenever we reasonably can! And slidingWindows indeed almost gets us there. Unfortunately Sequence and Collection are sufficiently different from each other that in this case we're unable to create a Sequence method on top of one that's only available to Collection.

@mpangburn mpangburn force-pushed the mpangburn/adjacent-pairs branch from 534881c to b0f3020 Compare April 11, 2021 03:53
@mpangburn
Copy link
Contributor Author

@natecook1000 Appreciate the pointers—I've made the following changes:

  • AdjacentPairsSequence and AdjacentPairsCollection are distinct types, and the adjacentPairs() extension method is overloaded appropriately.
  • For AdjacentPairsCollection, startIndex is precomputed, and AdjacentPairsCollection.Index stores both indices into the underlying collection.
  • AdjacentPairsSequence.Iterator no longer computes base.next() in init, deferring fetching the initial element until its own next method is called.
  • Inlining annotations have been added.

Let me know if there are any other improvements I can make.

Comment on lines +161 to +168
extension AdjacentPairsCollection {
public typealias Iterator = AdjacentPairsSequence<Base>.Iterator

@inlinable
public func makeIterator() -> Iterator {
Iterator(base: base.makeIterator())
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is beneficial, mostly because we're precomputing startIndex. Won't this mean we end up doing some duplicate work when iterating over someCollection.adjacentPairs()?

}

@inlinable
public func index(_ i: Index, offsetBy distance: Int) -> Index {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AdjacentPairsCollection will also need to implement index(_:offsetBy:limitedBy:) to properly fulfill the RandomAccessCollection requirements — the added complexity of a limit could make this quite tricky though, so absolutely feel free to leave it as a TODO or ask for guidance if it's significantly harder than the version without a limit 🙂

adjacentPairs() also is conceptually similar to windows(ofCount:) and chunks(ofCount:), so their Collection conformances could be useful to draw inspiration from.

Comment on lines +65 to +91
func testIndexDistance() {
let pairSequences = (0...4).map { (0..<$0).adjacentPairs() }

for pairs in pairSequences {
for index in pairs.indices.dropLast() {
let next = pairs.index(after: index)
XCTAssertEqual(pairs.distance(from: index, to: next), 1)
}

XCTAssertEqual(pairs.distance(from: pairs.startIndex, to: pairs.endIndex), pairs.count)
XCTAssertEqual(pairs.distance(from: pairs.endIndex, to: pairs.startIndex), -pairs.count)
}
}

func testIndexOffsetBy() {
let pairSequences = (0...4).map { (0..<$0).adjacentPairs() }

for pairs in pairSequences {
for index in pairs.indices.dropLast() {
let next = pairs.index(after: index)
XCTAssertEqual(pairs.index(index, offsetBy: 1), next)
}

XCTAssertEqual(pairs.index(pairs.startIndex, offsetBy: pairs.count), pairs.endIndex)
XCTAssertEqual(pairs.index(pairs.endIndex, offsetBy: -pairs.count), pairs.startIndex)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are made redundant by the validateIndexTraversals method that tests whether all index manipulation logic is well-behaved, in an exhaustive manner. Something like this probably covers all edge cases:

func testIndexTraversals() {
  validateIndexTraversals(
    (0..<0).adjacentPairs(),
    (0..<1).adjacentPairs(),
    (0..<2).adjacentPairs(),
    (0..<3).adjacentPairs(),
    (0..<10).adjacentPairs())
}


extension Collection {
fileprivate static func == <L: Equatable, R: Equatable> (lhs: Self, rhs: Self) -> Bool where Element == (L, R) {
lhs.count == rhs.count && zip(lhs, rhs).allSatisfy(==)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lhs.count == rhs.count && zip(lhs, rhs).allSatisfy(==)
lhs.elementsEqual(rhs, by: ==)

Comment on lines +192 to +200
@inlinable
public var endIndex: Index {
switch base.endIndex {
case startIndex.first, startIndex.second:
return startIndex
case let end:
return Index(first: end, second: end)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly suggest unconditionally representing endIndex as Index(first: base.endIndex, second: base.endIndex), and adapting startIndex to match this representation in the edge case that base.count == 1 (instead of the other way around). This is the approach we take in Windows as well. Having a consistent representation of endIndex often makes it easier to reason about index manipulation logic, and I think you'll find that it will improve some of your code.

Comment on lines +226 to +230
return i == endIndex
? Index(first: base.index(i.first, offsetBy: distance - 1),
second: base.index(i.first, offsetBy: distance))
: Index(first: base.index(i.first, offsetBy: distance),
second: i.first)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could avoid computing the first and second indices separately here, since you can cheaply compute the firts if you already have the second.

Comment on lines +184 to +187
@inlinable
public static func < (lhs: Index, rhs: Index) -> Bool {
(lhs.first, lhs.second) < (rhs.first, rhs.second)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should only need to compare one of the two underlying indices here instead of both, and the same applies to ==.

@natecook1000
Copy link
Member

@swift-ci Please test

@natecook1000
Copy link
Member

Going to merge now and follow up — connected with @mpangburn offline (thank you! 👏)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants