Introduce `adjacentPairs` #119

mpangburn · 2021-03-28T22:31:24Z

Implements an adjacentPairs algorithm on Sequence, per forum discussion: https://forums.swift.org/t/add-an-adjacentpairs-algorithm-to-sequence/14817.

Description

Lazily iterates over tuples of adjacent elements.

This operation is available for any sequence by calling the adjacentPairs() method.

let numbers = (1...5)
let pairs = numbers.adjacentPairs()
// Array(pairs) == [(1, 2), (2, 3), (3, 4), (4, 5)]

Detailed Design

The adjacentPairs() method is declared as a Sequence extension returning AdjacentPairs.

extension Sequence {
  public func adjacentPairs() -> AdjacentPairs<Self>
}

The resulting AdjacentPairs type is a sequence, with conditional conformance to Collection, BidirectionalCollection, and RandomAccessCollection when the underlying sequence conforms.

The spelling zip(s, s.dropFirst()) for a sequence s is an equivalent operation on collection types; however, this implementation is undefined behavior on single-pass sequences, and Zip2Sequence does not conditionally conform to the Collection family of protocols.

Documentation Plan

API introduced in AdjacentPairs.swift is documented.
AdjacentPairs.md is included in Guides.
README is updated.
CHANGELOG is updated.

Test Plan

Unit tests capture expected index-related invariants about Collection and verify expected behavior for sequences of varying lengths, including the empty and single-element sequence cases.

Source Impact

This change is strictly additive.

Checklist

I've added at least one test that validates that my change is working, if appropriate
I've followed the code style of the rest of the project
I've read the Contribution Guidelines
I've updated the documentation if necessary

ollieatkinson · 2021-03-29T13:35:02Z

We have something that can do this already - I wonder if we need a new collection for something we can easily compose:

import Algorithms

let numbers = (1...5)
let pairs = numbers.windows(ofCount: 2)
// [[1, 2], [2, 3], [3, 4], [4, 5]]

If you wanted to get your exact API for tuples you can write an extension re-using the windows(ofCount:) implementation:

extension Collection {
    
    func adjacentPairs() -> AnyCollection<(Element, Element)> {
        AnyCollection(windows(ofCount: 2).map {
            ($0[$0.startIndex], $0[$0.index(after: $0.startIndex)])
        })
    }
}

natecook1000 · 2021-03-29T14:53:02Z

@ollieatkinson While there's overlap between this and windows(ofCount:), adjacentPairs provides enough benefit that it can stand as its own, separate adapter. Primarily, the fact that it vends tuples as its element type simplifies a variety of pairwise operations — for example, you can test that a sequence is sorted by writing seq.adjacentPairs().allSatisfy(<=).

natecook1000 · 2021-03-29T16:12:56Z

One of the things we're finding as we build the wrappers in this package is that a collection's startIndex, endIndex, and the subscript really do need to be O(1), as documented (even though we haven't always adhered to that in the stdlib itself). When that isn't the case, combined operations can end up with unpredictable performance.

For this collection wrapper, that means that we'll need to compute startIndex when calling adjacentPairs() and store it as an instance property, instead of calculating the first pair of indices each time. That, in turn, will mean we need separate AdjacentPairsSequence and AdjacentPairsCollection types, and that the index for the collection type should wrap both indices instead of just the first one. With those changes, you'll be able to access the start/end index and use the subscript without needing to advance indices on the underlying collection.

p.s. Thanks so much for bringing this addition to the Algorithms package!

ollieatkinson · 2021-03-31T13:34:37Z

One of the things we're finding as we build the wrappers in this package is that a collection's startIndex, endIndex, and the subscript really do need to be O(1), as documented (even though we haven't always adhered to that in the stdlib itself). When that isn't the case, combined operations can end up with unpredictable performance.

This is a good call, and something which is worthy to be wary of - when implementing the sliding window algorithm it was one of the primary concerns that access to both startIndex and endIndex is O(1).

It's true that the current implementation of the sliding window algorithm stores the starting offset, I would be interested to know if the impact of that storage is causing trouble for many algorithms given the fantastic capacity of our devices today.

I am a little unsure to when we stop writing new wrappers vs using composition of existing types, and improving those to unlock new capabilities - for example, improving the sliding windows algorithms to address any of the concerns we would have to re-use in a implementation of adjacentPairs.

Having said all of this, for what it's worth, the implementation of adjacentPairs here looks good!

timvermeulen · 2021-03-31T16:35:19Z

It's true that the current implementation of the sliding window algorithm stores the starting offset, I would be interested to know if the impact of that storage is causing trouble for many algorithms given the fantastic capacity of our devices today.

The end user likely won't notice the difference in most scenarios, but in extreme cases it can make a meaningful difference (as described here).

I am a little unsure to when we stop writing new wrappers vs using composition of existing types, and improving those to unlock new capabilities - for example, improving the sliding windows algorithms to address any of the concerns we would have to re-use in a implementation of adjacentPairs.

We should certainly compose existing abstractions to make new ones whenever we reasonably can! And slidingWindows indeed almost gets us there. Unfortunately Sequence and Collection are sufficiently different from each other that in this case we're unable to create a Sequence method on top of one that's only available to Collection.

mpangburn · 2021-04-11T03:59:40Z

@natecook1000 Appreciate the pointers—I've made the following changes:

AdjacentPairsSequence and AdjacentPairsCollection are distinct types, and the adjacentPairs() extension method is overloaded appropriately.
For AdjacentPairsCollection, startIndex is precomputed, and AdjacentPairsCollection.Index stores both indices into the underlying collection.
AdjacentPairsSequence.Iterator no longer computes base.next() in init, deferring fetching the initial element until its own next method is called.
Inlining annotations have been added.

Let me know if there are any other improvements I can make.

timvermeulen · 2021-04-14T11:34:24Z

Sources/Algorithms/AdjacentPairs.swift

+extension AdjacentPairsCollection {
+  public typealias Iterator = AdjacentPairsSequence<Base>.Iterator
+
+  @inlinable
+  public func makeIterator() -> Iterator {
+    Iterator(base: base.makeIterator())
+  }
+}


I'm not sure if this is beneficial, mostly because we're precomputing startIndex. Won't this mean we end up doing some duplicate work when iterating over someCollection.adjacentPairs()?

timvermeulen · 2021-04-14T12:00:10Z

Sources/Algorithms/AdjacentPairs.swift

+  }
+
+  @inlinable
+  public func index(_ i: Index, offsetBy distance: Int) -> Index {


AdjacentPairsCollection will also need to implement index(_:offsetBy:limitedBy:) to properly fulfill the RandomAccessCollection requirements — the added complexity of a limit could make this quite tricky though, so absolutely feel free to leave it as a TODO or ask for guidance if it's significantly harder than the version without a limit 🙂

adjacentPairs() also is conceptually similar to windows(ofCount:) and chunks(ofCount:), so their Collection conformances could be useful to draw inspiration from.

timvermeulen · 2021-04-14T12:14:24Z

Tests/SwiftAlgorithmsTests/AdjacentPairsTests.swift

+  func testIndexDistance() {
+    let pairSequences = (0...4).map { (0..<$0).adjacentPairs() }
+
+    for pairs in pairSequences {
+      for index in pairs.indices.dropLast() {
+        let next = pairs.index(after: index)
+        XCTAssertEqual(pairs.distance(from: index, to: next), 1)
+      }
+
+      XCTAssertEqual(pairs.distance(from: pairs.startIndex, to: pairs.endIndex), pairs.count)
+      XCTAssertEqual(pairs.distance(from: pairs.endIndex, to: pairs.startIndex), -pairs.count)
+    }
+  }
+
+  func testIndexOffsetBy() {
+    let pairSequences = (0...4).map { (0..<$0).adjacentPairs() }
+
+    for pairs in pairSequences {
+      for index in pairs.indices.dropLast() {
+        let next = pairs.index(after: index)
+        XCTAssertEqual(pairs.index(index, offsetBy: 1), next)
+      }
+
+      XCTAssertEqual(pairs.index(pairs.startIndex, offsetBy: pairs.count), pairs.endIndex)
+      XCTAssertEqual(pairs.index(pairs.endIndex, offsetBy: -pairs.count), pairs.startIndex)
+    }
+  }


These tests are made redundant by the validateIndexTraversals method that tests whether all index manipulation logic is well-behaved, in an exhaustive manner. Something like this probably covers all edge cases:

func testIndexTraversals() { validateIndexTraversals( (0..<0).adjacentPairs(), (0..<1).adjacentPairs(), (0..<2).adjacentPairs(), (0..<3).adjacentPairs(), (0..<10).adjacentPairs()) }

timvermeulen · 2021-04-14T12:15:40Z

Tests/SwiftAlgorithmsTests/AdjacentPairsTests.swift

+
+extension Collection {
+  fileprivate static func == <L: Equatable, R: Equatable> (lhs: Self, rhs: Self) -> Bool where Element == (L, R) {
+    lhs.count == rhs.count && zip(lhs, rhs).allSatisfy(==)


Suggested change

lhs.count == rhs.count && zip(lhs, rhs).allSatisfy(==)

lhs.elementsEqual(rhs, by: ==)

timvermeulen · 2021-04-19T12:39:38Z

Sources/Algorithms/AdjacentPairs.swift

+  @inlinable
+  public var endIndex: Index {
+    switch base.endIndex {
+    case startIndex.first, startIndex.second:
+      return startIndex
+    case let end:
+      return Index(first: end, second: end)
+    }
+  }


I strongly suggest unconditionally representing endIndex as Index(first: base.endIndex, second: base.endIndex), and adapting startIndex to match this representation in the edge case that base.count == 1 (instead of the other way around). This is the approach we take in Windows as well. Having a consistent representation of endIndex often makes it easier to reason about index manipulation logic, and I think you'll find that it will improve some of your code.

timvermeulen · 2021-04-19T12:46:54Z

Sources/Algorithms/AdjacentPairs.swift

+      return i == endIndex
+        ? Index(first: base.index(i.first, offsetBy: distance - 1),
+                second: base.index(i.first, offsetBy: distance))
+        : Index(first: base.index(i.first, offsetBy: distance),
+                second: i.first)


You could avoid computing the first and second indices separately here, since you can cheaply compute the firts if you already have the second.

timvermeulen · 2021-04-19T12:48:54Z

Sources/Algorithms/AdjacentPairs.swift

+    @inlinable
+    public static func < (lhs: Index, rhs: Index) -> Bool {
+      (lhs.first, lhs.second) < (rhs.first, rhs.second)
+    }


You should only need to compare one of the two underlying indices here instead of both, and the same applies to ==.

natecook1000 · 2021-05-04T18:01:59Z

@swift-ci Please test

natecook1000 · 2021-05-04T18:08:25Z

Going to merge now and follow up — connected with @mpangburn offline (thank you! 👏)

breakfast added 2 commits March 28, 2021 15:18

Introduce adjacentPairs

fc97695

Update README and CHANGELOG

03e37de

Split sequence & collection types

b0f3020

mpangburn force-pushed the mpangburn/adjacent-pairs branch from 534881c to b0f3020 Compare April 11, 2021 03:53

Update documentation

e092ca0

timvermeulen reviewed Apr 19, 2021

View reviewed changes

natecook1000 merged commit 0f27d99 into apple:main May 4, 2021

natecook1000 mentioned this pull request May 5, 2021

Update AdjacentPairs implementations and tests #139

Merged

4 tasks

tevelee mentioned this pull request Mar 26, 2022

Adjacent Pairs apple/swift-async-algorithms#111

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce `adjacentPairs` #119

Introduce `adjacentPairs` #119

mpangburn commented Mar 28, 2021

ollieatkinson commented Mar 29, 2021 •

edited

Loading

natecook1000 commented Mar 29, 2021 •

edited

Loading

natecook1000 commented Mar 29, 2021 •

edited

Loading

ollieatkinson commented Mar 31, 2021

timvermeulen commented Mar 31, 2021

mpangburn commented Apr 11, 2021

timvermeulen Apr 14, 2021

timvermeulen Apr 14, 2021

timvermeulen Apr 14, 2021

timvermeulen Apr 14, 2021

timvermeulen Apr 19, 2021

timvermeulen Apr 19, 2021

timvermeulen Apr 19, 2021

natecook1000 commented May 4, 2021

natecook1000 commented May 4, 2021

	lhs.count == rhs.count && zip(lhs, rhs).allSatisfy(==)
	lhs.elementsEqual(rhs, by: ==)

Introduce adjacentPairs #119

Introduce adjacentPairs #119

Conversation

mpangburn commented Mar 28, 2021

Description

Detailed Design

Documentation Plan

Test Plan

Source Impact

Checklist

ollieatkinson commented Mar 29, 2021 • edited Loading

natecook1000 commented Mar 29, 2021 • edited Loading

natecook1000 commented Mar 29, 2021 • edited Loading

ollieatkinson commented Mar 31, 2021

timvermeulen commented Mar 31, 2021

mpangburn commented Apr 11, 2021

timvermeulen Apr 14, 2021

Choose a reason for hiding this comment

timvermeulen Apr 14, 2021

Choose a reason for hiding this comment

timvermeulen Apr 14, 2021

Choose a reason for hiding this comment

timvermeulen Apr 14, 2021

Choose a reason for hiding this comment

timvermeulen Apr 19, 2021

Choose a reason for hiding this comment

timvermeulen Apr 19, 2021

Choose a reason for hiding this comment

timvermeulen Apr 19, 2021

Choose a reason for hiding this comment

natecook1000 commented May 4, 2021

natecook1000 commented May 4, 2021

Introduce `adjacentPairs` #119

Introduce `adjacentPairs` #119

ollieatkinson commented Mar 29, 2021 •

edited

Loading

natecook1000 commented Mar 29, 2021 •

edited

Loading

natecook1000 commented Mar 29, 2021 •

edited

Loading