Introduce adjacentPairs #119

Merged
merged 4 commits into from
May 4, 2021
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -12,7 +12,7 @@ package updates, you can specify your package dependency using

## [Unreleased]

-*No changes yet.*
+-`adjacentPairs()` lazily iterates over tuples of adjacent elements of a sequence.

---

52 changes: 52 additions & 0 deletions Guides/AdjacentPairs.md
@@ -0,0 +1,52 @@
# AdjacentPairs

[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/AdjacentPairs.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/AdjacentPairsTests.swift)]

Lazily iterates over tuples of adjacent elements.

This operation is available for any sequence by calling the `adjacentPairs()` method.

```swift
let numbers = (1...5)
let pairs = numbers.adjacentPairs()
// Array(pairs) == [(1, 2), (2, 3), (3, 4), (4, 5)]
```

## Detailed Design

The `adjacentPairs()` method is declared as a `Sequence` extension returning `AdjacentPairsSequence` and as a `Collection` extension returning `AdjacentPairsCollection`.

```swift
extension Sequence {
    public func adjacentPairs() -> AdjacentPairsSequence<Self>
}
```

```swift
extension Collection {
    public func adjacentPairs() -> AdjacentPairsCollection<Self>
}
```

The `AdjacentPairsSequence` type is a sequence, and the `AdjacentPairsCollection` type is a collection with conditional conformance to `BidirectionalCollection` and `RandomAccessCollection` when the underlying collection conforms.

### Complexity

Calling `adjacentPairs` is an O(1) operation.

### Naming

This method is named for clarity while remaining agnostic to any particular domain of programming. In natural language processing, this operation is akin to computing a list of bigrams; however, this algorithm is not specific to this use case.

[naming]: https://forums.swift.org/t/naming-of-chained-with/40999/

### Comparison with other languages

This function is often written as a `zip` of a sequence together with itself, minus its first element.

**Haskell:** This operation is spelled ``s `zip` tail s``.

**Python:** Python users may write `zip(s, s[1:])` for a list with at least one element. For natural language processing, the `nltk` package offers a `bigrams` function akin to this method.

Note that in Swift, the spelling `zip(s, s.dropFirst())` is undefined behavior for a single-pass sequence `s`.
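
As a rough illustration of the point above, the sketch below contrasts the `zip` spelling on a multi-pass collection with a walk that consumes the sequence only once. The helper `eagerAdjacentPairs` is a hypothetical name introduced for this example. It is not part of the package and, unlike `adjacentPairs()`, it is eager rather than lazy.

```swift
// For a multi-pass collection such as Array, zipping the collection with
// itself minus its first element is well defined.
let s = [1, 2, 3, 4, 5]
let zipped = Array(zip(s, s.dropFirst()))
// zipped holds (1, 2), (2, 3), (3, 4), (4, 5)

// Hypothetical helper (an assumption for this sketch, not the package API):
// walks the sequence exactly once and remembers the previous element, so it
// is also safe for single-pass sequences.
func eagerAdjacentPairs<S: Sequence>(_ base: S) -> [(S.Element, S.Element)] {
  var iterator = base.makeIterator()
  guard var previous = iterator.next() else { return [] }
  var result: [(S.Element, S.Element)] = []
  while let next = iterator.next() {
    result.append((previous, next))
    previous = next
  }
  return result
}

// AnyIterator is consumed as it is iterated, which models a single-pass source.
let onePass = eagerAdjacentPairs(AnyIterator([1, 2, 3].makeIterator()))
// onePass holds (1, 2), (2, 3)
```

The single-pass variant never asks the base sequence for a second iterator, which is the property the `zip` spelling cannot guarantee.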
1 change: 1 addition & 0 deletions README.md
@@ -36,6 +36,7 @@ Read more about the package, and the intent behind it, in the [announcement on s

#### Other useful operations

- [`adjacentPairs()`](https://github.com/apple/swift-algorithms/blob/main/Guides/AdjacentPairs.md): Lazily iterates over tuples of adjacent elements.
- [`chunked(by:)`, `chunked(on:)`, `chunks(ofCount:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chunked.md): Eager and lazy operations that break a collection into chunks based on either a binary predicate or when the result of a projection changes or chunks of a given count.
- [`indexed()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Indexed.md): Iterate over tuples of a collection's indices and elements.
- [`interspersed(with:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Intersperse.md): Place a value between every two elements of a sequence.
271 changes: 271 additions & 0 deletions Sources/Algorithms/AdjacentPairs.swift
@@ -0,0 +1,271 @@
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Algorithms open source project
//
// Copyright (c) 2021 Apple Inc. and the Swift project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See https://swift.org/LICENSE.txt for license information
//
//===----------------------------------------------------------------------===//

extension Sequence {
  /// Creates a sequence of adjacent pairs of elements from this sequence.
  ///
  /// In the `AdjacentPairsSequence` returned by this method, the elements of
  /// the *i*th pair are the *i*th and *(i+1)*th elements of the underlying
  /// sequence.
  /// The following example uses the `adjacentPairs()` method to iterate over
  /// adjacent pairs of integers:
  ///
  ///     for pair in (1...5).adjacentPairs() {
  ///       print(pair)
  ///     }
  ///     // Prints "(1, 2)"
  ///     // Prints "(2, 3)"
  ///     // Prints "(3, 4)"
  ///     // Prints "(4, 5)"
  @inlinable
  public func adjacentPairs() -> AdjacentPairsSequence<Self> {
    AdjacentPairsSequence(base: self)
  }
}

extension Collection {
  /// Creates a collection of adjacent pairs of elements from this collection.
  ///
  /// In the `AdjacentPairsCollection` returned by this method, the elements of
  /// the *i*th pair are the *i*th and *(i+1)*th elements of the underlying
  /// collection. The following example uses the `adjacentPairs()` method to
  /// iterate over adjacent pairs of integers:
  /// ```
  /// for pair in (1...5).adjacentPairs() {
  ///   print(pair)
  /// }
  /// // Prints "(1, 2)"
  /// // Prints "(2, 3)"
  /// // Prints "(3, 4)"
  /// // Prints "(4, 5)"
  /// ```
  @inlinable
  public func adjacentPairs() -> AdjacentPairsCollection<Self> {
    AdjacentPairsCollection(base: self)
  }
}

/// A sequence of adjacent pairs of elements built from an underlying sequence.
///
/// In an `AdjacentPairsSequence`, the elements of the *i*th pair are the *i*th
/// and *(i+1)*th elements of the underlying sequence. The following example
/// uses the `adjacentPairs()` method to iterate over adjacent pairs of
/// integers:
/// ```
/// for pair in (1...5).adjacentPairs() {
///   print(pair)
/// }
/// // Prints "(1, 2)"
/// // Prints "(2, 3)"
/// // Prints "(3, 4)"
/// // Prints "(4, 5)"
/// ```
public struct AdjacentPairsSequence<Base: Sequence> {
  @usableFromInline
  internal let base: Base

  /// Creates an instance that makes pairs of adjacent elements from `base`.
  @inlinable
  internal init(base: Base) {
    self.base = base
  }
}

extension AdjacentPairsSequence {
  public struct Iterator {
    @usableFromInline
    internal var base: Base.Iterator

    @usableFromInline
    internal var previousElement: Base.Element?

    @inlinable
    internal init(base: Base.Iterator) {
      self.base = base
    }
  }
}

extension AdjacentPairsSequence.Iterator: IteratorProtocol {
  public typealias Element = (Base.Element, Base.Element)

  @inlinable
  public mutating func next() -> Element? {
    if previousElement == nil {
      previousElement = base.next()
    }

    guard let previous = previousElement, let next = base.next() else {
      return nil
    }

    previousElement = next
    return (previous, next)
  }
}

extension AdjacentPairsSequence: Sequence {
  @inlinable
  public func makeIterator() -> Iterator {
    Iterator(base: base.makeIterator())
  }

  @inlinable
  public var underestimatedCount: Int {
    Swift.max(0, base.underestimatedCount - 1)
  }
}

/// A collection of adjacent pairs of elements built from an underlying collection.
///
/// In an `AdjacentPairsCollection`, the elements of the *i*th pair are the *i*th
/// and *(i+1)*th elements of the underlying sequence. The following example
/// uses the `adjacentPairs()` method to iterate over adjacent pairs of
/// integers:
/// ```
/// for pair in (1...5).adjacentPairs() {
///   print(pair)
/// }
/// // Prints "(1, 2)"
/// // Prints "(2, 3)"
/// // Prints "(3, 4)"
/// // Prints "(4, 5)"
/// ```
public struct AdjacentPairsCollection<Base: Collection> {
  @usableFromInline
  internal let base: Base

  public let startIndex: Index

  @inlinable
  internal init(base: Base) {
    self.base = base

    // Precompute `startIndex` to ensure O(1) behavior,
    // avoiding indexing past `endIndex`
    let start = base.startIndex
    let end = base.endIndex
    let second = start == end ? start : base.index(after: start)
    self.startIndex = Index(first: start, second: second)
  }
}

extension AdjacentPairsCollection {
  public typealias Iterator = AdjacentPairsSequence<Base>.Iterator

  @inlinable
  public func makeIterator() -> Iterator {
    Iterator(base: base.makeIterator())
  }
}
Comment on lines +161 to +168
Contributor

I'm not sure if this is beneficial, mostly because we're precomputing startIndex. Won't this mean we end up doing some duplicate work when iterating over someCollection.adjacentPairs()?


extension AdjacentPairsCollection {
  public struct Index: Comparable {
    @usableFromInline
    internal var first: Base.Index

    @usableFromInline
    internal var second: Base.Index

    @inlinable
    internal init(first: Base.Index, second: Base.Index) {
      self.first = first
      self.second = second
    }

    @inlinable
    public static func < (lhs: Index, rhs: Index) -> Bool {
      (lhs.first, lhs.second) < (rhs.first, rhs.second)
    }
Comment on lines +184 to +187
Contributor

You should only need to compare one of the two underlying indices here instead of both, and the same applies to ==.
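
A minimal sketch of what this suggestion could look like, assuming that `second` is fully determined by `first` for every valid index of a given collection. `PairIndex` and its `BaseIndex` parameter are hypothetical stand-ins for `AdjacentPairsCollection.Index`, introduced only so the example compiles on its own.

```swift
// Hypothetical stand-in for `AdjacentPairsCollection.Index`; the name and the
// generic parameter are assumptions made for this sketch.
struct PairIndex<BaseIndex: Comparable>: Comparable {
  var first: BaseIndex
  var second: BaseIndex

  // Only `first` takes part in equality, on the assumption that two valid
  // indices with the same `first` always share the same `second`.
  static func == (lhs: PairIndex, rhs: PairIndex) -> Bool {
    lhs.first == rhs.first
  }

  // Ordering likewise depends on `first` alone.
  static func < (lhs: PairIndex, rhs: PairIndex) -> Bool {
    lhs.first < rhs.first
  }
}

let a = PairIndex(first: 0, second: 1)
let b = PairIndex(first: 1, second: 2)
// a < b, and a == PairIndex(first: 0, second: 1)
```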

  }
}

extension AdjacentPairsCollection: Collection {
  @inlinable
  public var endIndex: Index {
    switch base.endIndex {
    case startIndex.first, startIndex.second:
      return startIndex
    case let end:
      return Index(first: end, second: end)
    }
  }
Comment on lines +192 to +200
Contributor

I strongly suggest unconditionally representing endIndex as Index(first: base.endIndex, second: base.endIndex), and adapting startIndex to match this representation in the edge case that base.count == 1 (instead of the other way around). This is the approach we take in Windows as well. Having a consistent representation of endIndex often makes it easier to reason about index manipulation logic, and I think you'll find that it will improve some of your code.
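
One way to read this suggestion, sketched under the assumption that `endIndex` is always the pair `(base.endIndex, base.endIndex)`: the start position is normalized to that same pair whenever the base has fewer than two elements. The free function `startPair(of:)` is hypothetical and only models the index computation, not the PR's types.

```swift
// Hypothetical helper (not from the PR): computes the (first, second) base
// indices of the first adjacent pair, collapsing to (end, end) when the base
// has zero or one element so that start == end for an empty pair collection.
func startPair<C: Collection>(of base: C) -> (first: C.Index, second: C.Index) {
  let end = base.endIndex
  // Empty base: there is no first element at all.
  guard base.startIndex != end else { return (end, end) }
  let second = base.index(after: base.startIndex)
  // Single-element base: there is no second element, hence no pairs.
  return second == end ? (end, end) : (base.startIndex, second)
}

let emptyStart = startPair(of: [Int]())      // (0, 0)
let singleStart = startPair(of: [7])         // (1, 1)
let severalStart = startPair(of: [7, 8, 9])  // (0, 1)
```

With this normalization, `endIndex` needs no `switch`, and the empty cases are handled once at construction time.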


  @inlinable
  public subscript(position: Index) -> (Base.Element, Base.Element) {
    (base[position.first], base[position.second])
  }

  @inlinable
  public func index(after i: Index) -> Index {
    let next = base.index(after: i.second)
    return next == base.endIndex
      ? endIndex
      : Index(first: i.second, second: next)
  }

  @inlinable
  public func index(_ i: Index, offsetBy distance: Int) -> Index {
Contributor

AdjacentPairsCollection will also need to implement index(_:offsetBy:limitedBy:) to properly fulfill the RandomAccessCollection requirements — the added complexity of a limit could make this quite tricky though, so absolutely feel free to leave it as a TODO or ask for guidance if it's significantly harder than the version without a limit 🙂

adjacentPairs() also is conceptually similar to windows(ofCount:) and chunks(ofCount:), so their Collection conformances could be useful to draw inspiration from.

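    if distance == 0 {
      return i
    } else if distance > 0 {
      let firstOffsetIndex = base.index(i.first, offsetBy: distance)
      let secondOffsetIndex = base.index(after: firstOffsetIndex)
      return secondOffsetIndex == base.endIndex
        ? endIndex
        : Index(first: firstOffsetIndex, second: secondOffsetIndex)
    } else {
      // Moving backwards. From `endIndex`, the target pair ends `distance`
      // positions before `base.endIndex`. From any other index, `second` has
      // to move together with `first`, so it is offset by `distance + 1`
      // (using `i.first` directly is only correct for `distance == -1`).
      return i == endIndex
        ? Index(first: base.index(i.first, offsetBy: distance - 1),
                second: base.index(i.first, offsetBy: distance))
        : Index(first: base.index(i.first, offsetBy: distance),
                second: base.index(i.first, offsetBy: distance + 1))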
Comment on lines +226 to +230
Contributor

You could avoid computing the first and second indices separately here, since you can cheaply compute the first if you already have the second.
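
A possible shape for this, as a sketch only: offset once to find the new `second`, then take a single `index(before:)` step to obtain `first`. The function `pairBefore` and its `isEnd` flag are hypothetical names for this example, and it assumes the end position is represented as `(base.endIndex, base.endIndex)`.

```swift
// Hypothetical helper (not from the PR): returns the base indices of the pair
// that lies `steps` positions before the pair position whose first component
// is `first`. `isEnd` says whether that position is the (end, end) sentinel.
func pairBefore<C: BidirectionalCollection>(
  _ base: C, first: C.Index, isEnd: Bool, steps: Int
) -> (first: C.Index, second: C.Index) {
  precondition(steps > 0, "this sketch only handles backward movement")
  // From the end sentinel, the new second is `steps` before `endIndex`.
  // From a regular pair (first, first + 1), it is `first + 1 - steps`.
  let second = base.index(first, offsetBy: isEnd ? -steps : 1 - steps)
  // One cheap step back gives the first component.
  return (base.index(before: second), second)
}

let values = [10, 20, 30, 40, 50]
let fromEnd = pairBefore(values, first: values.endIndex, isEnd: true, steps: 1)
// fromEnd is (3, 4), the last pair
let fromMiddle = pairBefore(values, first: 3, isEnd: false, steps: 2)
// fromMiddle is (1, 2)
```

For a base that is bidirectional but not random-access, this replaces one of the two linear-time `index(_:offsetBy:)` calls with a constant-time step.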

    }
  }

  @inlinable
  public func distance(from start: Index, to end: Index) -> Int {
    let offset: Int
    switch (start.first, end.first) {
    case (base.endIndex, base.endIndex):
      return 0
    case (base.endIndex, _):
      offset = +1
    case (_, base.endIndex):
      offset = -1
    default:
      offset = 0
    }

    return base.distance(from: start.first, to: end.first) + offset
  }

  @inlinable
  public var count: Int {
    Swift.max(0, base.count - 1)
  }
}

extension AdjacentPairsCollection: BidirectionalCollection
  where Base: BidirectionalCollection
{
  @inlinable
  public func index(before i: Index) -> Index {
    i == endIndex
      ? Index(first: base.index(i.first, offsetBy: -2),
              second: base.index(before: i.first))
      : Index(first: base.index(before: i.first),
              second: i.first)
  }
}

extension AdjacentPairsCollection: RandomAccessCollection
  where Base: RandomAccessCollection {}