bpo-4356: Add key function support to the bisect module #20556

Merged: 7 commits, Oct 20, 2020
118 changes: 90 additions & 28 deletions Doc/library/bisect.rst
@@ -21,7 +21,7 @@ example of the algorithm (the boundary conditions are already right!).
The following functions are provided:


.. function:: bisect_left(a, x, lo=0, hi=len(a))
.. function:: bisect_left(a, x, lo=0, hi=len(a), *, key=None)

Locate the insertion point for *x* in *a* to maintain sorted order.
The parameters *lo* and *hi* may be used to specify a subset of the list
@@ -31,39 +31,106 @@ The following functions are provided:
parameter to ``list.insert()`` assuming that *a* is already sorted.

The returned insertion point *i* partitions the array *a* into two halves so
that ``all(val < x for val in a[lo:i])`` for the left side and
``all(val >= x for val in a[i:hi])`` for the right side.
that ``all(val < x for val in a[lo : i])`` for the left side and
``all(val >= x for val in a[i : hi])`` for the right side.

.. function:: bisect_right(a, x, lo=0, hi=len(a))
*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

.. versionchanged:: 3.10
Added the *key* parameter.


.. function:: bisect_right(a, x, lo=0, hi=len(a), *, key=None)
bisect(a, x, lo=0, hi=len(a))

Similar to :func:`bisect_left`, but returns an insertion point which comes
after (to the right of) any existing entries of *x* in *a*.

The returned insertion point *i* partitions the array *a* into two halves so
that ``all(val <= x for val in a[lo:i])`` for the left side and
``all(val > x for val in a[i:hi])`` for the right side.
that ``all(val <= x for val in a[lo : i])`` for the left side and
``all(val > x for val in a[i : hi])`` for the right side.

*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

.. versionchanged:: 3.10
Added the *key* parameter.


.. function:: insort_left(a, x, lo=0, hi=len(a))
.. function:: insort_left(a, x, lo=0, hi=len(a), *, key=None)

Insert *x* in *a* in sorted order. This is equivalent to
``a.insert(bisect.bisect_left(a, x, lo, hi), x)`` assuming that *a* is
already sorted. Keep in mind that the O(log n) search is dominated by
the slow O(n) insertion step.
Insert *x* in *a* in sorted order.

.. function:: insort_right(a, x, lo=0, hi=len(a))
*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

This function first runs :func:`bisect_left` to locate an insertion point.
Next, it runs the :meth:`insert` method on *a* to insert *x* at the
appropriate position to maintain sort order.

Keep in mind that the ``O(log n)`` search is dominated by the slow O(n)
insertion step.

.. versionchanged:: 3.10
Added the *key* parameter.
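
The two-step behaviour described above (search, then insert) can be sketched in pure Python. The names ``bisect_left_sketch`` and ``insort_left_sketch`` are hypothetical and only mirror the logic visible in this PR's ``Lib/bisect.py``; they are not the module's actual code.

```python
def bisect_left_sketch(a, x, lo=0, hi=None, *, key=None):
    """Binary search; x is compared against key(a[mid]) when key is given."""
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        probe = a[mid] if key is None else key(a[mid])
        if probe < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


def insort_left_sketch(a, x, lo=0, hi=None, *, key=None):
    """Insert the whole record x; the search uses key(x) when key is given."""
    target = x if key is None else key(x)
    a.insert(bisect_left_sketch(a, target, lo, hi, key=key), x)


# Hypothetical records, already sorted by their second field.
records = [('black', 0), ('blue', 1), ('red', 5)]
insort_left_sketch(records, ('green', 3), key=lambda r: r[1])
```

Note the asymmetry: the insort function receives a full record and applies *key* to it, while the search function receives a value that is already in key space.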


.. function:: insort_right(a, x, lo=0, hi=len(a), *, key=None)
insort(a, x, lo=0, hi=len(a))

Similar to :func:`insort_left`, but inserting *x* in *a* after any existing
entries of *x*.

*key* specifies a :term:`key function` of one argument that is used to
extract a comparison key from each input element. The default value is
``None`` (compare the elements directly).

This function first runs :func:`bisect_right` to locate an insertion point.
Next, it runs the :meth:`insert` method on *a* to insert *x* at the
appropriate position to maintain sort order.

Keep in mind that the ``O(log n)`` search is dominated by the slow O(n)
insertion step.

.. versionchanged:: 3.10
Added the *key* parameter.


Performance Notes
-----------------

When writing time sensitive code using *bisect()* and *insort()*, keep these
thoughts in mind:

* Bisection is effective for searching ranges of values.
For locating specific values, dictionaries are more performant.

* The *insort()* functions are ``O(n)`` because the logarithmic search step
is dominated by the linear time insertion step.

* The search functions are stateless and discard key function results after
they are used. Consequently, if the search functions are used in a loop,
the key function may be called again and again on the same array elements.
If the key function isn't fast, consider wrapping it with
:func:`functools.cache` to avoid duplicate computations. Alternatively,
consider searching an array of precomputed keys to locate the insertion
point (as shown in the examples section below).

.. seealso::

`SortedCollection recipe
<https://code.activestate.com/recipes/577197-sortedcollection/>`_ that uses
bisect to build a full-featured collection class with straight-forward search
methods and support for a key-function. The keys are precomputed to save
unnecessary calls to the key function during searches.
* `Sorted Collections
<http://www.grantjenks.com/docs/sortedcollections/>`_ is a high performance
module that uses *bisect* to manage sorted collections of data.

* The `SortedCollection recipe
<https://code.activestate.com/recipes/577197-sortedcollection/>`_ uses
bisect to build a full-featured collection class with straight-forward search
methods and support for a key-function. The keys are precomputed to save
unnecessary calls to the key function during searches.


Searching Sorted Lists
@@ -110,8 +177,8 @@ lists::
raise ValueError


Other Examples
--------------
Examples
--------

.. _bisect-example:

@@ -127,17 +194,12 @@ a 'B', and so on::
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Unlike the :func:`sorted` function, it does not make sense for the :func:`bisect`
functions to have *key* or *reversed* arguments because that would lead to an
inefficient design (successive calls to bisect functions would not "remember"
all of the previous key lookups).

Instead, it is better to search a list of precomputed keys to find the index
of the record in question::
One technique to avoid repeated calls to a key function is to search a list of
precomputed keys to find the index of a record::

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data] # precomputed list of keys
>>> data.sort(key=lambda r: r[1]) # Or use operator.itemgetter(1).
>>> keys = [r[1] for r in data] # Precompute a list of keys.
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
2 changes: 0 additions & 2 deletions Doc/tools/susp-ignored.csv
@@ -111,8 +111,6 @@ howto/urllib2,,:password,"""joe:password@example.com"""
library/ast,,:upper,lower:upper
library/ast,,:step,lower:upper:step
library/audioop,,:ipos,"# factor = audioop.findfactor(in_test[ipos*2:ipos*2+len(out_test)],"
library/bisect,32,:hi,all(val >= x for val in a[i:hi])
library/bisect,42,:hi,all(val > x for val in a[i:hi])
library/configparser,,:home,my_dir: ${Common:home_dir}/twosheds
library/configparser,,:option,${section:option}
library/configparser,,:path,python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
66 changes: 48 additions & 18 deletions Lib/bisect.py
@@ -1,18 +1,22 @@
 """Bisection algorithms."""

-def insort_right(a, x, lo=0, hi=None):
+
+def insort_right(a, x, lo=0, hi=None, *, key=None):
     """Insert item x in list a, and keep it sorted assuming a is sorted.

     If x is already in a, insert it to the right of the rightmost x.

     Optional args lo (default 0) and hi (default len(a)) bound the
     slice of a to be searched.
     """
-
-    lo = bisect_right(a, x, lo, hi)
+    if key is None:
+        lo = bisect_right(a, x, lo, hi)
+    else:
+        lo = bisect_right(a, key(x), lo, hi, key=key)
Contributor

I find it suspicious that key(x) is called here rather than being handled by bisect_right(). For example, I would have expected this to work:

>>> from bisect import bisect_left
>>> scores = [('Alice', 30), ('Bob', 20), ('Charles', 10)]
>>> bisect_left(scores, ('David', 25), key=lambda s: -s[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'tuple'

Is this on purpose?

Contributor

I recall the discussion in the other PR about whether the "searched-for item" should be an argument to key or a value produced by key.

It's a bit clearer in the case of the insort_* functions because an array member is needed for insertion; thus key will be applied to this value, hence the latter option above.

Perhaps it's easier to disambiguate in the docs by mentioning how the key will be used, and/or talk about type or shape of key and array members?

Contributor Author

I think the API is correct in that it supports typical usage patterns. insort() will insert entire records and bisect() will scan the key field for the first match in a range at or above the key value.

In SQL, the pattern would look like this:

CREATE INDEX age_ndx ON People (age);
INSERT INTO People VALUES ('raymond', 50);
/* Find the youngest person at or over 42 */
SELECT name, age FROM People WHERE age >= 42 ORDER BY age LIMIT 1;

In Python, the example would look like this:

from bisect import bisect, insort
from operator import attrgetter
from collections import namedtuple
from pprint import pp

Person = namedtuple('Person', ('name', 'age'))

people = [
    Person('tom', 30),
    Person('sue', 40),
    Person('mark', 35),
    Person('peter', 55),
]

people.sort(key=attrgetter('age'))
new_person = Person('randy', 50)
insort(people, new_person, key=attrgetter('age'))
pp(people)
print(people[bisect(people, 42, key=attrgetter('age'))])

If bisect() required an entire record as an input, it would preclude the ability to do searches like this.

Here's another example that we would want to support:

from bisect import bisect
from collections import namedtuple
from operator import attrgetter

# https://www.quickenloans.com/blog/federal-income-tax-brackets
Bracket = namedtuple('Bracket', ('rate', 'single', 'married_joint', 'married_sep', 'head'))
brackets_2019 = [
    Bracket(0.10, 9_700, 19_400, 9_700, 13_850),
    Bracket(0.12, 39_475, 78_950, 39_475, 52_850),
    Bracket(0.22, 84_200, 168_400, 84_200, 84_200),
    Bracket(0.24, 160_725, 321_450, 160_725, 160_700),
]
taxable_income = 60_000
for status in ('single', 'married_joint', 'married_sep', 'head'):
    i = bisect(brackets_2019, taxable_income, key=attrgetter(status))
    bracket = brackets_2019[i]
    print(status, bracket.rate)

Contributor

I'll echo what Raymond explained. It would be more useful to a library like sortedcontainers if the argument to bisect was the field value rather than the record. After all, the key function can be applied to the record to get the field value if need be. And there are many cases where constructing the record may be expensive or impossible just to determine rank within a list. The field value API is compatible with having the record and key function but not the other way around.

The SortedKeyList data type provides both bisect_left and bisect_key_left variants. The bisect_left variant simply applies the key function and calls bisect_key_left so I think the API specifying the field value is more generically useful.

     a.insert(lo, x)

-def bisect_right(a, x, lo=0, hi=None):
+
+def bisect_right(a, x, lo=0, hi=None, *, key=None):
     """Return the index where to insert item x in list a, assuming a is sorted.

     The return value i is such that all e in a[:i] have e <= x, and all e in
@@ -27,14 +31,26 @@ def bisect_right(a, x, lo=0, hi=None):
         raise ValueError('lo must be non-negative')
     if hi is None:
         hi = len(a)
-    while lo < hi:
-        mid = (lo+hi)//2
-        # Use __lt__ to match the logic in list.sort() and in heapq
-        if x < a[mid]: hi = mid
-        else: lo = mid+1
+    # Note, the comparison uses "<" to match the
+    # __lt__() logic in list.sort() and in heapq.
+    if key is None:
+        while lo < hi:
+            mid = (lo + hi) // 2
+            if x < a[mid]:
+                hi = mid
+            else:
+                lo = mid + 1
+    else:
+        while lo < hi:
+            mid = (lo + hi) // 2
+            if x < key(a[mid]):
+                hi = mid
+            else:
+                lo = mid + 1
     return lo

-def insort_left(a, x, lo=0, hi=None):
+
+def insort_left(a, x, lo=0, hi=None, *, key=None):
     """Insert item x in list a, and keep it sorted assuming a is sorted.

     If x is already in a, insert it to the left of the leftmost x.
@@ -43,11 +59,13 @@ def insort_left(a, x, lo=0, hi=None):
     slice of a to be searched.
     """

-    lo = bisect_left(a, x, lo, hi)
+    if key is None:
+        lo = bisect_left(a, x, lo, hi)
+    else:
+        lo = bisect_left(a, key(x), lo, hi, key=key)
     a.insert(lo, x)

-
-def bisect_left(a, x, lo=0, hi=None):
+def bisect_left(a, x, lo=0, hi=None, *, key=None):
     """Return the index where to insert item x in list a, assuming a is sorted.

     The return value i is such that all e in a[:i] have e < x, and all e in
@@ -62,13 +80,25 @@ def bisect_left(a, x, lo=0, hi=None):
         raise ValueError('lo must be non-negative')
     if hi is None:
         hi = len(a)
-    while lo < hi:
-        mid = (lo+hi)//2
-        # Use __lt__ to match the logic in list.sort() and in heapq
-        if a[mid] < x: lo = mid+1
-        else: hi = mid
+    # Note, the comparison uses "<" to match the
+    # __lt__() logic in list.sort() and in heapq.
+    if key is None:
+        while lo < hi:
+            mid = (lo + hi) // 2
+            if a[mid] < x:
+                lo = mid + 1
+            else:
+                hi = mid
+    else:
+        while lo < hi:
+            mid = (lo + hi) // 2
+            if key(a[mid]) < x:
+                lo = mid + 1
+            else:
+                hi = mid
     return lo

+
 # Overwrite above definitions with a fast C implementation
 try:
     from _bisect import *
57 changes: 57 additions & 0 deletions Lib/test/test_bisect.py
@@ -199,6 +199,63 @@ def test_keyword_args(self):
         self.module.insort(a=data, x=25, lo=1, hi=3)
         self.assertEqual(data, [10, 20, 25, 25, 25, 30, 40, 50])

+    def test_lookups_with_key_function(self):
+        mod = self.module
+
+        # Invariant: Index with a keyfunc on an array
+        # should match the index on an array where
+        # key function has already been applied.
+
+        keyfunc = abs
+        arr = sorted([2, -4, 6, 8, -10], key=keyfunc)
+        precomputed_arr = list(map(keyfunc, arr))
+        for x in precomputed_arr:
+            self.assertEqual(
+                mod.bisect_left(arr, x, key=keyfunc),
+                mod.bisect_left(precomputed_arr, x)
+            )
+            self.assertEqual(
+                mod.bisect_right(arr, x, key=keyfunc),
+                mod.bisect_right(precomputed_arr, x)
+            )
+
+        keyfunc = str.casefold
+        arr = sorted('aBcDeEfgHhiIiij', key=keyfunc)
+        precomputed_arr = list(map(keyfunc, arr))
+        for x in precomputed_arr:
+            self.assertEqual(
+                mod.bisect_left(arr, x, key=keyfunc),
+                mod.bisect_left(precomputed_arr, x)
+            )
+            self.assertEqual(
+                mod.bisect_right(arr, x, key=keyfunc),
+                mod.bisect_right(precomputed_arr, x)
+            )
+
+    def test_insort(self):
+        from random import shuffle
+        mod = self.module
+
+        # Invariant: As random elements are inserted in
+        # a target list, the targetlist remains sorted.
+        keyfunc = abs
+        data = list(range(-10, 11)) + list(range(-20, 20, 2))
+        shuffle(data)
+        target = []
+        for x in data:
+            mod.insort_left(target, x, key=keyfunc)
+            self.assertEqual(
+                sorted(target, key=keyfunc),
+                target
+            )
+        target = []
+        for x in data:
+            mod.insort_right(target, x, key=keyfunc)
+            self.assertEqual(
+                sorted(target, key=keyfunc),
+                target
+            )
+
 class TestBisectPython(TestBisect, unittest.TestCase):
     module = py_bisect

@@ -0,0 +1 @@
Add a key function to the bisect module.