Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-4356: Add key parameter to functions in bisect module #11781

Closed

Conversation

remilapeyre
Copy link
Contributor

@remilapeyre remilapeyre commented Feb 7, 2019

@rhettinger rhettinger self-assigned this Feb 7, 2019
@remilapeyre
Copy link
Contributor Author

Hi Raymond, thanks for your comment.

I may be missing something but I'm not convinced the key function will get called multiple time per value.

I did some research before implementing this and I think your first point comes from the implementation of the key parameter in sorted() where the result of key on each element of the iterable is cached to avoid computing multiple time. ISTM that this is only necessary because the worst case complexity for sorting is n*ln(n) and a given element will get compared to multiple others to find its place in the resulting collection.

In binary search, most elements are not touched and it element e that is compared to the input x is only touched once so key(e) should only be computed once. As I commented in the code, I cached the result of key(x) so it would not be computed at each iteration. After all, each comparison done are against the same value (key(x)) so it would be wasteful to do them more than once, whether we use an auxiliary function or not.

About your second point, I think you say this because of the branching in the hot path. I did some tests before posting the pull request. The performance seems to be the same (I'm not sure this is a good to measure it thought, I would love some input on that):

➜  cpython git:(add-key-argument-to-bisect) python3 -m timeit -s "import bisect" "bisect.bisect(range(1_000_000_000_000_000), 25)"
50000 loops, best of 5: 5.23 usec per loop
➜  cpython git:(add-key-argument-to-bisect) ./python.exe -m timeit -s "import bisect" "bisect.bisect(range(1_000_000_000_000_000), 25)"
50000 loops, best of 5: 4.74 usec per loop
➜  cpython git:(add-key-argument-to-bisect) ./python.exe -m timeit -s "import bisect" "bisect.bisect(range(1_000_000_000_000_000), 25, key=lambda e: e)"
20000 loops, best of 5: 10 usec per loop

I guess the branch predictor does a good job here (?) and why no change of performance is seen (does someone know a good reference on branch predictors, out-of-order execution and other low-level performance details? I would like to learn more about them).

If I'm not making any mistake, the key argument can be safely be added here and the sorted collection is not necessary.

Am I missing the point completely?

@remilapeyre
Copy link
Contributor Author

Here's an example where key would break if called twice with the same object:

import bisect
from collections import defaultdict


class Test:
    def __init__(self, value):
        self.value = value


cache = defaultdict(int)

def key(e):
    cache[e] += 1
    assert cache[e] <= 1
    return e.value


l = [Test(i) for i in range(10000)]

bisect.bisect(l, Test(25), key=key)

It seems not to be an issue:

➜  cpython git:(add-key-argument-to-bisect) ./python.exe
Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb  7 2019, 17:33:24) 
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bisect
>>> from collections import defaultdict
>>> 
>>> 
>>> class Test:
...     def __init__(self, value):
...         self.value = value
... 
>>> 
>>> cache = defaultdict(int)
>>> 
>>> def key(e):
...     cache[e] += 1
...     assert cache[e] <= 1
...     return e.value
... 
>>> 
>>> l = [Test(i) for i in range(10000)]
>>> 
>>> bisect.bisect(l, Test(25), key=key)
26

@alexchamberlain
Copy link

I found this whilst composing a message to Python-ideas about adding a key argument to bisect - just wanted to say I think this is a great idea. At the very least, this makes bisect consistent with the sort methods, but more importantly, it allows you to customise the ordering of objects on the fly.

I had also considered asking for a custom "comparator" argument, but quickly realised key is more consistent and you can convert a comparator to a key with a simple class implementing __lt__ (see functools.cmp_to_key.

@remilapeyre
Copy link
Contributor Author

CC @rhettinger

@remilapeyre
Copy link
Contributor Author

Thanks for the feedback @dimaqq!

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Contributor

@rhettinger rhettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key argument should be keyword only.

@remilapeyre
Copy link
Contributor Author

Thanks for the review, I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@rhettinger: please review the changes made to this pull request.

@rhettinger
Copy link
Contributor

Apologies, I had made my own PR 20556 before remembering that this one existed.

We should reconcile the two — each has tests the other doesn't, also there are some doc improvements in each that aren't in the other.

Am having second thoughts about including reversed because it doubles the complexity of the code, the tests, and spills over into documentation complexity. What do you think, save reversed for another day or put it now?

@csabella
Copy link
Contributor

Closing in favor of #20556.

@csabella csabella closed this Aug 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants