Add 'partition_at_index/_by/_by_key' for slices. #55448

Merged on Apr 3, 2019 (1 commit)

Contributor

@Mokosha Mokosha commented Oct 28, 2018

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug #55300.

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @bluss (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 28, 2018
@TimNN
Contributor

TimNN commented Nov 6, 2018

Ping from triage @bluss / @rust-lang/libs: This PR requires your review.

@SimonSapin
Contributor

@Mokosha, can you say more about why this is useful? What’s an example of a situation where you’d use this?

@rust-lang/libs, thoughts on having this feature in the standard library? It’s apparently already available on crates.io rust-lang/rfcs#1470 (comment)

@abonander
Contributor

abonander commented Nov 6, 2018

@SimonSapin I have two separate use-cases in my own projects, both involving selecting the median of a dataset.

In the former I implemented quickselect myself because I couldn't find any crate implementing quickselect or any other kth-element algorithm when I wrote the code back in 2016. I didn't discover order-stat until much later because it didn't have the keywords I was searching for.

That's a major issue with saying "just use something from crates.io": sometimes people don't know what they're actually looking for or what keywords to search to find it. This is a fundamental enough operation that it should reside in the stdlib, where people are most likely to look first for solutions.

In the latter, because the dataset is a small, constant size and I needed to copy it to preserve the original ordering (potentially not necessary; I already see a possible optimization), I decided to just use the stdlib's sort and index the set directly.

In both cases I would prefer a stdlib-provided implementation if it was available. In both cases it would probably be more optimal than the solutions I ended up going with, and importing an external crate just for one function at one use site in each case seemed like overkill (though in img_hash's case I wanted to try implementing it myself anyway).
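
For illustration, the median use case described above could look roughly like the sketch below. It is not code from either project: `median_of` is a hypothetical helper, and it calls `select_nth_unstable`, which is the name this operation has in current stable Rust (the `partition_at_index` name in this PR was the original unstable one). The copy into `scratch` mirrors the "preserve the original ordering" constraint mentioned above.

```rust
/// Hypothetical helper: returns the (upper) median of `data`
/// without reordering the caller's slice.
fn median_of(data: &[u32]) -> Option<u32> {
    if data.is_empty() {
        return None;
    }
    // Work on a copy so the original ordering is preserved.
    let mut scratch = data.to_vec();
    let mid = scratch.len() / 2;
    // Only the element at `mid` is guaranteed to be in its sorted position;
    // the rest of `scratch` is partitioned around it, not fully sorted.
    let (_, median, _) = scratch.select_nth_unstable(mid);
    Some(*median)
}

fn main() {
    let samples = [9, 1, 8, 2, 7, 3, 6];
    // Sorted order would be 1, 2, 3, 6, 7, 8, 9, so index 3 holds 6.
    println!("{:?}", median_of(&samples)); // prints "Some(6)"
}
```

Compared with sorting the whole copy and indexing it, this avoids ordering elements that are never inspected.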

@alexcrichton
Member

This seems reasonable to me to add, pending bikeshedding the name/returns/etc

@Mark-Simulacrum
Member

Mark-Simulacrum commented Nov 15, 2018

@Mokosha It looks like there's been a few review comments, can you take a look at those?

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 15, 2018
@Mokosha
Contributor Author

Mokosha commented Nov 16, 2018

> @Mokosha It looks like there's been a few review comments, can you take a look at those?

Yes. My apologies. I will take a look at these over the weekend.

@Mokosha
Contributor Author

Mokosha commented Nov 27, 2018

I haven't tested the changes, as I'm running into an issue building rust_llvm. Based on the fact that there isn't a bug filed for it yet, I'm willing to bet it has to do with my setup. Making sure first before feeling confident about this PR, but I felt like I should address the comments here in the meantime.

@Mokosha
Contributor Author

Mokosha commented Dec 2, 2018

All tests pass. :) Waiting for additional feedback.

@abonander
Contributor

@Mokosha they'll want the commits to be squashed before they merge. I don't have review permissions.

@Mokosha
Contributor Author

Mokosha commented Dec 6, 2018

Is this something I'm in charge of? (Do I create a new PR that squashes the commits?) I thought that was an option when doing the merge...

@abonander
Contributor

abonander commented Dec 6, 2018

Merges aren't done directly through GitHub, and I don't think bors will squash.

You can do an interactive rebase against the parent commit of your first one, mark all subsequent commits as "fixup" and then force-push to your branch when the rebase is complete. The change in commits will be immediately reflected on the PR.

@bluss
Member

bluss commented Dec 9, 2018

This seems good! The name doesn't feel intuitive to me (but not the operation either, so maybe I haven't used it anywhere). Name was discussed in rust-lang/rfcs/issues/1470. I think I understand this operation better as partition_at_nth or similar than as a sort. Any drawback I missed with that name?

@Mokosha
Contributor Author

Mokosha commented Dec 9, 2018

I don't know which name is the most intuitive, honestly. The reason I didn't choose anything with partition was because it doesn't convey the fact that the list is somewhat sorted around the point that's being partitioned. I would expect partition_at_nth to do the same thing that split_at does. If people prefer a different name, though, I don't have any strong feelings about it (within reason 😉).

@Mark-Simulacrum
Member

Perhaps sort_until? If I understand this correctly, it sorts the first N elements -- so either sort_until or perhaps sort_first would make sense?

@Mokosha
Contributor Author

Mokosha commented Dec 9, 2018

@Mark-Simulacrum It doesn't quite sort the first N elements. Rather, it chooses an index i of the slice, and makes sure that:

  • The element at index i is the same as if the slice was sorted.
  • All elements at positions [0, i) are "less than" the element at index i, in no particular order.
  • All elements at positions [i + 1, n) are "greater than" the element at index i, in no particular order.
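
The three guarantees listed above can be written down as a small check. This is a sketch under assumptions: `holds_selection_invariant` is a hypothetical helper, the "less than" and "greater than" in quotes are read as `<=` and `>=` (duplicates may land on either side), and the call uses `select_nth_unstable`, the name this method has in current stable Rust.

```rust
/// Hypothetical checker for the guarantees listed above.
/// `selected` is the slice after selecting at index `i`;
/// `original` is the same data before the call.
fn holds_selection_invariant(original: &[i32], selected: &[i32], i: usize) -> bool {
    let mut sorted = original.to_vec();
    sorted.sort_unstable();
    // 1. The element at index i is the one a full sort would put there.
    selected[i] == sorted[i]
        // 2. Everything before index i is no greater than it, in any order.
        && selected[..i].iter().all(|x| *x <= selected[i])
        // 3. Everything after index i is no smaller than it, in any order.
        && selected[i + 1..].iter().all(|x| *x >= selected[i])
}

fn main() {
    let original = [5, -3, 9, 0, 9, 2, -7];
    let mut v = original; // arrays of i32 are Copy, so this is a separate copy
    let _ = v.select_nth_unstable(3);
    assert!(holds_selection_invariant(&original, &v, 3));
    // The exact order of the two halves is unspecified, so it is not shown here.
    println!("element at index 3: {}", v[3]); // prints "element at index 3: 2"
}
```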

@Mark-Simulacrum
Member

Ah, okay. Then I think the partition naming is probably the best that I can think of -- and seems to jibe okay with the other partition method on slices (https://doc.rust-lang.org/nightly/std/primitive.slice.html#method.partition_dedup).

If we do rename to the partition naming though I think changing the return type to (&mut [T], &mut T, &mut [T]) might be a good idea to provide immediate access to those three "parts" of the array. If the user doesn't want that, they can just not use the return type -- it should be essentially free to give it to them. It could also provide a good way of explaining the API.
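
As a rough illustration of why the `(&mut [T], &mut T, &mut [T])` return type is convenient, the sketch below uses the returned lower part directly. `k_smallest` is a hypothetical helper, not part of this PR, and it is written against `select_nth_unstable`, the name the method has in current stable Rust, which returns this triple.

```rust
/// Hypothetical helper: returns the `k` smallest values in ascending order,
/// reordering `values` in place.
fn k_smallest(values: &mut [u32], k: usize) -> &[u32] {
    if k == 0 {
        return &[];
    }
    if k >= values.len() {
        // Selecting at an out-of-bounds index would panic, so fall back to a full sort.
        values.sort_unstable();
        return values;
    }
    // `lower` holds the k elements that are <= the element at index k,
    // in no particular order; only that part needs sorting.
    let (lower, _kth, _upper) = values.select_nth_unstable(k);
    lower.sort_unstable();
    lower
}

fn main() {
    let mut data = [40, 10, 30, 50, 20];
    println!("{:?}", k_smallest(&mut data, 2)); // prints "[10, 20]"
}
```

Because the three parts come back as disjoint mutable borrows, the caller can sort or mutate one side without re-slicing or re-checking the index.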

@Mokosha
Contributor Author

Mokosha commented Dec 11, 2018

OK -- good point about consistency. How about partition_at_index ?

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 11, 2019
@RalfJung
Member

RalfJung commented Mar 14, 2019

> Looks like there's another config that doesn't support entropy (miri) -- are there suggestions for how to have a more robust test, or should I add the same cfg to the test here, too?

Would there be anything lost by using a fixed seed instead? IMO non-deterministic tests are an antipattern anyway.

If you use something like rand::rngs::StdRng::seed_from_u64(0xdeadcafe) or rand::StdRng::from_seed(&[0xdeadcafe]), then it'll work fine in Miri.
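
A minimal sketch of the fixed-seed idea, kept free of external crates so it runs as-is: `XorShift64` is a hypothetical stand-in for a seeded generator (it is not the `rand` API mentioned above), and `check_selection_with_seed` is an illustrative test body, not the test in this PR. It calls `select_nth_unstable`, the name the method has in current stable Rust.

```rust
/// Tiny xorshift64 generator, used here only as a stand-in for a seeded RNG.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Builds the same pseudo-random input for a given seed on every run and
/// checks that selecting at each index yields the element a full sort would.
fn check_selection_with_seed(seed: u64, len: usize) -> bool {
    let mut rng = XorShift64(seed);
    let original: Vec<u64> = (0..len).map(|_| rng.next() % 100).collect();
    let mut sorted = original.clone();
    sorted.sort_unstable();
    (0..len).all(|i| {
        let mut v = original.clone();
        let (_, nth, _) = v.select_nth_unstable(i);
        *nth == sorted[i]
    })
}

fn main() {
    // Fixed seed: no entropy source is needed, and a failure is reproducible.
    assert!(check_selection_with_seed(0xdead_cafe, 64));
    println!("ok"); // prints "ok"
}
```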

@Mokosha
Contributor Author

Mokosha commented Mar 15, 2019

@RalfJung Probably not, but I'm matching the code already written in the test for sort_unstable above it. Seems like they should both be changed?

@RalfJung
Member

IMO they should, but that's up to @rust-lang/libs

@Centril
Contributor

Centril commented Mar 30, 2019

Ping from triage, @scottmcm

@scottmcm
Member

scottmcm commented Apr 2, 2019

@bors r=bluss

@bors
Contributor

bors commented Apr 2, 2019

📌 Commit 3f306db has been approved by bluss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 2, 2019
@bors
Contributor

bors commented Apr 2, 2019

⌛ Testing commit 3f306db with merge 8be24b37d498dbf7b85aef82b1007522a4b73cbc...

@bors
Contributor

bors commented Apr 2, 2019

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 2, 2019
@kennytm
Member

kennytm commented Apr 2, 2019

@bors retry

Double scheduling in two Travis builds.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 2, 2019
Centril added a commit to Centril/rust that referenced this pull request Apr 2, 2019
Add 'partition_at_index/_by/_by_key' for slices.

This is an analog to C++'s std::nth_element (a.k.a. quickselect).

Corresponds to tracking bug rust-lang#55300.
Centril added a commit to Centril/rust that referenced this pull request Apr 3, 2019
bors added a commit that referenced this pull request Apr 3, 2019
Rollup of 4 pull requests

Successful merges:

 - #55448 (Add 'partition_at_index/_by/_by_key' for slices.)
 - #59186 (improve worst-case performance of BTreeSet intersection v3)
 - #59514 (Remove adt_def from projections and downcasts in MIR)
 - #59630 (Shrink `mir::Statement`.)

Failed merges:

r? @ghost
@bors bors merged commit 3f306db into rust-lang:master Apr 3, 2019
@Mokosha
Contributor Author

Mokosha commented Apr 3, 2019

Thank you to everyone who helped review this change and get it merged!

@Mokosha Mokosha deleted the SortAtIndex branch July 19, 2020 06:25
@Mokosha Mokosha restored the SortAtIndex branch July 19, 2020 06:26