Speedup slice calculation in IndexSearcher #14343

original-brownbear
Member

As the title says, these are some obvious speedups. The slice calculation is fairly expensive for Elasticsearch when it runs over a large number of shards. There is no need here for streams, for creating comparator instances, or for so much copying.
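To make the description concrete, here is a minimal sketch of the kind of change it refers to. It is not the code from this PR: `SliceSketch`, `Leaf`, `slicesWithStreams`, `slicesWithArray`, and the simple grouping rule are all hypothetical stand-ins. The first variant sorts through a stream with a freshly built comparator and an intermediate list; the second copies once into an array and sorts it in place with a shared comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class SliceSketch {

  /** Hypothetical stand-in for an index segment; only its document count matters here. */
  static final class Leaf {
    final String name;
    final int maxDoc;

    Leaf(String name, int maxDoc) {
      this.name = name;
      this.maxDoc = maxDoc;
    }

    int maxDoc() {
      return maxDoc;
    }
  }

  // Shared comparator (largest segments first), created once rather than on every call.
  private static final Comparator<Leaf> BY_MAX_DOC_DESC =
      (a, b) -> Integer.compare(b.maxDoc, a.maxDoc);

  /** Stream-based variant: builds a pipeline, a new comparator, and an intermediate list. */
  static List<List<Leaf>> slicesWithStreams(List<Leaf> leaves, int maxDocsPerSlice) {
    List<Leaf> sorted =
        leaves.stream()
            .sorted(Comparator.comparingInt(Leaf::maxDoc).reversed())
            .collect(Collectors.toList());
    return group(sorted.toArray(new Leaf[0]), maxDocsPerSlice);
  }

  /** Array-based variant: one copy, sorted in place with the shared comparator. */
  static List<List<Leaf>> slicesWithArray(List<Leaf> leaves, int maxDocsPerSlice) {
    Leaf[] sorted = leaves.toArray(new Leaf[0]);
    Arrays.sort(sorted, BY_MAX_DOC_DESC);
    return group(sorted, maxDocsPerSlice);
  }

  /** Assumed grouping rule: start a new slice once adding a leaf would exceed the limit. */
  private static List<List<Leaf>> group(Leaf[] sorted, int maxDocsPerSlice) {
    List<List<Leaf>> slices = new ArrayList<>();
    List<Leaf> current = new ArrayList<>();
    long docs = 0;
    for (Leaf leaf : sorted) {
      if (!current.isEmpty() && docs + leaf.maxDoc > maxDocsPerSlice) {
        slices.add(current);
        current = new ArrayList<>();
        docs = 0;
      }
      current.add(leaf);
      docs += leaf.maxDoc;
    }
    if (!current.isEmpty()) {
      slices.add(current);
    }
    return slices;
  }
}
```

Both variants produce the same slices, since both sorts are stable; the array version simply allocates less along the way.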

Contributor

msfroh left a comment

Nice cleanup! Cutting down on the number of unnecessary allocations is nice.

I wonder if it makes sense to require that the input to the LeafSlice constructor is already sorted? In general, the code that creates slices (even with segment partitions) could sort the input externally. That said, for the common existing case where each slice has one partition, the sort will fast-return anyway.
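The trade-off raised here can be sketched as follows. This is only an illustration under assumptions, not Lucene's `LeafSlice`: `SortedInputSketch`, `Partition`, `sortDefensively`, and `requireSorted` are hypothetical names. One method sorts whatever it receives; the other requires the caller to pass input already ordered by `docBase` and only verifies that in a single pass.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SortedInputSketch {

  /** Hypothetical stand-in for a segment partition, identified by its first doc ID. */
  static final class Partition {
    final int docBase;

    Partition(int docBase) {
      this.docBase = docBase;
    }
  }

  // Shared comparator, created once.
  private static final Comparator<Partition> BY_DOC_BASE =
      (a, b) -> Integer.compare(a.docBase, b.docBase);

  /** Defensive variant: always sorts; a one-element array needs no comparisons. */
  static Partition[] sortDefensively(List<Partition> input) {
    Partition[] partitions = input.toArray(new Partition[0]);
    Arrays.sort(partitions, BY_DOC_BASE);
    return partitions;
  }

  /** Strict variant: the caller must supply sorted input; this only checks the order. */
  static Partition[] requireSorted(List<Partition> input) {
    Partition[] partitions = input.toArray(new Partition[0]);
    for (int i = 1; i < partitions.length; i++) {
      if (partitions[i - 1].docBase > partitions[i].docBase) {
        throw new IllegalArgumentException("partitions must be sorted by docBase");
      }
    }
    return partitions;
  }
}
```

The strict variant moves the sorting cost to the caller and replaces it with a linear check, which matters only when a slice holds more than one partition.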

@original-brownbear
Member Author

Thanks Michael!

original-brownbear merged commit bdf4def into apache:main on Mar 12, 2025
7 checks passed
original-brownbear added a commit that referenced this pull request Mar 12, 2025
hanbj pushed a commit to hanbj/lucene that referenced this pull request Mar 12, 2025