Attempt to make acquireSearcher lighter for can match #126140
base: main
Conversation
@@ -1553,14 +1557,23 @@ public Searcher(
     QueryCachingPolicy queryCachingPolicy,
     Closeable onClose
 ) {
-    super(reader);
+    // pass in an executor so we can shortcut and avoid looping through all segments
+    super(reader, source.equals(CAN_MATCH_SEARCH_SOURCE) ? Runnable::run : null);
There are many more scenarios where all we need is an index reader. This is quite hard to untangle due to the many acquireSearcher callers and the different variants of the method, with wrapping going on as well. I am tempted to add an acquireIndexReader method to Engine, but this requires significant cleanup to become effective and better than what we currently have.
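For illustration, a minimal sketch of what such a method could look like. Only the name `acquireIndexReader` comes from the comment above; `ReaderRef`, `readerManager()` and the `ReferenceManager` plumbing are hypothetical stand-ins, not the actual `Engine` API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.ReferenceManager;

abstract class EngineSketch {

    // Hypothetical: stands in for whatever reader manager the engine holds.
    protected abstract ReferenceManager<DirectoryReader> readerManager();

    /**
     * Hands out a ref-counted reader without ever constructing an IndexSearcher,
     * so no per-segment slice work happens for callers that only need the
     * reader (e.g. can_match).
     */
    public final ReaderRef acquireIndexReader() throws IOException {
        DirectoryReader reader = readerManager().acquire();
        return new ReaderRef(reader, () -> {
            try {
                readerManager().release(reader); // give back the ref taken above
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Reader plus its release hook; close() must be called exactly once. */
    public record ReaderRef(DirectoryReader reader, Runnable onClose) implements Closeable {
        @Override
        public void close() {
            onClose.run();
        }
    }
}
```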
On the other hand, if there is significant overhead in creating a searcher, perhaps that's something to fix in Lucene rather than working around in ES. What triggered this change is suspicion around overhead caused by creating slices eagerly, mostly work that happens in the IndexSearcher constructor. Should we figure out how to minimize that in Lucene instead?
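As a rough, hypothetical sketch of the "minimize it in Lucene" idea: defer the per-segment slicing work out of the constructor and memoize it so it runs at most once, on first use. This only illustrates the approach; it is not what Lucene actually does, and the slicing policy here is a trivial stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

import org.apache.lucene.index.LeafReaderContext;

/**
 * Hypothetical sketch: keep the searcher constructor O(1) by computing the
 * slices lazily, so the per-segment loop is only paid if a concurrent search
 * actually asks for them.
 */
final class LazySlices {
    private final Supplier<List<LeafReaderContext>> leavesSupplier;
    private volatile List<List<LeafReaderContext>> slices; // computed at most once

    LazySlices(Supplier<List<LeafReaderContext>> leavesSupplier) {
        this.leavesSupplier = leavesSupplier; // no per-segment work here
    }

    List<List<LeafReaderContext>> get() {
        List<List<LeafReaderContext>> local = slices;
        if (local == null) {
            synchronized (this) {
                local = slices;
                if (local == null) {
                    // all of the per-segment grouping cost is paid here, lazily
                    local = slices = computeSlices(leavesSupplier.get());
                }
            }
        }
        return local;
    }

    private static List<List<LeafReaderContext>> computeSlices(List<LeafReaderContext> leaves) {
        // stand-in for the real slicing policy; here simply one slice per segment
        List<List<LeafReaderContext>> groups = new ArrayList<>(leaves.size());
        for (LeafReaderContext leaf : leaves) {
            groups.add(List.of(leaf));
        }
        return groups;
    }
}
```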
I agree, and apache/lucene#14343 should help quite a bit on this front. I wonder, though: maybe the problem isn't so much the slice cost as the fact that we don't really reuse searcher instances much. We don't really need a new searcher (e.g. for can_match) across queries so long as we don't see new data, do we?
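For reference, Lucene's SearcherManager already implements that reuse-until-refresh pattern; the sketch below only illustrates the concept and is not how the ES Engine currently hands out can_match searchers.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;

class SearcherReuseSketch {
    static void sketch(IndexWriter writer) throws IOException {
        // Hands out the same IndexSearcher instance until a refresh sees new data.
        SearcherManager manager = new SearcherManager(writer, null);

        // Many cheap requests (e.g. can_match-style checks) between refreshes can
        // share one searcher instead of constructing a new one per request.
        IndexSearcher searcher = manager.acquire();
        try {
            // inspect searcher.getIndexReader() here
        } finally {
            manager.release(searcher);
        }

        // A new searcher is only built once new data becomes visible.
        manager.maybeRefresh();
        manager.close();
    }
}
```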
That is definitely a possible improvement, but what I am trying to do is figure out the cause of a regression. We have not changed how many times we create a searcher, but we did add overhead to the creation of a searcher.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
LGTM thanks!
There seems to be some overhead observed in some of our benchmarks around creating an IndexSearcher. In many places, like can match, we acquire a searcher but all we need is an index reader. This small change tries to verify some assumptions by removing the slight slicing overhead caused by looping over the segments when we create a searcher without providing an executor to its constructor. We may iterate on this approach later, but first I'd like to see the impact of this on our nightly benchmarks.
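To make the "all we need is an index reader" point concrete, here is a simplified, hypothetical can_match-style check that only touches per-segment point values on the reader, so no IndexSearcher (and no slicing) is needed. The field parameter and the assumption that it is indexed as a single-dimension LongPoint are made up for the example; this is not the actual ES can_match implementation.

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PointValues;
import org.apache.lucene.util.NumericUtils;

class CanMatchSketch {
    /** Returns false when no segment can possibly contain values in [from, to]. */
    static boolean canMatchRange(DirectoryReader reader, String field, long from, long to) throws IOException {
        for (LeafReaderContext leaf : reader.leaves()) {
            PointValues points = leaf.reader().getPointValues(field);
            if (points == null) {
                continue; // this segment has no indexed values for the field
            }
            // assumes the field is indexed as a single-dimension LongPoint
            long min = NumericUtils.sortableBytesToLong(points.getMinPackedValue(), 0);
            long max = NumericUtils.sortableBytesToLong(points.getMaxPackedValue(), 0);
            if (max >= from && min <= to) {
                return true; // at least one segment overlaps the requested range
            }
        }
        return false; // the whole shard could be skipped without building a searcher
    }
}
```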