Attempt to make acquireSearcher lighter for can match #126140
base: main
Changes from all commits: 008a702, de0e25f, 811f789, 318523d, 781ccb0, b5ba7e1
@@ -26,10 +26,14 @@
  import org.apache.lucene.index.SegmentReader;
  import org.apache.lucene.index.Terms;
  import org.apache.lucene.index.TermsEnum;
+ import org.apache.lucene.search.Collector;
+ import org.apache.lucene.search.CollectorManager;
  import org.apache.lucene.search.IndexSearcher;
+ import org.apache.lucene.search.Query;
  import org.apache.lucene.search.QueryCache;
  import org.apache.lucene.search.QueryCachingPolicy;
  import org.apache.lucene.search.ReferenceManager;
+ import org.apache.lucene.search.Weight;
  import org.apache.lucene.search.similarities.Similarity;
  import org.apache.lucene.store.AlreadyClosedException;
  import org.apache.lucene.util.BytesRef;
@@ -1553,14 +1557,23 @@ public Searcher(
          QueryCachingPolicy queryCachingPolicy,
          Closeable onClose
      ) {
-         super(reader);
+         // pass in an executor so we can shortcut and avoid looping through all segments
+         super(reader, source.equals(CAN_MATCH_SEARCH_SOURCE) ? Runnable::run : null);
          setSimilarity(similarity);
          setQueryCache(queryCache);
          setQueryCachingPolicy(queryCachingPolicy);
          this.source = source;
          this.onClose = onClose;
      }
+
+     private static final LeafSlice[] EMPTY_SLICES = new LeafSlice[0];
+
+     @Override
+     protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+         assert source.equals(CAN_MATCH_SEARCH_SOURCE);
+         return EMPTY_SLICES;
+     }

      /**
       * The source that caused this searcher to be acquired.
       */
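For context on why returning empty slices is safe for the can_match case: the can_match phase only inspects reader-level metadata and never runs a query. Below is a hedged, standalone sketch of that kind of check, not code from this PR; the class and method names, the field handling, and the LongPoint assumption are mine. It only needs the searcher's reader, which is why a lighter searcher is enough.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PointValues;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.util.NumericUtils;

final class CanMatchSketch {
    /** Returns false only if no segment's [min, max] value range overlaps [lower, upper]. */
    static boolean canMatchRange(IndexSearcher searcher, String field, long lower, long upper) throws IOException {
        IndexReader reader = searcher.getIndexReader();
        for (LeafReaderContext ctx : reader.leaves()) {
            PointValues points = ctx.reader().getPointValues(field);
            if (points == null) {
                continue; // this segment has no values for the field
            }
            // Assumes the field was indexed as a LongPoint, so the packed bytes decode like this.
            long min = NumericUtils.sortableBytesToLong(points.getMinPackedValue(), 0);
            long max = NumericUtils.sortableBytesToLong(points.getMaxPackedValue(), 0);
            if (max >= lower && min <= upper) {
                return true; // at least one segment may contain matching documents
            }
        }
        return false; // the query range is disjoint from every segment
    }
}
```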
@@ -1586,6 +1599,19 @@ public void close() {
                  throw new AssertionError(e);
              }
          }
+
+     @Override
+     protected void searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector)
+         throws IOException {
+         assert source.equals(CAN_MATCH_SEARCH_SOURCE) == false;
+         super.searchLeaf(ctx, minDocId, maxDocId, weight, collector);
+     }
+
+     @Override
+     public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager) throws IOException {
+         assert source.equals(CAN_MATCH_SEARCH_SOURCE) == false;
+         return super.search(query, collectorManager);
+     }
  }

An inline review comment on searchLeaf asked: just curious, why did you add the assertion in this method?

The author replied: because I want to make sure that when we shortcut the slicing, searchLeaf is never called. That is the assumption that lets us shortcut the slices to an empty array: a search will never be executed on top of this searcher instance.

      public abstract static class Operation {
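To make the inline exchange above concrete, here is a hedged, standalone sketch (not a test from this PR; the MetadataOnlySearcher class, the JUnit 5 test, and the toy index are invented) of how a guard like this trips when assertions are enabled and a query is executed anyway, while plain reader access stays fine.

```java
import static org.junit.jupiter.api.Assertions.assertThrows;

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.junit.jupiter.api.Test;

class MetadataOnlySearcherTest {

    /** Toy searcher that must never execute queries, mirroring the guard in the diff above. */
    static final class MetadataOnlySearcher extends IndexSearcher {
        MetadataOnlySearcher(IndexReader reader) {
            super(reader);
        }

        @Override
        public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager) throws IOException {
            assert false : "metadata-only searcher must not execute queries";
            return super.search(query, collectorManager);
        }
    }

    @Test
    void searchTripsTheGuard() throws IOException {
        try (ByteBuffersDirectory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                writer.addDocument(new Document());
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                MetadataOnlySearcher searcher = new MetadataOnlySearcher(reader);
                searcher.getIndexReader().leaves(); // reader-level access is fine
                // ...but executing a query trips the assert (run with -ea enabled).
                assertThrows(AssertionError.class, () -> searcher.search(new MatchAllDocsQuery(), 10));
            }
        }
    }
}
```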
There are many more scenarios where all we need is an index reader. This is quite hard to untangle due to the many acquireSearcher callers, the different variants of the method, and the wrapping going on as well. I am tempted to add an acquireIndexReader method to Engine, but that requires significant cleanup to become effective and better than what we currently have.
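As a rough illustration of that acquireIndexReader idea (purely hypothetical; the EngineSketch class, the ReaderHolder wrapper, and the readerManager hook below are invented for the sketch, not the actual Elasticsearch Engine API), such a method could hand out a ref-counted DirectoryReader without ever constructing an IndexSearcher:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.ReferenceManager;

abstract class EngineSketch {

    // Stand-in for whatever internal ReferenceManager tracks the current reader.
    protected abstract ReferenceManager<DirectoryReader> readerManager();

    /** Hands out a ref-counted reader without constructing an IndexSearcher at all. */
    ReaderHolder acquireIndexReader() {
        ReferenceManager<DirectoryReader> manager = readerManager();
        try {
            DirectoryReader reader = manager.acquire();
            return new ReaderHolder(reader, manager);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closeable wrapper so callers can release the reference in a try-with-resources block. */
    static final class ReaderHolder implements Closeable {
        private final DirectoryReader reader;
        private final ReferenceManager<DirectoryReader> manager;

        ReaderHolder(DirectoryReader reader, ReferenceManager<DirectoryReader> manager) {
            this.reader = reader;
            this.manager = manager;
        }

        DirectoryReader reader() {
            return reader;
        }

        @Override
        public void close() throws IOException {
            manager.release(reader); // return the reference; do not close the reader directly
        }
    }
}
```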
On the other hand, if there is significant overhead in creating a searcher, perhaps that is something to fix in Lucene rather than work around in ES. What triggered this change is suspicion around the overhead of creating slices eagerly, mostly work that happens in the IndexSearcher constructor. We should figure out how to minimize that in Lucene?
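One way to picture that kind of Lucene-side fix (an illustrative sketch of the general idea only, independent of what apache/lucene#14343 actually changes; the LazySlices class is invented): compute the slices lazily and memoize them, so a searcher that never runs a query never pays the slicing cost.

```java
import java.util.function.Supplier;
import org.apache.lucene.search.IndexSearcher.LeafSlice;

// Double-checked-locking memoizer: the supplier (e.g. () -> slices(leafContexts)) runs
// at most once, and only when get() is first called from an actual query.
final class LazySlices {
    private final Supplier<LeafSlice[]> supplier;
    private volatile LeafSlice[] cached;

    LazySlices(Supplier<LeafSlice[]> supplier) {
        this.supplier = supplier;
    }

    LeafSlice[] get() {
        LeafSlice[] result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    result = supplier.get(); // pay the slicing cost on first use only
                    cached = result;
                }
            }
        }
        return result;
    }
}
```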
I agree, and apache/lucene#14343 should help quite a bit on this front. I wonder, though, whether the problem is not so much the slice cost as the fact that we don't really reuse searcher instances much. We don't really need a new searcher (e.g. for can_match) across queries as long as we don't see new data, do we?
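For reference, that reuse pattern is roughly what Lucene's SearcherManager provides out of the box: the same searcher instance is handed out across acquire calls until a refresh actually sees new data. A hedged, self-contained sketch (the demo class, the in-memory index, and its contents are placeholders, not how Elasticsearch manages its searchers):

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;

public class SearcherReuseDemo {
    public static void main(String[] args) throws IOException {
        try (ByteBuffersDirectory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(new Document());

            // The manager keeps handing out the same IndexSearcher until a refresh sees
            // new data, so searcher construction (including slicing) is paid per refresh
            // rather than per query.
            SearcherManager manager = new SearcherManager(writer, new SearcherFactory());
            IndexSearcher searcher = manager.acquire();
            try {
                System.out.println("hits=" + searcher.count(new MatchAllDocsQuery()));
            } finally {
                manager.release(searcher);
            }

            writer.addDocument(new Document());
            // Attempt a refresh; a new searcher is only built because the reader changed.
            manager.maybeRefresh();
            manager.close();
        }
    }
}
```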
That is definitely a possible improvement, but what I am trying to do is figure out the cause of a regression. We have not changed how many times we create a searcher, but we did add overhead to the creation of a searcher.