Attempt to make acquireSearcher lighter for can match #126140

Open · wants to merge 6 commits into main · changes from all commits
Commits
28 changes: 27 additions & 1 deletion server/src/main/java/org/elasticsearch/index/engine/Engine.java
@@ -26,10 +26,14 @@
import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCache;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.ReferenceManager;
import org.apache.lucene.search.Weight;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.AlreadyClosedException;
import org.apache.lucene.util.BytesRef;
@@ -1553,14 +1557,23 @@ public Searcher(
QueryCachingPolicy queryCachingPolicy,
Closeable onClose
) {
-super(reader);
+// pass in an executor so we can shortcut and avoid looping through all segments
+super(reader, source.equals(CAN_MATCH_SEARCH_SOURCE) ? Runnable::run : null);
Comment (Member Author): There are many more scenarios where all we need is an index reader. This is quite hard to untangle due to the many acquireSearcher callers, the different variants of the method, and the wrapping going on as well. I am tempted to add an acquireIndexReader method to Engine, but that requires significant cleanup to become effective and better than what we currently have.

Comment (Member Author): On the other hand, if there is significant overhead in creating a searcher, perhaps that is something to fix in Lucene rather than work around in ES. What triggered this change is suspicion around overhead caused by creating slices eagerly, mostly work that happens in the IndexSearcher constructor. We should figure out how to minimize that in Lucene.

Comment (Member): I agree, and apache/lucene#14343 should help quite a bit on this front. I wonder, though, whether the problem is not so much the slice cost as the fact that we don't really reuse searcher instances much. We don't need a new searcher (e.g. for can_match) across queries so long as we don't see new data, do we?

Comment (Member Author): That is definitely a possible improvement, but what I am trying to do is figure out the cause of a regression. We have not changed how many times we create a searcher, but we did add overhead to the creation of a searcher.
setSimilarity(similarity);
setQueryCache(queryCache);
setQueryCachingPolicy(queryCachingPolicy);
this.source = source;
this.onClose = onClose;
}

private static final LeafSlice[] EMPTY_SLICES = new LeafSlice[0];

@Override
protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
assert source.equals(CAN_MATCH_SEARCH_SOURCE);
return EMPTY_SLICES;
}
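The shortcut above can be sketched in isolation. This is a simplified stand-in, not the Lucene types: the class and method names (`CanMatchSlices`, `slices`) are illustrative, and slicing here just returns an empty array for the can_match source, skipping the per-leaf loop entirely:

```java
import java.util.List;

// Illustrative sketch of the empty-slices shortcut, not Lucene code.
class CanMatchSlices {
    static final String CAN_MATCH_SEARCH_SOURCE = "can_match";
    private static final String[] EMPTY_SLICES = new String[0];

    static String[] slices(String source, List<String> leaves) {
        if (source.equals(CAN_MATCH_SEARCH_SOURCE)) {
            // shortcut: no search will run on this searcher,
            // so never loop over the leaves
            return EMPTY_SLICES;
        }
        // normal path (greatly simplified): one slice per leaf
        return leaves.toArray(new String[0]);
    }
}
```

The shortcut is only safe because the search entry points assert that they are never reached for the can_match source, as the overrides below show.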

/**
* The source that caused this searcher to be acquired.
*/
@@ -1586,6 +1599,19 @@ public void close() {
throw new AssertionError(e);
}
}

@Override
protected void searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector)
throws IOException {
assert source.equals(CAN_MATCH_SEARCH_SOURCE) == false;
super.searchLeaf(ctx, minDocId, maxDocId, weight, collector);
}

Comment (Contributor): Just curious, why did you add the assertion in this method?

Comment (Member Author): Because I want to make sure that when we shortcut the slicing, searchLeaf is never called. That is the assumption under which we can shortcut the slices to an empty array: a search will never be executed on top of this searcher instance.

@Override
public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager) throws IOException {
assert source.equals(CAN_MATCH_SEARCH_SOURCE) == false;
return super.search(query, collectorManager);
}
}
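The overrides above follow a guard pattern: every search entry point asserts that it is never reached for a can_match searcher, so the empty-slices shortcut cannot be silently violated. A minimal sketch of that pattern follows; the class name `GuardedSearcher` is illustrative, and the sketch throws an exception instead of using `assert` so the guard also fires with assertions disabled:

```java
// Illustrative sketch of guarding search entry points, not Lucene code.
class GuardedSearcher {
    static final String CAN_MATCH_SEARCH_SOURCE = "can_match";
    private final String source;

    GuardedSearcher(String source) {
        this.source = source;
    }

    int search(String query) {
        if (source.equals(CAN_MATCH_SEARCH_SOURCE)) {
            // the PR uses `assert`; an exception keeps the sketch testable
            throw new IllegalStateException("search not allowed for source: " + source);
        }
        return query.length(); // stand-in for real search work
    }
}
```

The value of the guard is that any future caller that accidentally runs a search on a can_match searcher fails fast instead of silently searching with zero slices.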

public abstract static class Operation {