Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the logic of script sort parallel collection #124639

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

KunjueYu
Copy link

This pr introduced a simpler way to support parallel collection for script sort.
The concurrency issue of script sort parallel collection was fixed in Fix concurrency issue in ScriptSortBuilder, which introduced a concurrentMap to save the relationship between leafContext and leafScript, so that the correct leafScript can be obtained later. However, we could fix the concurrency issue in another way, which may get rid of the map and simplify the code.

Before Fix concurrency issue in ScriptSortBuilder, the leaf script was a member of anonymous subClass of BytesRefFieldComparatorSource or DoubleValuesComparatorSource, so that it's shared across the shard and is not thread-safe.
If we create a named subClass of BytesRefFieldComparatorSource or DoubleValuesComparatorSource, we can put the declaration of leaf script member in the FieldComparator (for BytesRefFieldComparatorSource) or LeafFieldComparator (for DoubleValuesComparatorSource). So that different comparator could hold a different reference of leaf script.
In this pr, the leaf Comparator isn't wrapped as Fix concurrency issue in ScriptSortBuilder did because in each slice, the leafs are searched sequentially. So that we could set leaf script as a member of the FieldComparator, and the leaf script will be assigned to a new values for each leaf context and consumed later in the setScorer method, which works the same way just as before.

Copy link

cla-checker-service bot commented Mar 12, 2025

💚 CLA has been signed

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 12, 2025
@KunjueYu KunjueYu changed the title Script sort support parallel collection Simplify the logic of script sort parallel collection Mar 12, 2025
@KunjueYu KunjueYu force-pushed the script_sort_support_parallel_collection branch from 0a58652 to fd0e227 Compare March 12, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external-contributor Pull request authored by a developer outside the Elasticsearch team needs:triage Requires assignment of a team area label v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants