
Update the similarity algorithm details #9675


Open
wants to merge 1 commit into
base: main
from


prudhvigodithi
Member

Description

Update the similarity algorithm details

Issues Resolved

Closed #9643
Related OpenSearch issue: opensearch-project/OpenSearch#17315

Version

3.0.0

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.


Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@prudhvigodithi
Member Author

Adding @msfroh @getsaurabh02 to the PR.
Thanks

`boolean` | Assigns matching terms a score equal to their boost value. Use `boolean` similarity when you want document scores to depend only on whether the terms match, not on how often they occur.
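A minimal Python sketch of the scoring rule described in this row, assuming a query is a mapping of terms to boosts and a document is a sequence of terms. `boolean_score` is a hypothetical helper, not an OpenSearch or Lucene API; term frequency, document length, and IDF are deliberately ignored.

```python
from typing import Iterable, Mapping


def boolean_score(query_boosts: Mapping[str, float], doc_terms: Iterable[str]) -> float:
    """Return the sum of the boosts of the query terms present in the document.

    Repeated occurrences of a term do not add to the score: a term either
    matches (contributing its boost) or it does not (contributing 0).
    """
    present = set(doc_terms)
    return sum(boost for term, boost in query_boosts.items() if term in present)


query = {"opensearch": 2.0, "similarity": 1.0}
print(boolean_score(query, ["opensearch", "bm25"]))
print(boolean_score(query, ["similarity", "similarity", "similarity"]))
```

Because matches are collapsed into a set, a document that repeats `similarity` three times scores the same as one that contains it once.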

### Note

OpenSearch 3.0 and later use Lucene's `BM25Similarity` by default. If required, you can explicitly opt in to `LegacyBM25`.
Contributor


I would highlight the difference.

Lucene recognized that BM25 scores were being multiplied by a constant factor and dropped it.

In OpenSearch 3.0, by default, all scores will be lower by a factor of 2.2 (if I remember correctly). It doesn't affect the order of results, unless you do something that depends on the specific values.

Examples include the `min_score` parameter, which discards every hit scoring below a given threshold. The change may also have implications for hybrid scoring, though I believe that's usually relative to the max score, so the relative scaling should be consistent.
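To make the points above concrete, here is a minimal single-term Python sketch, assuming Lucene's BM25 formula with the default `k1 = 1.2` and `b = 0.75`. In it, the legacy form keeps a `(k1 + 1)` factor in the numerator and the current form drops it, so the constant factor is `k1 + 1 = 2.2`. The corpus statistics, document IDs, and the `2.0` threshold are hypothetical, and `max_normalize` is a simplified stand-in, not the normalization that hybrid search actually applies.

```python
import math

# Default BM25 parameters.
K1 = 1.2
B = 0.75


def idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF in Lucene's form: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=K1, b=B):
    """Single-term BM25 without the (k1 + 1) numerator factor."""
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf(doc_count, doc_freq) * freq / (freq + length_norm)


def legacy_bm25(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=K1, b=B):
    """Single-term BM25 with the (k1 + 1) factor kept in the numerator."""
    return (k1 + 1) * bm25(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1, b)


def max_normalize(scores):
    """Divide every score by the top score (simplified normalization)."""
    top = max(scores.values())
    return {doc: s / top for doc, s in scores.items()}


# Hypothetical corpus statistics and per-document (term frequency, length) pairs.
DOC_COUNT, DOC_FREQ, AVG_LEN = 1000, 50, 160
DOCS = {"doc1": (3, 100), "doc2": (1, 80), "doc3": (2, 300)}

new = {d: bm25(f, n, AVG_LEN, DOC_COUNT, DOC_FREQ) for d, (f, n) in DOCS.items()}
old = {d: legacy_bm25(f, n, AVG_LEN, DOC_COUNT, DOC_FREQ) for d, (f, n) in DOCS.items()}

# 1. Every score shrinks by the same factor, so the ranking is unchanged.
ratios = {d: old[d] / new[d] for d in DOCS}
ranking_new = sorted(DOCS, key=new.get, reverse=True)
ranking_old = sorted(DOCS, key=old.get, reverse=True)

# 2. An absolute cutoff tuned on legacy scores now drops hits.
MIN_SCORE = 2.0
kept_old = [d for d in ranking_old if old[d] >= MIN_SCORE]
kept_new = [d for d in ranking_new if new[d] >= MIN_SCORE]

# 3. Scores relative to the top score are the same under both formulas.
norm_new, norm_old = max_normalize(new), max_normalize(old)

print(ratios)
print(ranking_new == ranking_old)
print(kept_old, kept_new)
```

With these made-up numbers, all three documents clear the `2.0` cutoff under the legacy scores, but only `doc1` clears it under the new scores, while the ranking and the max-normalized scores are unaffected.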

Labels
None yet
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

[DOC] Update the similarity to reflect LegacyBM25
3 participants