Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: OpenSearch - reorganize test suite #1563

Merged
merged 6 commits into from
Mar 28, 2025

Conversation

anakin87
Copy link
Member

@anakin87 anakin87 commented Mar 25, 2025

Related Issues

While working on #1562, I realize that the OpenSearch test suite had become complex, disorganized, and difficult to manage.

Proposed Changes:

  • move all document store fixtures to conftest.py, document their purpose, unify sync and async versions, and remove a redundant fixture
  • move authorization-related tests from test_document_store.py to test_auth.py
  • move filter-related tests from test_document_store.py to test_filters.py
  • move async tests from test_document_store.py to test_document_store_async.py (new)

How did you test it?

CI

Notes for the reviewer

This PR is based on the branch of #1562, so please review but do not merge until #1562 has been merged.

Checklist

@github-actions github-actions bot added integration:opensearch type:documentation Improvements or additions to documentation labels Mar 25, 2025
@anakin87 anakin87 marked this pull request as ready for review March 25, 2025 16:35
@anakin87 anakin87 requested a review from a team as a code owner March 25, 2025 16:35
@anakin87 anakin87 requested review from julian-risch and removed request for a team March 25, 2025 16:35
Base automatically changed from opensearch-apply-return-embedding-to-filter-documents to main March 26, 2025 10:14
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite good to me already. Maybe it's out of scope for this reorganization of the test suite but I noticed that we repeatedly define the same list of documents to write them to document stores. Let's quickly discuss before merging this PR.

  • test_bm25_retrieval_with_filters
  • test_bm25_retrieval_with_custom_query
  • test_bm25_retrieval_with_custom_query_empty_filters
    use the same list of documents.

Similarly the following tests have the same documents in common:
test_bm25_retrieval_all_terms_must_match_false
test_bm25_retrieval_with_fuzziness
test_bm25_retrieval_all_terms_must_match
test_bm25_retrieval

What do you think about removing duplicate lines of code by defining these lists of documents only once in conftest.py or even a document store fixture that already has all these documents indexed. The latter could be an option because the tests listed above only read from the document store. However, it would at some complexity and speed up is negligible so I'd prefer the the first option of simply defining two lists of documents in conftest.py. Should be as simple as adding the following to that file, right?:

@pytest.fixture()
def test_documents():
    return [
        Document(content="Haskell is a functional programming language"),
        Document(content="Lisp is a functional programming language"),
        Document(content="Exilir is a functional programming language"),
        Document(content="F# is a functional programming language"),
        Document(content="C# is a functional programming language"),
        Document(content="C++ is an object oriented programming language"),
        Document(content="Dart is an object oriented programming language"),
        Document(content="Go is an object oriented programming language"),
        Document(content="Python is a object oriented programming language"),
        Document(content="Ruby is a object oriented programming language"),
        Document(content="PHP is a object oriented programming language"),
    ]

What do you think?

@anakin87
Copy link
Member Author

@julian-risch great idea...
I applied it in bc573d2, and now we are removing a lot of code!

@anakin87 anakin87 requested a review from julian-risch March 28, 2025 09:25
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍 1100+ lines of code removed. 💪

@anakin87 anakin87 merged commit c5bdff7 into main Mar 28, 2025
6 checks passed
@anakin87 anakin87 deleted the opensearch-refactor-tests-suite branch March 28, 2025 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration:opensearch type:documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants