Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: parametrize CoW indexing tests for extension (nullable) dtypes #51208

Conversation

jorisvandenbossche
Copy link
Member

Experimenting with one possible approach to have the Copy-on-Write tests cover more block types / dtypes (cfr #51144 (comment)).

In this PR I parametrized with numpy vs nullable dtype, but that should cover ExtensionDtype in general (for internal coverage, we should probably also have tests for datetime-like data, but that would be a much bigger change to include in the current tests).

Doing this PR unveiled that we did have different copy/view rules for selecting multiple columns if they are extension dtypes. The fix for that is currently included here (otherwise a lot of test would either fail or need to be adapted), but I have a separate PR focusing on that actual change: #51197

@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite Copy / view semantics labels Feb 7, 2023
ser = Series(*args, **kwargs)
return ser.convert_dtypes().copy()

return request.param, make_dataframe, make_series
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented locally with some different ways to parametrize this, and settled on the above with a fixture that returns adapted constructor functions. Alternative was some "postprocess" function that was called on each dataframe/series that was created. Actually parametrizing on dtype objects seems more complicated.

Side note: inside the test I am assigning those factory functions to DataFrame and Series to avoid more changes to the test code, but that can also be a different name (like the make_dataframe/make_series used above) to be more explicit. The reason I currently preferred this somewhat magic override, is that this gives a smaller diff, and this keeps the test code more copy-pastable (and I think the idea is that we have an option that controls the behaviour of the constructors in the future, so that would give less changes then).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is easier to follow than I expected, so looks good generally

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments

@jorisvandenbossche jorisvandenbossche added this to the 2.0 milestone Feb 16, 2023
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review February 16, 2023 07:44
Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird test failure, otherwise lgtm.

Will merge main to be sure that it's related

for mgr_loc in mgr_locs:
newblk = blk.copy(deep=not only_slice)
newblk = blk.copy(deep=deep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this have caused test_reindex_copies_ea to fail? Not sure but can't see how this test could have been impacted otherwise

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that's the test that I added in the precursor PR. With CoW we now don't copy (the test was added in the assumption we always copy with copy=True, while now we don't copy in case of CoW)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, for most other methods I think we are actually still honoring the copy=True keyword .. Should we do that here as well for reindex with EAs?

(although we should also stop honoring the copy=True keyword at some point, when CoW is enabled, xref #50535)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would prefer not honouring copy anymore for CoW. I'll start tracking the cases down where we are still using copy.

phofl and others added 2 commits February 16, 2023 15:18

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@jorisvandenbossche jorisvandenbossche merged commit d35646f into pandas-dev:main Feb 20, 2023
@jorisvandenbossche jorisvandenbossche deleted the cow-tests-parametrize-nullable branch February 20, 2023 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Copy / view semantics Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants