-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: parametrize CoW indexing tests for extension (nullable) dtypes #51208
TST: parametrize CoW indexing tests for extension (nullable) dtypes #51208
Conversation
ser = Series(*args, **kwargs) | ||
return ser.convert_dtypes().copy() | ||
|
||
return request.param, make_dataframe, make_series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented locally with some different ways to parametrize this, and settled on the above with a fixture that returns adapted constructor functions. Alternative was some "postprocess" function that was called on each dataframe/series that was created. Actually parametrizing on dtype objects seems more complicated.
Side note: inside the test I am assigning those factory functions to DataFrame
and Series
to avoid more changes to the test code, but that can also be a different name (like the make_dataframe
/make_series
used above) to be more explicit. The reason I currently preferred this somewhat magic override, is that this gives a smaller diff, and this keeps the test code more copy-pastable (and I think the idea is that we have an option that controls the behaviour of the constructors in the future, so that would give less changes then).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is easier to follow than I expected, so looks good generally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird test failure, otherwise lgtm.
Will merge main to be sure that it's related
for mgr_loc in mgr_locs: | ||
newblk = blk.copy(deep=not only_slice) | ||
newblk = blk.copy(deep=deep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this have caused test_reindex_copies_ea
to fail? Not sure but can't see how this test could have been impacted otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that's the test that I added in the precursor PR. With CoW we now don't copy (the test was added in the assumption we always copy with copy=True, while now we don't copy in case of CoW)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, for most other methods I think we are actually still honoring the copy=True
keyword .. Should we do that here as well for reindex with EAs?
(although we should also stop honoring the copy=True
keyword at some point, when CoW is enabled, xref #50535)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would prefer not honouring copy anymore for CoW. I'll start tracking the cases down where we are still using copy.
Experimenting with one possible approach to have the Copy-on-Write tests cover more block types / dtypes (cfr #51144 (comment)).
In this PR I parametrized with numpy vs nullable dtype, but that should cover ExtensionDtype in general (for internal coverage, we should probably also have tests for datetime-like data, but that would be a much bigger change to include in the current tests).
Doing this PR unveiled that we did have different copy/view rules for selecting multiple columns if they are extension dtypes. The fix for that is currently included here (otherwise a lot of test would either fail or need to be adapted), but I have a separate PR focusing on that actual change: #51197