API: CoW and explicit `copy` keyword in DataFrame/Series methods #50535
Comments
If we keep the no-CoW mode, then we should keep the copy keyword, and add it where relevant xref #48141. Ideally we'll end up with just one mode.
I'm ambivalent here. The other reasonable alternative would be to for
Better to tell them to do .copy() instead.
Yeah agreed, I'd prefer one mode as well. Should make things significantly easier to test. Also agree with getting rid of the keyword in this case. If we decide to do that, I would also prefer copy=True meaning a lazy copy as well; that makes the transition easier when we remove it altogether.
Yes, I am also assuming that eventually we only have a single copy/view mode.
And so basically already start to ignore the Note: #51464 is doing exactly that.
In general almost all DataFrame and Series methods return new data and thus make a copy if needed (if there was no calculation / data didn't change). But some methods allow you to avoid making this copy with an explicit `copy` keyword, which defaults to `copy=True`, but which you can change to `copy=False` manually to avoid the copy.

Example:
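A minimal sketch of the kind of behaviour meant here, assuming a small two-column frame. The data and the name `df2` are illustrative assumptions; `df` and `df3` follow the names used in the text. Whether the write through `df3` is visible in `df` depends on the pandas version and on whether Copy-on-Write is enabled, so the sketch only records the values rather than asserting them.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the issue).
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Default (copy=True): the result is independent of df.
df2 = df.rename(columns=str.upper)
df2.iloc[0, 0] = 100
value_after_df2 = int(df.iloc[0, 0])  # df is unaffected by the write to df2

# copy=False: without Copy-on-Write the result may share data with df,
# so this write can show up in df as well (version- and mode-dependent).
df3 = df.rename(columns=str.upper, copy=False)
df3.iloc[0, 0] = 100
value_after_df3 = int(df.iloc[0, 0])

print(value_after_df2, value_after_df3)
```

Recent pandas versions may emit a deprecation warning for the `copy` keyword here.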
Now, if Copy-on-Write is enabled, the above behaviour shouldn't happen (because we are updating one dataframe (`df`) through changing another dataframe (`df3`)).

In this specific case of `rename`, it actually already doesn't work anymore like that, and `df` is not updated:

This is because of how it is implemented under the hood in `rename`, using `result = self.copy(deep=copy)`, and so this always was already taking a shallow copy of the calling dataframe. With CoW enabled, a "shallow" copy doesn't exist anymore in the original meaning, but now essentially is a "lazy copy with CoW".

But for some other methods, this is actually not yet working
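To illustrate the "lazy copy with CoW" meaning of a shallow copy described above, here is a small sketch. It assumes the `mode.copy_on_write` option is available (pandas 1.5 or later; newer versions may warn that the option is deprecated because CoW is always on). The data is illustrative.

```python
import pandas as pd

# Assumption: the "mode.copy_on_write" option exists in the installed pandas.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Same call shape as rename uses internally: self.copy(deep=copy)
    shallow = df.copy(deep=False)  # no data is copied at this point

    # The first write to the shallow copy triggers the actual copy,
    # so the parent frame keeps its original value.
    shallow.iloc[0, 0] = 100

    original_value = int(df.iloc[0, 0])
    shallow_value = int(shallow.iloc[0, 0])

print(original_value, shallow_value)
```

With CoW enabled, the write lands only in `shallow`, which is why `copy=False` in `rename` no longer propagates changes back to `df`.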
There are several issues/questions here:

- … `copy=False` now actually meaning a "lazy" copy for all those methods?
- … `copy` keyword.
- … `copy=True` should still give an actual hard / "eager" copy?
- … `copy` keyword?
- … `copy=True`, and so people will typically mostly use it explicitly to set `copy=False`. But `copy=False` will become the default in the future, and so will not be needed anymore to specify explicitly.
- … `copy=True` in the future to ensure they get an "eager" copy (and not delay the copy / trigger a copy later on). But is that use case worth it to keep the keyword around? (they can always do `.copy()` instead)

DataFrame/Series methods that have a `copy` keyword (except for the constructors):

- `align`
- `astype`
- `infer_objects`
- `merge`
- `reindex`
- `reindex_like`
- `rename`
- `rename_axis`
- `set_axis` (only added in 1.5)
- `set_flags` (default False)
- `swapaxes`
- `swaplevel`
- `to_numpy` (default False)
- `to_timestamp`
- `transpose` (default False)
- `truncate`
- `tz_convert`
- `tz_localize`
- `pd.concat`
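A list like the one above could be regenerated by inspecting method signatures. The helper below is a hypothetical sketch (the name `methods_with_copy_keyword` is not part of pandas), and its output depends on the installed pandas version, so it may not match the list exactly.

```python
import inspect

import pandas as pd


def methods_with_copy_keyword(cls):
    """Return sorted names of public methods on cls that accept a `copy` parameter."""
    names = []
    for name, member in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and dunder methods
        try:
            params = inspect.signature(member).parameters
        except (TypeError, ValueError):
            continue  # signature not introspectable
        if "copy" in params:
            names.append(name)
    return sorted(names)


frame_methods = methods_with_copy_keyword(pd.DataFrame)
series_methods = methods_with_copy_keyword(pd.Series)
print(frame_methods)
print(series_methods)
```

Top-level functions such as `pd.concat` are not methods and would need to be checked separately, for example with `inspect.signature(pd.concat)`.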
xref CoW overview issue: #48998