TYP: remove NaTType as possible result of Timestamp and Timedelta constructor #46171

Merged: 20 commits, Mar 8, 2022
82 changes: 26 additions & 56 deletions pandas/_libs/tslibs/nattype.pyi
@@ -3,7 +3,10 @@ from datetime import (
     timedelta,
     tzinfo as _tzinfo,
 )
-from typing import Any
+from typing import (
+    Any,
+    Union,
+)

 import numpy as np

@@ -15,7 +18,12 @@ nat_strings: set[str]

 def is_null_datetimelike(val: object, inat_is_null: bool = ...) -> bool: ...

-class NaTType(datetime):
Member

NaTType inherits from datetime. Removing the inheritance because it removes a few mypy ignores doesn't sound like a good idea to me. If there is a reason why NaTType shouldn't inherit from datetime, the implementation should change.

Contributor Author

I don't think NaTType should inherit from datetime, but maybe that's a separate PR? My reasoning why it should get removed is as follows. With the current code, the following happens:

>>> type(pd.Timestamp(np.nan))
<class 'pandas._libs.tslibs.nattype.NaTType'>
>>> type(pd.Timedelta(np.nan))
<class 'pandas._libs.tslibs.nattype.NaTType'>
>>> isinstance(pd.Timedelta(np.nan), datetime.datetime)
True

So the Timedelta constructor can produce an object of type NaTType that is also a datetime, even though datetime and timedelta are unrelated types in the Python standard library.

From a static typing perspective, it is messing up a number of different things, and I don't think we get any benefit from that inheritance with respect to typing.

Furthermore, as best as I can tell, the NaTType implementation just overrides the entire datetime API, so we are not getting any implementation benefits from explicitly stating the inheritance. But I could be missing something here.
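
The situation described in this comment can be modelled with the standard library alone. The following is a sketch, not pandas code: `MissingType`, `MISSING`, `Stamp` and `Delta` are hypothetical stand-ins for NaTType, pd.NaT, Timestamp and Timedelta, and it assumes the sentinel subclasses datetime, as the comment says the implementation does.

```python
from datetime import datetime, timedelta


class MissingType(datetime):
    """Hypothetical stand-in for NaTType: a sentinel that subclasses datetime."""

    def __new__(cls):
        # Any valid date works here; the value is never meant to be read.
        return super().__new__(cls, 1, 1, 1)


MISSING = MissingType()  # hypothetical stand-in for pd.NaT


class Stamp(datetime):
    """Hypothetical stand-in for Timestamp."""

    def __new__(cls, value):
        if value is None:
            return MISSING  # the constructor hands back a different type
        return super().__new__(cls, value.year, value.month, value.day)


class Delta(timedelta):
    """Hypothetical stand-in for Timedelta."""

    def __new__(cls, days):
        if days is None:
            return MISSING  # same sentinel, although this is a timedelta class
        return super().__new__(cls, days=days)


missing_delta = Delta(None)
print(type(missing_delta).__name__)
print(isinstance(missing_delta, datetime))
print(isinstance(missing_delta, timedelta))
```

In this toy model the object returned by the timedelta-like constructor passes an isinstance check against datetime and fails one against timedelta, which is the mismatch the comment objects to.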

Contributor Author

> NaTType inherits from datetime. Removing the inheritance because it removes a few mypy ignores doesn't sound like a good idea to me. If there is a reason why NaTType shouldn't inherit from datetime, the implementation should change.

While nattype.pyx declares this inheritance, from a typing perspective it doesn't really hold, because there are methods in datetime that are not implemented in NaTType. We even have a test to check that: pandas/tests/scalar/test_nat.py:test_missing_public_nat_methods()

With static typing, the assumption is that a subclass has all the methods of its parent. Our implementation of NaTType doesn't do that, so we need to tell the type system that NaTType is not a subclass of datetime.
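
The substitutability assumption can be illustrated with a stdlib-only sketch. `Base`, `Sentinel` and `days_until_friday` are hypothetical names, not pandas code; the override mirrors the `weekday` stub in this diff, where the return type is float while the supertype promises int.

```python
import math


class Base:
    def weekday(self) -> int:
        return 0


class Sentinel(Base):
    # Runs fine, but a checker such as mypy rejects the override because
    # float is not compatible with the int promised by Base.weekday.
    def weekday(self) -> float:  # type: ignore[override]
        return math.nan


def days_until_friday(d: Base) -> int:
    # Written against Base, so the checker assumes weekday() is an int.
    return (4 - d.weekday()) % 7


print(days_until_friday(Base()))
print(days_until_friday(Sentinel()))
```

The second call returns a float NaN from a function annotated as returning int, which is the kind of surprise the `ignore[override]` comments were hiding.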

Member

Wouldn't changing the annotation without changing the implementation mess up static inference on e.g.:

if isinstance(obj, datetime):
    return 0
else:
    assert False
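
A minimal sketch of the concern raised here, using only the standard library. `MissingType`, `MISSING` and `classify` are hypothetical; the class subclasses datetime at runtime, which is the assumption the question rests on.

```python
from datetime import datetime
from typing import Union


class MissingType(datetime):
    """Hypothetical sentinel; at runtime it still subclasses datetime."""

    def __new__(cls):
        return super().__new__(cls, 1, 1, 1)


MISSING = MissingType()


def classify(obj: Union[datetime, MissingType]) -> str:
    if isinstance(obj, datetime):
        return "datetime"
    # If a stub declared MissingType without the datetime base, a checker
    # would treat this branch as the MissingType case. At runtime it is
    # never reached for MISSING.
    return "missing"


print(classify(MISSING))
```

The checker's view and the runtime result diverge only for code that branches on `isinstance(obj, datetime)` and expects the sentinel to land in one particular branch.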

Member

> methods in datetime that are not implemented in NaTType. We even have a test to check that: pandas/tests/scalar/test_nat.py:test_missing_public_nat_methods()

I'm really looking forward to #24983 (which might take some time).

If Timestamp/Timedelta/NaTType were the last pieces to get pandas to be a py.typed library, I would fully agree to purposefully make the two large changes proposed in this PR (not inheriting from datetime and simplifying __new__) - the benefits for users would (from my perspective) outweigh the discrepancy between implementation and typing. With the current state of pandas it is more difficult to judge: I'm not against this PR but also not in favor of it.

Contributor Author

> Wouldn't changing the annotation without changing the implementation mess up static inference on e.g.:
>
> if isinstance(obj, datetime):
>     return 0
> else:
>     assert False

It really depends on how you are handling pd.NaT in your user code, whether you are talking about Timestamp versus Timedelta, etc.

As pandas users, my staff and I live in the Timestamp/Timedelta world. We use pd.to_datetime() to convert strings to dates and times and get Timestamp objects as a result. The relationship of those types to datetime.datetime and datetime.timedelta doesn't matter to us, so I wouldn't write code like that example. Besides, we have this kind of inconsistency to worry about, independent of typing:

>>> isinstance(pd.Timestamp("NAN"), datetime.datetime)
True
>>> isinstance(pd.Timestamp("NAN"), pd.Timestamp)
False

It seems odd that you ask for a Timestamp, which is a subclass of datetime, and the result passes an isinstance check against datetime but fails one against Timestamp.

So the point is that the implementation has issues - that's maybe what #24983 would take care of, but until then we should type it in a way that makes sense from a user perspective.

For what it's worth, we don't document that Timestamp is a subclass of datetime.datetime nor do we even have a documentation page about pd.NaT at all that indicates anything about its type.

Member

> For what it's worth, we don't document that Timestamp is a subclass of datetime.datetime

There's a note about it here in the API reference but probably easy to miss: https://pandas.pydata.org/pandas-docs/stable/reference/arrays.html#datetime-data

+_NaTComparisonTypes = Union[datetime, timedelta, Period, np.datetime64, np.timedelta64]
+
+class _NatComparison:
+    def __call__(self, other: _NaTComparisonTypes) -> bool: ...
Contributor Author

This is needed because when you compare to np.datetime64 and np.timedelta64, their comparison operators are declared this way. mypy complains when just having typing stubs (in the Microsoft stubs I'm working on), and this resolves the issue there and here in pandas.
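
The pattern of declaring a comparison operator as an attribute whose type is a callable class can also be tried out at runtime. This is a toy sketch under stated assumptions: `_AlwaysFalse`, `Missing` and `_Comparable` are hypothetical names, and the Union is deliberately shorter than the one in the stub.

```python
from datetime import datetime, timedelta
from typing import Union

_Comparable = Union[datetime, timedelta]  # hypothetical, shorter than the stub's Union


class _AlwaysFalse:
    """Callable object used as a comparison operator."""

    def __call__(self, other: _Comparable) -> bool:
        return False


class Missing:
    # A plain callable instance is not a descriptor, so Python calls it
    # with only the right-hand operand.
    __lt__ = _AlwaysFalse()
    __le__ = _AlwaysFalse()
    __gt__ = _AlwaysFalse()
    __ge__ = _AlwaysFalse()


m = Missing()
print(m < datetime(2022, 3, 8), m >= timedelta(days=1))
```

In the stub only the declaration matters: the checker sees `__lt__` as something callable with one `other` argument, so both operands of a mixed comparison agree on the signature.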


+class NaTType:
     value: np.int64
     def asm8(self) -> np.datetime64: ...
     def to_datetime64(self) -> np.datetime64: ...
@@ -54,26 +62,19 @@ class NaTType(datetime):
     def weekofyear(self) -> float: ...
     def day_name(self) -> float: ...
     def month_name(self) -> float: ...
-    # error: Return type "float" of "weekday" incompatible with return
-    # type "int" in supertype "date"
-    def weekday(self) -> float: ...  # type: ignore[override]
-    # error: Return type "float" of "isoweekday" incompatible with return
-    # type "int" in supertype "date"
-    def isoweekday(self) -> float: ...  # type: ignore[override]
+    def weekday(self) -> float: ...
+    def isoweekday(self) -> float: ...
     def total_seconds(self) -> float: ...
-    # error: Signature of "today" incompatible with supertype "datetime"
-    def today(self, *args, **kwargs) -> NaTType: ...  # type: ignore[override]
-    # error: Signature of "today" incompatible with supertype "datetime"
-    def now(self, *args, **kwargs) -> NaTType: ...  # type: ignore[override]
+    def today(self, *args, **kwargs) -> NaTType: ...
+    def now(self, *args, **kwargs) -> NaTType: ...
     def to_pydatetime(self) -> NaTType: ...
     def date(self) -> NaTType: ...
     def round(self) -> NaTType: ...
     def floor(self) -> NaTType: ...
     def ceil(self) -> NaTType: ...
     def tz_convert(self) -> NaTType: ...
     def tz_localize(self) -> NaTType: ...
-    # error: Signature of "replace" incompatible with supertype "datetime"
-    def replace(  # type: ignore[override]
+    def replace(
         self,
         year: int | None = ...,
         month: int | None = ...,
@@ -86,38 +87,24 @@ class NaTType(datetime):
         tzinfo: _tzinfo | None = ...,
         fold: int | None = ...,
     ) -> NaTType: ...
-    # error: Return type "float" of "year" incompatible with return
-    # type "int" in supertype "date"
     @property
-    def year(self) -> float: ...  # type: ignore[override]
+    def year(self) -> float: ...
     @property
     def quarter(self) -> float: ...
-    # error: Return type "float" of "month" incompatible with return
-    # type "int" in supertype "date"
     @property
-    def month(self) -> float: ...  # type: ignore[override]
-    # error: Return type "float" of "day" incompatible with return
-    # type "int" in supertype "date"
+    def month(self) -> float: ...
     @property
-    def day(self) -> float: ...  # type: ignore[override]
-    # error: Return type "float" of "hour" incompatible with return
-    # type "int" in supertype "date"
+    def day(self) -> float: ...
     @property
-    def hour(self) -> float: ...  # type: ignore[override]
-    # error: Return type "float" of "minute" incompatible with return
-    # type "int" in supertype "date"
+    def hour(self) -> float: ...
     @property
-    def minute(self) -> float: ...  # type: ignore[override]
-    # error: Return type "float" of "second" incompatible with return
-    # type "int" in supertype "date"
+    def minute(self) -> float: ...
     @property
-    def second(self) -> float: ...  # type: ignore[override]
+    def second(self) -> float: ...
     @property
     def millisecond(self) -> float: ...
-    # error: Return type "float" of "microsecond" incompatible with return
-    # type "int" in supertype "date"
     @property
-    def microsecond(self) -> float: ...  # type: ignore[override]
+    def microsecond(self) -> float: ...
     @property
     def nanosecond(self) -> float: ...
     # inject Timedelta properties
@@ -132,24 +119,7 @@ class NaTType(datetime):
     def qyear(self) -> float: ...
     def __eq__(self, other: Any) -> bool: ...
     def __ne__(self, other: Any) -> bool: ...
-    # https://github.com/python/mypy/issues/9015
-    # error: Argument 1 of "__lt__" is incompatible with supertype "date";
-    # supertype defines the argument type as "date"
-    def __lt__(  # type: ignore[override]
-        self, other: datetime | timedelta | Period | np.datetime64 | np.timedelta64
-    ) -> bool: ...
-    # error: Argument 1 of "__le__" is incompatible with supertype "date";
-    # supertype defines the argument type as "date"
-    def __le__(  # type: ignore[override]
-        self, other: datetime | timedelta | Period | np.datetime64 | np.timedelta64
-    ) -> bool: ...
-    # error: Argument 1 of "__gt__" is incompatible with supertype "date";
-    # supertype defines the argument type as "date"
-    def __gt__(  # type: ignore[override]
-        self, other: datetime | timedelta | Period | np.datetime64 | np.timedelta64
-    ) -> bool: ...
-    # error: Argument 1 of "__ge__" is incompatible with supertype "date";
-    # supertype defines the argument type as "date"
-    def __ge__(  # type: ignore[override]
-        self, other: datetime | timedelta | Period | np.datetime64 | np.timedelta64
-    ) -> bool: ...
+    __lt__: _NatComparison
+    __le__: _NatComparison
+    __gt__: _NatComparison
+    __ge__: _NatComparison
13 changes: 7 additions & 6 deletions pandas/_libs/tslibs/timedeltas.pyi
@@ -7,12 +7,12 @@ from typing import (
 )

 import numpy as np
-import numpy.typing as npt

 from pandas._libs.tslibs import (
     NaTType,
     Tick,
 )
+from pandas._typing import npt

 _S = TypeVar("_S", bound=timedelta)

@@ -26,21 +26,22 @@ def array_to_timedelta64(
     errors: str = ...,
 ) -> np.ndarray: ...  # np.ndarray[m8ns]
 def parse_timedelta_unit(unit: str | None) -> str: ...
-def delta_to_nanoseconds(delta: Tick | np.timedelta64 | timedelta | int) -> int: ...
+def delta_to_nanoseconds(delta: np.timedelta64 | timedelta | Tick) -> int: ...

 class Timedelta(timedelta):
     min: ClassVar[Timedelta]
     max: ClassVar[Timedelta]
     resolution: ClassVar[Timedelta]
     value: int  # np.int64
-
-    # error: "__new__" must return a class instance (got "Union[Timedelta, NaTType]")
-    def __new__(  # type: ignore[misc]
+    def __new__(
         cls: Type[_S],
         value=...,
         unit: str = ...,
         **kwargs: int | float | np.integer | np.floating,
-    ) -> _S | NaTType: ...
+    ) -> _S: ...
+    # GH 46171
+    # While Timedelta can return pd.NaT, having the constructor return
+    # a Union with NaTType makes things awkward for users of pandas
     @property
     def days(self) -> int: ...
     @property
17 changes: 11 additions & 6 deletions pandas/_libs/tslibs/timestamps.pyi
@@ -32,9 +32,7 @@ class Timestamp(datetime):

     resolution: ClassVar[Timedelta]
     value: int  # np.int64
-
-    # error: "__new__" must return a class instance (got "Union[Timestamp, NaTType]")
-    def __new__(  # type: ignore[misc]
+    def __new__(
         cls: type[_DatetimeT],
         ts_input: int
         | np.integer
@@ -57,7 +55,10 @@ class Timestamp(datetime):
         tzinfo: _tzinfo | None = ...,
         *,
         fold: int | None = ...,
-    ) -> _DatetimeT | NaTType: ...
+    ) -> _DatetimeT: ...
Contributor

Can you add a comment (maybe ref this issue) about the reasoning here?

Contributor Author

done in next commit

+    # GH 46171
+    # While Timestamp can return pd.NaT, having the constructor return
+    # a Union with NaTType makes things awkward for users of pandas
Member

I would be inclined to return Any in the public API if there are genuine concerns, rather than have the incorrect return type.
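
To make the trade-off concrete, here is a stdlib-only sketch of what a Union return type asks of callers. `Missing`, `MISSING`, `parse_union` and `year_of` are hypothetical names chosen for illustration, not pandas APIs.

```python
from datetime import datetime
from typing import Union


class Missing:
    """Hypothetical sentinel type standing in for NaTType."""


MISSING = Missing()


def parse_union(text: str) -> Union[datetime, Missing]:
    # Precise annotation: every caller must narrow before using the result.
    return MISSING if text == "" else datetime.fromisoformat(text)


def year_of(text: str) -> int:
    ts = parse_union(text)
    if isinstance(ts, Missing):
        raise ValueError("no date given")
    # Without the check above, a checker reports that Missing has no "year".
    return ts.year


print(year_of("2022-03-08"))
```

Annotating the return as plain `datetime` removes the need for the check but hides the sentinel case from the checker, and annotating it as `Any` switches checking off for the result altogether; the thread is weighing those three options.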

     def _set_freq(self, freq: BaseOffset | None) -> None: ...
     @property
     def year(self) -> int: ...
@@ -145,9 +146,11 @@ class Timestamp(datetime):
     ) -> _DatetimeT: ...
     def __radd__(self: _DatetimeT, other: timedelta) -> _DatetimeT: ...
     @overload  # type: ignore
-    def __sub__(self, other: datetime) -> timedelta: ...
+    def __sub__(self, other: datetime) -> Timedelta: ...
     @overload
-    def __sub__(self, other: timedelta | np.timedelta64 | Tick) -> datetime: ...
+    def __sub__(
+        self: _DatetimeT, other: timedelta | np.timedelta64 | Tick
+    ) -> _DatetimeT: ...
     def __hash__(self) -> int: ...
     def weekday(self) -> int: ...
     def isoweekday(self) -> int: ...
@@ -206,3 +209,5 @@ class Timestamp(datetime):
     def to_numpy(
         self, dtype: np.dtype | None = ..., copy: bool = ...
     ) -> np.datetime64: ...
+    @property
+    def _date_repr(self) -> str: ...
5 changes: 5 additions & 0 deletions pandas/_typing.py
@@ -35,6 +35,7 @@
     import numpy.typing as npt

     from pandas._libs import (
+        NaTType,
         Period,
         Timedelta,
         Timestamp,
@@ -308,3 +309,7 @@ def closed(self) -> bool:
 # Interval closed type

 IntervalClosedType = Literal["left", "right", "both", "neither"]
+
+# datetime and NaTType
+
+DatetimeNaTType = Union[datetime, "NaTType"]
4 changes: 3 additions & 1 deletion pandas/core/arrays/datetimes.py
@@ -775,7 +775,9 @@ def _add_offset(self, offset) -> DatetimeArray:
     def _sub_datetimelike_scalar(self, other):
         # subtract a datetime from myself, yielding a ndarray[timedelta64[ns]]
         assert isinstance(other, (datetime, np.datetime64))
-        assert other is not NaT
+        # error: Non-overlapping identity check (left operand type: "Union[datetime,
+        # datetime64]", right operand type: "NaTType")  [comparison-overlap]
+        assert other is not NaT  # type: ignore[comparison-overlap]
         other = Timestamp(other)
         # error: Non-overlapping identity check (left operand type: "Timestamp",
         # right operand type: "NaTType")
7 changes: 2 additions & 5 deletions pandas/io/formats/format.py
@@ -1767,16 +1767,13 @@ def _format_datetime64_dateonly(
     nat_rep: str = "NaT",
     date_format: str | None = None,
 ) -> str:
-    if x is NaT:
+    if isinstance(x, NaTType):
Member

-1 on code changes to satisfy the type checker. We know that static type checkers have issues with singletons, and imo that was adequately noted in the removed comment adjacent to the ignore statement.

It may be nbd in some cases, but I am always -1 on principle.
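
The two forms of the check under discussion can be compared in a stdlib-only sketch. `NullType`, `NULL` and `format_date` are hypothetical names; the singleton construction is an assumption made for the example and is not taken from the pandas implementation.

```python
from datetime import date
from typing import Union


class NullType:
    """Hypothetical singleton sentinel."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


NULL = NullType()


def format_date(x: Union[date, NullType], nat_rep: str = "NaT") -> str:
    # `x is NULL` and `isinstance(x, NullType)` agree at runtime because
    # only one NullType instance exists; the isinstance form is the one a
    # checker uses to narrow x to date below.
    if isinstance(x, NullType):
        return nat_rep
    return x.isoformat()


print(format_date(NULL), format_date(date(2022, 3, 8)))
```

The objection in this comment is about principle rather than behaviour: for a true singleton both checks select the same objects, so the change exists only to help the checker.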

         return nat_rep

     if date_format:
         return x.strftime(date_format)
     else:
-        # error: Item "NaTType" of "Union[NaTType, Any]" has no attribute "_date_repr"
-        # The underlying problem here is that mypy doesn't understand that NaT
-        # is a singleton, so that the check above excludes it here.
-        return x._date_repr  # type: ignore[union-attr]
+        return x._date_repr


 def get_format_datetime64(
3 changes: 2 additions & 1 deletion pandas/io/sas/sas_xport.py
@@ -17,6 +17,7 @@
 import numpy as np

 from pandas._typing import (
+    DatetimeNaTType,
     FilePath,
     ReadBuffer,
 )
@@ -139,7 +140,7 @@
 """


-def _parse_date(datestr: str) -> datetime:
+def _parse_date(datestr: str) -> DatetimeNaTType:
     """Given a date in xport format, return Python date."""
     try:
         # e.g. "16FEB11:10:07:55"
8 changes: 6 additions & 2 deletions pandas/tests/resample/test_datetime_index.py
@@ -1,12 +1,14 @@
 from datetime import datetime
 from functools import partial
 from io import StringIO
+from typing import List

 import numpy as np
 import pytest
 import pytz

 from pandas._libs import lib
+from pandas._typing import DatetimeNaTType
 from pandas.errors import UnsupportedFunctionCall

 import pandas as pd
@@ -1286,7 +1288,7 @@ def test_resample_consistency():
     tm.assert_series_equal(s10_2, rl)


-dates1 = [
+dates1: List[DatetimeNaTType] = [
     datetime(2014, 10, 1),
     datetime(2014, 9, 3),
     datetime(2014, 11, 5),
@@ -1295,7 +1297,9 @@ def test_resample_consistency():
     datetime(2014, 7, 15),
 ]

-dates2 = dates1[:2] + [pd.NaT] + dates1[2:4] + [pd.NaT] + dates1[4:]
+dates2: List[DatetimeNaTType] = (
+    dates1[:2] + [pd.NaT] + dates1[2:4] + [pd.NaT] + dates1[4:]
+)
 dates3 = [pd.NaT] + dates1 + [pd.NaT]  # type: ignore[operator]


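
The annotations added to this test follow from how checkers infer list element types. The sketch below is a stdlib-only analogue: `Missing`, `MISSING` and `DateOrMissing` are hypothetical stand-ins for NaTType, pd.NaT and DatetimeNaTType, and whether a given checker version actually rejects the unannotated form is an assumption based on the `ignore[operator]` left on `dates3`.

```python
from datetime import datetime
from typing import List, Union


class Missing:
    """Hypothetical sentinel type standing in for NaTType."""


MISSING = Missing()

DateOrMissing = Union[datetime, Missing]  # hypothetical analogue of DatetimeNaTType

# Without the annotation a checker may infer List[datetime] and then,
# depending on its version, object to concatenating a list with MISSING.
dates1: List[DateOrMissing] = [datetime(2014, 10, 1), datetime(2014, 9, 3)]
dates2: List[DateOrMissing] = dates1[:1] + [MISSING] + dates1[1:]

print(len(dates2), dates2[1] is MISSING)
```

Declaring the element type up front keeps every later concatenation within one list type, so no per-line ignore is needed.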