
ENH: add Categorical.to_array, CategoricalAccessor.to_array and CategoricalIndex.to_array #50041

Open
1 of 3 tasks
topper-123 opened this issue Dec 3, 2022 · 0 comments
Labels
Categorical Categorical Data Type Enhancement

Comments

@topper-123
Contributor

topper-123 commented Dec 3, 2022

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

A Categorical can currently be converted to a numpy array with the to_numpy method. This is fine when the categories are backed by a numpy array, but when they are backed by an ExtensionArray there is no way to get an ExtensionArray back from the Categorical, other than by recreating it manually.

For example, with a StringArray:

>>> import pandas as pd
>>> arr = pd.array(["a", pd.NA])
>>> cat = pd.Categorical(arr)
>>> cat.to_numpy()
array(['a', <NA>], dtype=object)  # does not maintain dtype
>>> cat.categories.array.take(cat.codes, allow_fill=True)  # manual method for getting the desired array type and dtype; allow_fill maps code -1 to NA
<StringArray>
['a', <NA>]
Length: 2, dtype: string

Feature Description

I propose adding a to_array method to Categorical, CategoricalAccessor and CategoricalIndex. It would return an array with the same length as the Categorical and with the same type and dtype as the array backing the categories (a numpy array or an ExtensionArray). Converting from an ExtensionArray or numpy.ndarray to a Categorical and back should be lossless, and this should be covered by tests.
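A minimal sketch of the proposed behaviour, written as a hypothetical standalone helper to_array (not an existing pandas API) instead of as methods. It assumes the categories' dtype decides the return type: a numpy dtype yields a numpy.ndarray, anything else yields the categories' ExtensionArray type. Missing values (code -1) are filled through pandas.api.extensions.take with allow_fill=True. The round-trip checks at the end illustrate the kind of lossless-conversion tests suggested above.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import take


def to_array(obj):
    """Hypothetical helper sketching the proposed to_array behaviour.

    obj may be a Categorical, a CategoricalIndex, or a Series with
    category dtype (handled through its .cat accessor).
    """
    cat = obj.cat if isinstance(obj, pd.Series) else obj
    categories = cat.categories
    codes = np.asarray(cat.codes)  # -1 marks a missing value

    if isinstance(categories.dtype, np.dtype):
        # numpy-backed categories: return a plain numpy.ndarray
        values = categories.to_numpy()
    else:
        # ExtensionArray-backed categories (e.g. StringDtype): keep the EA type
        values = categories.array

    # allow_fill=True turns code -1 into the dtype's NA value instead of
    # wrapping around to the last category
    return take(values, codes, allow_fill=True)


# Round trip: ExtensionArray -> Categorical -> ExtensionArray
arr = pd.array(["a", pd.NA, "b"])  # StringArray, dtype "string"
cat = pd.Categorical(arr)
pd.testing.assert_extension_array_equal(to_array(cat), arr)

# ...and back again: rebuilding the Categorical gives the original one
pd.testing.assert_extension_array_equal(pd.Categorical(to_array(cat)), cat)

# numpy-backed categories come back as a numpy.ndarray
ser = pd.Series([1, 2, 1], dtype="category")
np.testing.assert_array_equal(to_array(ser), np.array([1, 2, 1]))

# CategoricalIndex goes through the same code path
idx = pd.CategoricalIndex(arr)
pd.testing.assert_extension_array_equal(to_array(idx), arr)
```

Branching on the categories' dtype keeps numpy-backed Categoricals returning plain ndarrays, so the sketch only changes behaviour relative to to_numpy for ExtensionArray-backed categories.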

Alternative Solutions

The alternative would be to create the underlying array manually, as in the example above.
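Doing this manually has one pitfall: a Categorical stores missing values as the code -1, and plain positional indexing of the categories with those codes wraps -1 around to the last category, silently replacing NA with a real value. ExtensionArray.take with allow_fill=True maps -1 to the dtype's NA value instead. A small sketch, assuming a string-backed Categorical:

```python
import pandas as pd

arr = pd.array(["a", pd.NA, "b"])  # StringArray, dtype "string"
cat = pd.Categorical(arr)
codes = cat.codes                  # [0, -1, 1]; the missing value is encoded as -1
cats = cat.categories.array        # the categories ['a', 'b'] as a StringArray

# Plain indexing: -1 means "last element", so the NA silently becomes 'b'
naive = cats[codes]

# take(..., allow_fill=True): -1 means "missing", so the NA is preserved
manual = cats.take(codes, allow_fill=True)

print(list(naive))   # ['a', 'b', 'b']
print(list(manual))  # ['a', <NA>, 'b']
```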

Additional Context

No response

@topper-123 topper-123 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 3, 2022
@mroeschke mroeschke added Categorical Categorical Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 16, 2024