Is your feature request related to a problem? Please describe.
cuDF currently has no straightforward, stable API for converting CuPy arrays (1D or 2D) into list columns. The conversion is instead reimplemented externally (for example in NeMo Curator/Crossfit), and those implementations keep breaking across cuDF releases. Previous fixes include:
Crossfit Issue #84
Crossfit PR #86 (cuDF 24.10)
Crossfit PR #105 (cuDF 25.02)
The current external implementation (linked below) has broken again with the latest cuDF version:
https://github.com/rapidsai/crossfit/blob/745208dc50d717dba5c35f6b75cc41a4678576bb/crossfit/backend/cudf/series.py#L57-L103
This results in the following error:
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/crossfit/backend/cudf/series.py", line 93, in create_list_series_from_1d_or_2d_ar
    offset_col = as_column(cp.arange(start=0, stop=len(data) + 1, step=n_cols), dtype="int32")
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/column.py", line 2483, in as_column
    col = col.astype(dtype)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/column.py", line 1624, in astype
    elif dtype.kind == "M":
Proposed Solution
Introduce a robust, well-maintained cuDF API for converting between CuPy arrays and cuDF list columns in both directions (CuPy array ↔ cuDF list). Providing this API will:
Reduce maintenance overhead from recurring breakages.
Support essential applications relying on this feature like NeMo Curator, GNNs and some potential RAG applications.
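To make the request concrete, here is a minimal CPU-only sketch of the conversion such an API would perform. It assumes the usual Arrow-style list layout (one flat values buffer plus an offsets buffer), which is what the `cp.arange(..., step=n_cols)` call in the traceback is building. Plain Python lists stand in for CuPy arrays, only the 2D case is covered, and the helper names `array_to_list_layout` and `list_layout_to_array` are hypothetical, not existing cuDF functions.

```python
# Illustration only: plain Python lists stand in for CuPy arrays.
# These helpers are hypothetical and are NOT part of cuDF or CuPy.

def array_to_list_layout(rows):
    """Flatten a rectangular 2D array into (offsets, values)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    # Row-major flattening of the 2D input.
    values = [x for row in rows for x in row]
    # Row i of the list column spans values[offsets[i]:offsets[i + 1]].
    offsets = [i * n_cols for i in range(n_rows + 1)]
    return offsets, values


def list_layout_to_array(offsets, values):
    """Rebuild the 2D array; only valid when every row has the same length."""
    rows = [values[start:stop] for start, stop in zip(offsets, offsets[1:])]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("ragged list column cannot become a 2D array")
    return rows


offsets, values = array_to_list_layout([[1, 2, 3], [4, 5, 6]])
# offsets == [0, 3, 6]; values == [1, 2, 3, 4, 5, 6]
round_trip = list_layout_to_array(offsets, values)
```

The reverse direction has to reject ragged rows, which is one reason a maintained API with well-defined error behavior would be preferable to each project hand-rolling the offsets logic.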
Additional Context
Direct CuPy array ↔ cuDF list interoperability is crucial for workflows that combine RAPIDS with array-based or deep learning (DL) libraries.
CC: @praateekmahajan, @sarahyurick