[BUG] cuDF-Polars string-column serialization is broken #18228

Open
rjzamora opened this issue Mar 11, 2025 · 0 comments · May be fixed by #18232
Labels
bug Something isn't working cudf.polars Issues specific to cudf.polars

Comments

@rjzamora
Member

Describe the bug
#18146 recently skipped a large number of cudf-polars tests that were failing because of this bug.

As far as I can tell, something goes wrong when we convert a pl.DataFrame object to a cudf_polars DataFrame object and string columns are present. For example, I am able to serialize string columns that originate from a pyarrow table, but I cannot serialize the same data if it originates from a pl.DataFrame.

Steps/Code to reproduce bug

This works...

import pylibcudf as plc
import pyarrow as pa

arrow_tbl = pa.table({"a": ["a", "bb", "ccc"]})
t = plc.interop.from_arrow(arrow_tbl)

packed = plc.contiguous_split.pack(t).release()
table = plc.contiguous_split.unpack_from_memoryviews(*packed)

This results in errors...

import polars as pl
import pylibcudf as plc
import pyarrow as pa
from cudf_polars.containers import DataFrame

pdf = pl.DataFrame({"a": ["a", "bb", "ccc"]})
t = plc.interop.from_arrow(pdf.to_arrow())

packed = plc.contiguous_split.pack(t).release()
table = plc.contiguous_split.unpack_from_memoryviews(*packed)
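
One possible line of investigation, not confirmed anywhere in this issue, is whether the two reproducers above hand `plc.interop.from_arrow` the same Arrow schema: Polars may export string columns with a different Arrow string type than `pa.table` infers from Python strings. The sketch below diffs the two schemas on the host, so it needs no GPU. The helper names `diff_field_types` and `compare_reproducer_schemas` are hypothetical and exist only for illustration.

```python
def diff_field_types(left, right):
    """Map each column name to (left_type, right_type) where the two differ.

    Accepts any iterables of objects exposing `.name` and `.type`, which
    includes pyarrow.Schema. A column missing on one side is reported
    with None for that side.
    """
    left_types = {field.name: str(field.type) for field in left}
    right_types = {field.name: str(field.type) for field in right}
    diffs = {}
    for name in left_types.keys() | right_types.keys():
        pair = (left_types.get(name), right_types.get(name))
        if pair[0] != pair[1]:
            diffs[name] = pair
    return diffs


def compare_reproducer_schemas():
    """Diff the Arrow schemas built by the two reproducers.

    Imports are local so the helper above stays usable without polars
    or pyarrow installed. Assumption: a schema mismatch is relevant to
    the failure; the issue itself does not establish this.
    """
    import polars as pl
    import pyarrow as pa

    data = {"a": ["a", "bb", "ccc"]}
    from_pyarrow = pa.table(data).schema              # working reproducer
    from_polars = pl.DataFrame(data).to_arrow().schema  # failing reproducer
    return diff_field_types(from_pyarrow, from_polars)
```

Calling `compare_reproducer_schemas()` and getting an empty dict would rule out a schema difference as the trigger; a non-empty result would point at the string type that `from_arrow` receives in the failing case.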

Additional context
This is a blocker for multi-GPU polars work.

@rjzamora rjzamora added bug Something isn't working cudf.polars Issues specific to cudf.polars labels Mar 11, 2025
@wence- wence- linked a pull request Mar 11, 2025 that will close this issue
rapids-bot bot pushed a commit that referenced this issue Mar 12, 2025
This fixes a few experimental cudf-polars tests (due to #18228), and reduces the scope of distributed tests to the `tests/experimental` directory.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Matthew Murray (https://github.com/Matt711)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #18244