Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Decimal Transforms #17968

Open
wants to merge 30 commits into
base: branch-25.04
Choose a base branch
from

Conversation

lamarrr
Copy link
Contributor

@lamarrr lamarrr commented Feb 10, 2025

Description

This merge request implements transforms for decimal types.
It requires the UDF writer to manually decode the decimals from its representation and scale.
There's also an NVRTC/CCL issue with not recognizing __int128_t as an integral type in the device code that was fixed by defining __SIZEOF_INT128__.

Follows-up #18023
Closes #18229

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Feb 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Java Affects Java cuDF API. pylibcudf Issues specific to the pylibcudf package CMake CMake build issue labels Feb 10, 2025
@lamarrr lamarrr removed Python Affects Python cuDF API. Java Affects Java cuDF API. pylibcudf Issues specific to the pylibcudf package labels Feb 11, 2025
@lamarrr
Copy link
Contributor Author

lamarrr commented Feb 13, 2025

/ok to test

@lamarrr lamarrr added feature request New feature or request non-breaking Non-breaking change labels Feb 13, 2025
@lamarrr
Copy link
Contributor Author

lamarrr commented Feb 13, 2025

/ok to test

@lamarrr lamarrr marked this pull request as ready for review February 13, 2025 18:15
@lamarrr lamarrr requested review from a team as code owners February 13, 2025 18:15
@lamarrr lamarrr requested a review from davidwendt February 24, 2025 14:46
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions and questions attached.

{
std::vector<device_data_t> data;
return cudf::jit::column_device_view{
const_cast<void*>(view.head()), nullptr, view.null_mask(), view.type(), view.size()};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another consideration is that a cudf::column_view could be sliced which means it has a non-zero offset and represents only a partial view of the underlying data. This is handy in partitioning large columns with zero-copying.
So I think the jit::column_device_view would need to be able to handle this kind of input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are repeating the same code, I've created a follow-up issue to refactor the existing column_device_view for use in device code: #18229.

Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is amazing work but I don't want to skimp on using mutable to represent both mutable and non-mutable columns.

CUDF_KERNEL void kernel(cudf::size_type size, Out* __restrict__ out, In const* __restrict__... ins)
CUDF_KERNEL void kernel(cudf::size_type size,
cudf::jit::mutable_column_device_view const* output,
cudf::jit::mutable_column_device_view const* inputs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be using mutable_column_device_view for wrapping non mutable inputs.

Copy link
Contributor Author

@lamarrr lamarrr Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid duplicating functionality in this PR that will eventually be made redundant, I've gone ahead and implemented #18229 along with this PR.
So I've extracted column_device_view's core functionality into a basic_column_device_view that can now be used with JITify kernels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a brown bag session to discuss the potential solution with the team. Instead of introducing a new basic_column_device_view, I personally find that moving the host APIs declared in the existing column_device_view.cuh header to a new header provides a cleaner solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the static methods?
and the methods that contain host code? like thrust and rmm?
If we move those, we'd also have to update all the call-sites, making this PR larger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the static methods?

These are the host functions I mentioned that could be moved to a separate C++ header. Instead of being static members, could we convert them into a factory function, like make_column_device_view? Ideally, CUDA headers should contain declarations for both host-device and device-only APIs. If it's a host-only API, a C++ header is likely the best place for it.

and the methods that contain host code? like thrust and rmm?

That’s the part I’m missing, which is why I’m asking for a brown bag session. What kind of host code is prohibited by JIT? Does it have to be device-only, or is host-device code also fine? I’m asking because cuda::std:: utilities are host-device compatible, and they seem to work with JIT. Thrust utilities used in this header, like pair and fancy iterators, are also host-device, so I’m not sure why they cannot be used in JIT code. As for the RMM usage, they all come from host-only APIs as I noticed, so I assume it shouldn’t be an issue after we move them into a separate C++ header.

Copy link
Contributor

@davidwendt davidwendt Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not really expect you go and refactor everything for this PR. I was more expecting just create a jit::column_device_view to match the jit::mutable_column_device_view and do the refactoring later in the follow-up PR.

Copy link
Contributor Author

@lamarrr lamarrr Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not really expect you go and refactor everything for this PR. I was more expecting just create a jit::column_device_view to match the jit::mutable_column_device_view and do the refactoring later in the follow-up PR.

Yes, I'm sorry about that, I wanted to keep the changeset minimal as well, but to have the jit::column_view and jit::mutable_column_view we'll end up writing a lot of duplicate code just to throw them away soon, which only became evident to me after adding jit::mutable_column_view. I was initially hesitant to refactor it in this PR to keep the changeset small, but I'm open to setting up a meeting to discuss the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s the part I’m missing, which is why I’m asking for a brown bag session. What kind of host code is prohibited by JIT? Does it have to be device-only, or is host-device code also fine? I’m asking because cuda::std:: utilities are host-device compatible, and they seem to work with JIT. Thrust utilities used in this header, like pair and fancy iterators, are also host-device, so I’m not sure why they cannot be used in JIT code.

Yes, it has to be device-only code and headers. This means we can't include system headers like thrust does, even if the contents aren't used or called. I described the problem in this issue: #17399.

As for the RMM usage, they all come from host-only APIs as I noticed, so I assume it shouldn’t be an issue after we move them into a separate C++ header.

If @davidwendt agrees that they should be moved in this PR, I'd prefer to move them outside the classes as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have questions about JIT compilation, but I’ll pause here to focus on making progress. I agree with David that the initial goal is to introduce a lightweight column device view specifically for JIT. Having duplicates is fine for now, as it helps narrow the scope for the final cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: In Progress
Status: Burndown
Development

Successfully merging this pull request may close these issues.

[FEA] Refactor and strip cudf::column_device_view and cudf::mutable_column_device_view to device-only code
4 participants