-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented Decimal Transforms #17968
base: branch-25.04
Are you sure you want to change the base?
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
4ee86e3
to
43fef79
Compare
43fef79
to
c3ee2eb
Compare
/ok to test |
/ok to test |
…into transform-decimals
…into transform-decimals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions and questions attached.
cpp/src/transform/transform.cpp
Outdated
{ | ||
std::vector<device_data_t> data; | ||
return cudf::jit::column_device_view{ | ||
const_cast<void*>(view.head()), nullptr, view.null_mask(), view.type(), view.size()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another consideration is that a cudf::column_view
could be sliced which means it has a non-zero offset
and represents only a partial view of the underlying data. This is handy in partitioning large columns with zero-copying.
So I think the jit::column_device_view would need to be able to handle this kind of input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added that now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are repeating the same code, I've created a follow-up issue to refactor the existing column_device_view
for use in device code: #18229.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is amazing work but I don't want to skimp on using mutable
to represent both mutable and non-mutable columns.
cpp/src/transform/jit/kernel.cu
Outdated
CUDF_KERNEL void kernel(cudf::size_type size, Out* __restrict__ out, In const* __restrict__... ins) | ||
CUDF_KERNEL void kernel(cudf::size_type size, | ||
cudf::jit::mutable_column_device_view const* output, | ||
cudf::jit::mutable_column_device_view const* inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be using mutable_column_device_view
for wrapping non mutable inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid duplicating functionality in this PR that will eventually be made redundant, I've gone ahead and implemented #18229 along with this PR.
So I've extracted column_device_view's core functionality into a basic_column_device_view
that can now be used with JITify kernels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a brown bag session to discuss the potential solution with the team. Instead of introducing a new basic_column_device_view
, I personally find that moving the host APIs declared in the existing column_device_view.cuh
header to a new header provides a cleaner solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the static methods?
and the methods that contain host code? like thrust and rmm?
If we move those, we'd also have to update all the call-sites, making this PR larger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the static methods?
These are the host functions I mentioned that could be moved to a separate C++ header. Instead of being static members, could we convert them into a factory function, like make_column_device_view
? Ideally, CUDA headers should contain declarations for both host-device and device-only APIs. If it's a host-only API, a C++ header is likely the best place for it.
and the methods that contain host code? like thrust and rmm?
That’s the part I’m missing, which is why I’m asking for a brown bag session. What kind of host code is prohibited by JIT? Does it have to be device-only, or is host-device code also fine? I’m asking because cuda::std::
utilities are host-device compatible, and they seem to work with JIT. Thrust utilities used in this header, like pair
and fancy iterators, are also host-device, so I’m not sure why they cannot be used in JIT code. As for the RMM usage, they all come from host-only APIs as I noticed, so I assume it shouldn’t be an issue after we move them into a separate C++ header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not really expect you go and refactor everything for this PR. I was more expecting just create a jit::column_device_view
to match the jit::mutable_column_device_view
and do the refactoring later in the follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not really expect you go and refactor everything for this PR. I was more expecting just create a
jit::column_device_view
to match thejit::mutable_column_device_view
and do the refactoring later in the follow-up PR.
Yes, I'm sorry about that, I wanted to keep the changeset minimal as well, but to have the jit::column_view
and jit::mutable_column_view
we'll end up writing a lot of duplicate code just to throw them away soon, which only became evident to me after adding jit::mutable_column_view
. I was initially hesitant to refactor it in this PR to keep the changeset small, but I'm open to setting up a meeting to discuss the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s the part I’m missing, which is why I’m asking for a brown bag session. What kind of host code is prohibited by JIT? Does it have to be device-only, or is host-device code also fine? I’m asking because cuda::std:: utilities are host-device compatible, and they seem to work with JIT. Thrust utilities used in this header, like pair and fancy iterators, are also host-device, so I’m not sure why they cannot be used in JIT code.
Yes, it has to be device-only code and headers. This means we can't include system headers like thrust does, even if the contents aren't used or called. I described the problem in this issue: #17399.
As for the RMM usage, they all come from host-only APIs as I noticed, so I assume it shouldn’t be an issue after we move them into a separate C++ header.
If @davidwendt agrees that they should be moved in this PR, I'd prefer to move them outside the classes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have questions about JIT compilation, but I’ll pause here to focus on making progress. I agree with David that the initial goal is to introduce a lightweight column device view specifically for JIT. Having duplicates is fine for now, as it helps narrow the scope for the final cleanup.
…into transform-decimals
Description
This merge request implements transforms for decimal types.
It requires the UDF writer to manually decode the decimals from its representation and scale.
There's also an NVRTC/CCL issue with not recognizing
__int128_t
as an integral type in the device code that was fixed by defining__SIZEOF_INT128__
.Follows-up #18023
Closes #18229
Checklist