[FEA] Would like to expose Duplicate Keep Option in Distinct #18238

Open
warrickhe opened this issue Mar 11, 2025 · 1 comment · May be fixed by #18237
Labels
feature request New feature or request

Comments

@warrickhe

Is your feature request related to a problem? Please describe.
I would like to be able to use the KEEP_FIRST option for distinct. At the moment, it is hard-coded to KEEP_ANY, with no way to change it. It would be ideal to be able to change the keep_option. This is necessary for correctly implementing array_distinct in spark-rapids.

Describe the solution you'd like
I would like this parameter exposed in ColumnView.java, and passed through whatever intermediate layers need it along the way.

Describe alternatives you've considered
I considered duplicating the kernel in spark-rapids-jni, but that makes maintainability harder, and it makes more sense to expose the parameter in cudf.
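
To make the difference between the two keep options concrete, here is a minimal host-side sketch in plain C++. It is not cudf code: the `keep_option` enum and `distinct_indices` function are hypothetical names invented for illustration. KEEP_ANY lets an implementation retain whichever duplicate it likes, so the surviving row is not guaranteed to be the first one; KEEP_FIRST pins that choice down. KEEP_LAST is included only to show a result that KEEP_ANY would also be allowed to produce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical mirror of the keep option discussed in this issue (not the cudf enum).
enum class keep_option { KEEP_ANY, KEEP_FIRST, KEEP_LAST };

// Returns the indices of the elements that survive deduplication, in ascending order.
std::vector<std::size_t> distinct_indices(std::vector<int> const& values, keep_option keep)
{
  std::vector<std::size_t> kept;
  std::unordered_set<int> seen;
  if (keep == keep_option::KEEP_LAST) {
    // Scan from the back so the last occurrence of each value wins.
    for (std::size_t i = values.size(); i-- > 0;) {
      if (seen.insert(values[i]).second) { kept.push_back(i); }
    }
    std::reverse(kept.begin(), kept.end());
  } else {
    // KEEP_FIRST: the first occurrence wins.
    // KEEP_ANY may pick any occurrence; this sketch happens to pick the first.
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (seen.insert(values[i]).second) { kept.push_back(i); }
    }
  }
  return kept;
}
```

For the input {3, 1, 3, 2, 1}, KEEP_FIRST retains indices 0, 1, 3 while KEEP_LAST retains 2, 3, 4. A caller that depends on the first occurrence cannot rely on KEEP_ANY, since either answer would be valid for it.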

@warrickhe warrickhe added the feature request New feature or request label Mar 11, 2025
@warrickhe warrickhe linked a pull request Mar 11, 2025 that will close this issue
@bdice
Contributor

bdice commented Mar 11, 2025

A PR would be welcome.

First, add this parameter to the detail:: API. Then add a copy of the current public API with a new parameter duplicate_keep_option before the null_equality and nan_equality. Then deprecate the existing public API, and make it call the new detail:: API with KEEP_ANY. The deprecation in the API docs should say @deprecated Deprecated in 25.04, to be removed in 25.06. Example of how to deprecate an API: #17221

Finally, add/update tests for the new option.
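
The steps above can be sketched roughly as follows. This is a self-contained host-side mock, not the actual cudf source: the `sketch` namespace, the `std::vector<int>` input type, and the local enum declarations are all assumptions made so the example compiles on its own. Only the layering is the point: a detail function that takes the keep option, a new public overload with the keep option placed before the equality parameters, and the old overload marked deprecated in its doc comment and forwarding with KEEP_ANY.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

namespace sketch {

// Local stand-ins for the enums named above, declared here so the sketch is standalone.
enum class duplicate_keep_option { KEEP_ANY, KEEP_FIRST, KEEP_LAST };
enum class null_equality { EQUAL, UNEQUAL };
enum class nan_equality { ALL_EQUAL, UNEQUAL };

namespace detail {
// Step 1: the detail API takes the keep option.
// The equality parameters are unused in this integer-only mock.
inline std::vector<int> distinct(std::vector<int> const& input,
                                 duplicate_keep_option keep,
                                 null_equality /*nulls_equal*/,
                                 nan_equality /*nans_equal*/)
{
  std::vector<bool> keep_row(input.size(), false);
  std::unordered_set<int> seen;
  if (keep == duplicate_keep_option::KEEP_LAST) {
    for (std::size_t i = input.size(); i-- > 0;) { keep_row[i] = seen.insert(input[i]).second; }
  } else {
    // KEEP_FIRST, and this mock's arbitrary choice for KEEP_ANY.
    for (std::size_t i = 0; i < input.size(); ++i) { keep_row[i] = seen.insert(input[i]).second; }
  }
  std::vector<int> out;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (keep_row[i]) { out.push_back(input[i]); }
  }
  return out;
}
}  // namespace detail

// Step 2: new public overload, keep option placed before the equality parameters.
inline std::vector<int> distinct(std::vector<int> const& input,
                                 duplicate_keep_option keep,
                                 null_equality nulls_equal = null_equality::EQUAL,
                                 nan_equality nans_equal   = nan_equality::ALL_EQUAL)
{
  return detail::distinct(input, keep, nulls_equal, nans_equal);
}

/**
 * Step 3: the existing public signature, kept for one release cycle.
 *
 * @deprecated Deprecated in 25.04, to be removed in 25.06.
 */
inline std::vector<int> distinct(std::vector<int> const& input,
                                 null_equality nulls_equal = null_equality::EQUAL,
                                 nan_equality nans_equal   = nan_equality::ALL_EQUAL)
{
  return detail::distinct(input, duplicate_keep_option::KEEP_ANY, nulls_equal, nans_equal);
}

}  // namespace sketch
```

Because the enums are scoped, a call such as `sketch::distinct(v)` resolves only to the deprecated overload and `sketch::distinct(v, sketch::duplicate_keep_option::KEEP_FIRST)` only to the new one, so existing callers keep compiling unchanged during the deprecation window.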
