Lower-level kernel form? #578
This is essentially the evolution I have in mind with #562 (Aligning KA relatively closely to OpenCL/SPIRV semantics). My milestone for KA 0.10 is essentially that, but without touching the kernel language itself. I would then expect users like you to start using the lower-level interface directly. KA v1.0 would then remove deprecated functionality from the KA kernel language.
What will the low-level interface look like? Also, I'm concerned about maintaining the current performance levels as new features are added to KA. The 285% performance regression we experienced with the 0.9.34 semantics change was significant, and it would be great if we could avoid similar impacts in the future.
Much more like programming OpenCL.
Full agreement on this. I have been trying to be very cautious with changes like that, but in this case it was unavoidable to correctly map kernels onto existing GPU architectures.
Haven't looked into why yet, but #564 completely hangs my machine with GaussianSplatting.jl.
Since you don't use global indices, you should be able to add `unsafe_indices`.
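(A minimal sketch of what that would look like - kernel and argument names here are placeholders, not from the original thread:)

```julia
using KernelAbstractions

# Sketch: a kernel that only uses local/group indices. unsafe_indices=true skips
# the implicit out-of-range work-item masking that KA inserts for global indices,
# which (per the comment above) is safe when no global indices are used.
@kernel unsafe_indices=true function my_kernel!(out, @Const(in))
    li = @index(Local, Linear)   # lane within the workgroup
    gi = @index(Group, Linear)   # workgroup index
    # ... index arithmetic based on li/gi instead of @index(Global) ...
end
```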
I also saw that the kernel now uses ... - I guess I'm not sure when/why.
This is mainly to start a conversation around the KA kernel language, as it is currently accumulating more functionality / cruft; for example, if I want a high-performance kernel as written in raw CUDA C++ (but backend- and type-agnostic, with all the Julia niceties), kernels start to look like:
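(The code block from the original post did not survive the copy; as a stand-in, here is a rough sketch - not the original example - of the kind of fully-annotated kernel meant here: a workgroup sum reduction with a hard-coded workgroup size of 256.)

```julia
using KernelAbstractions

# Sketch only: every annotation below (unsafe_indices, cpu=false, inbounds,
# @localmem, @synchronize, manual global-index arithmetic) is needed to get
# CUDA-C-like behaviour out of the kernel.
@kernel unsafe_indices=true cpu=false inbounds=true function block_sum!(partial, @Const(a))
    tile = @localmem eltype(a) (256,)       # workgroup-local scratch
    ti   = @index(Local, Linear)
    gi   = @index(Group, Linear)
    i    = (gi - 1) * 256 + ti              # global index, computed by hand

    tile[ti] = i <= length(a) ? a[i] : zero(eltype(a))
    @synchronize

    # tree reduction within the workgroup
    stride = 128
    while stride >= 1
        if ti <= stride
            tile[ti] += tile[ti + stride]
        end
        @synchronize
        stride >>= 1
    end

    ti == 1 && (partial[gi] = tile[1])
end

# Hypothetical launch (256 threads per workgroup, one partial sum per group):
# block_sum!(get_backend(a), 256)(partial, a; ndrange = 256 * length(partial))
```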
What I'd expect by default - a GPU kernel with performance comparable to CUDA - is not really what the language guides me to, as I need to add

`@kernel unsafe_indices=true cpu=false inbounds=true`

to get close. Even then, with the recent `@synchronize` lane checks, we see big performance hits in previously well-performing code (e.g. from 540 ms to 1.54 s for a sum - see issue).

Perhaps this is the point where I should emphasise how much I appreciate KernelAbstractions and the titanic work put into it and the JuliaGPU ecosystem. I hope this post does not come across as sweeping criticism, but as a discussion of possible future improvements (of course, "improvements" here being simply my personal opinion based on the work I do - and how I'm using KA for HPC code).
Having followed KA development for a few years now, I understand the constraints that evolved the current KA interface - implicit boundschecks, separate CPU and GPU compilation pipelines, `ndrange` being, well, a range and not the `blocksize` and `nblocks` seen in CUDA, divergent `synchronize`, etc.

Would there be a possibility for, say, a `@rawkernel` with more minimal functionality:
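(The snippet here was lost in the copy; the following is a purely hypothetical sketch of what such a macro might accept - `@rawkernel` does not exist in KA today, and the launch API is invented for illustration:)

```julia
# Hypothetical: no implicit boundschecks, no CPU emulation path, no ndrange -
# just local/group indices and an explicit launch geometry, CUDA-C style.
@rawkernel function axpy!(y, a, @Const(x))
    i = (@index(Group, Linear) - 1) * @groupsize()[1] + @index(Local, Linear)
    if i <= length(y)
        y[i] = a * x[i] + y[i]
    end
end

# hypothetical launch:
# axpy!(backend; groupsize=256, ngroups=cld(length(y), 256))(y, a, x)
```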
Or more JuliaGPU-like kernel syntax:
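(Also lost in the copy; for reference, this presumably alludes to the plain CUDA.jl style below - real CUDA.jl syntax, shown only for comparison with the hypothetical form above:)

```julia
using CUDA

function axpy_kernel!(y, a, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * x[i] + y[i]
    end
    return nothing
end

# launched as, e.g.:
# @cuda threads=256 blocks=cld(length(y), 256) axpy_kernel!(y, a, x)
```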
Which would very closely map to the GPU backend's kernel language; I think this would have a few advantages, not least a much more direct correspondence to `@cuda`, `@metal`, etc. kernels.

What are your thoughts?