Expose experimental LLVM features for GPU offloading #109

Open
2 of 4 tasks
nikomatsakis opened this issue Jul 22, 2024 · 10 comments

Comments

@nikomatsakis
Contributor

nikomatsakis commented Jul 22, 2024

Metadata
Point of contact @ZuseZ4
Team(s) compiler, lang
Goal document 2025h1/GPU-Offload

Summary

Expose experimental LLVM features for GPU offloading and allow combining them with the std::autodiff feature.

Tasks and status

@nikomatsakis nikomatsakis added this to the 2024h2 milestone Jul 22, 2024
@rust-lang rust-lang locked and limited conversation to collaborators Jul 25, 2024
@nikomatsakis
Contributor Author

This issue is intended for status updates only.

For general questions or comments, please contact the owner(s) directly.

@ZuseZ4
Member

ZuseZ4 commented Aug 24, 2024

During the first month, I focused on automatic differentiation. I cleaned up my rustc fork and made my first two upstreaming PRs, for the frontend and the backend. Once they are merged, I will continue by posting PRs for the remaining middle-end. While waiting for reviews, I have been improving the docs a bit, mainly the pages about debugging Enzyme crashes. I am especially proud that, thanks to those docs, we recently got our first Enzyme core issue with a full LLVM-IR reproducer from a Rust dev, even though the developer reporting the issue had no previous compiler/LLVM experience. Such detailed issues make fixing bugs in Enzyme core much easier.

On the GPU side, I mainly have to thank nikic, who reliably updates the LLVM backend of rustc every few weeks or months. Thanks to his latest update, rustc now supports a sufficiently new LLVM which ships most of the GPU/offloading work that I want to expose on the Rust side. Once my first two autodiff patches have settled, I'll look a bit more into setting up documentation for the GPU feature.

@ZuseZ4
Member

ZuseZ4 commented Sep 12, 2024

During the last three weeks, my first autodiff PR for the backend, which includes the Enzyme submodule and 13 additional files, got merged! I also got a ton of feedback from reviewers, especially on my frontend PR (thanks to jieyouxu). Now that the backend is merged, I have put up my third PR, covering the changes I made to rustc_codegen_llvm. I am currently at RustConf, so I won't be able to address much of the feedback this week, but I am happy to talk to anyone else who is attending, and I will try to get both PRs ready to merge next week.
Once the two open PRs are merged, my changes to ~55 of 85 files will be upstream, so we're making good progress.

On the GPU side, there are again not many updates due to my current autodiff focus. However, thanks to another LLVM submodule update, we can now use some nicer APIs that recently got merged into LLVM for our development in rustc.

@ZuseZ4
Member

ZuseZ4 commented Sep 16, 2024

As another short update, my talk "When unsafe code is slow - Automatic Differentiation in Rust" got accepted as a tech talk for the LLVM Dev Meeting. There I'll present a lot of benchmarks and some analysis comparing Rust-Enzyme with the C++ frontend of Enzyme, and show one application which we had to port from Python/JAX to Rust/Enzyme.
The full program of the dev meeting is available here.
For that, I spent some time trying to fix the benchmark infrastructure in Enzyme core, to make sure everyone can reproduce our benchmarks.

@ZuseZ4
Member

ZuseZ4 commented Sep 30, 2024

Thanks to some support from the bootstrap team, dist builds with autodiff support enabled now work.
That allowed us to add Rust to our autodiff fork of the compiler explorer: https://enzyme.mit.edu/explorer/
Unfortunately, we still have some dist issues with finding std in the compiler explorer build, so help here would be appreciated.
Other than that, this morning my PR to add Enzyme/autodiff support to the test infra got merged: rust-lang/rust#131044
This should allow us to add the larger frontend PR to the merge queue later today: rust-lang/rust#129458

@ZuseZ4
Member

ZuseZ4 commented Oct 22, 2024

I've been travelling a lot for the last two weeks, but hope to be able to get back to work next Monday. Since the last update we got:

  1. The Autodiff frontend got merged! This included over 2k LoC and 30 files, so the remaining diff is now much smaller.
  2. The Autodiff middle-end, as the last missing AD piece, is probably getting a re-design. Right now we use Enzyme as a library, which means that we must write FFI wrappers around Enzyme's C/C++ functions and have to differentiate functions one by one. If we switch to an LLVM pass-based approach instead, we can drop a lot of glue code (simplifying the review process) and get some features for free that the pass already handles for us (e.g. differentiating higher-order derivatives in the right order). Julia also just moved from the library to the pass-based approach. C/C++ always used the pass-based approach, which in the past had a few limitations that recently got fixed. Finally, a pass-based approach improves reproducibility, since all information will now be in the LLVM-IR. In summary, this seems like a good moment to move Rust over as well.
  3. I opened a tracking issue for the GPU offload feature and made the first PR to enable LLVM's offload feature.
  4. I started working with some Enzyme and Bootstrap contributors to get a compiler explorer instance with Rust-AD to work.
  5. I am giving one tech talk and two workshop talks at the LLVM Dev Conference. I will share the slides (and videos if possible) afterwards. The three talks are about ML in Rust, GPU programming in Rust, and the performance benefits of safe over unsafe code.
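
To make the "higher-order derivatives in the right order" point from item 2 a bit more concrete: a second derivative is the derivative of a derivative computation, so the inner derivative has to exist before the outer one can be produced. The following is only a minimal sketch of that nesting, using hand-rolled forward-mode dual numbers in plain Rust. It does not use Enzyme or std::autodiff, and the names `Dual`, `f`, `df` and `d2f` are hypothetical and chosen for illustration only.

```rust
use std::ops::{Add, Mul};

/// Hypothetical dual number: a value together with its derivative (tangent).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual<T> {
    val: T,
    tan: T,
}

impl<T: Add<Output = T>> Add for Dual<T> {
    type Output = Dual<T>;
    fn add(self, rhs: Self) -> Self {
        // Sum rule: (a + b)' = a' + b'
        Dual { val: self.val + rhs.val, tan: self.tan + rhs.tan }
    }
}

impl<T: Copy + Add<Output = T> + Mul<Output = T>> Mul for Dual<T> {
    type Output = Dual<T>;
    fn mul(self, rhs: Self) -> Self {
        // Product rule: (a * b)' = a' * b + a * b'
        Dual {
            val: self.val * rhs.val,
            tan: self.tan * rhs.val + self.val * rhs.tan,
        }
    }
}

/// f(x) = x^3 + x, generic so it runs on f64, Dual<f64>, or Dual<Dual<f64>>.
fn f<T: Copy + Add<Output = T> + Mul<Output = T>>(x: T) -> T {
    x * x * x + x
}

/// First derivative: seed the tangent with 1 and read it back.
fn df(x: f64) -> f64 {
    f(Dual { val: x, tan: 1.0 }).tan
}

/// Second derivative: nest duals, so the outer level differentiates
/// the inner derivative computation.
fn d2f(x: f64) -> f64 {
    let inner = Dual { val: x, tan: 1.0 };
    let outer = Dual { val: inner, tan: Dual { val: 1.0, tan: 0.0 } };
    f(outer).tan.tan
}

fn main() {
    println!("f'(2)  = {}", df(2.0));
    println!("f''(2) = {}", d2f(2.0));
}
```

With f(x) = x^3 + x, the analytic derivatives are f'(x) = 3x^2 + 1 and f''(x) = 6x, which is what the two helper functions compute at a given point. A pass-based tool has to resolve the same inner-before-outer ordering automatically.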

Help Wanted:
I would appreciate it if someone could look into fixing the Rust dist build used in the Enzyme compiler explorer. I have spent quite a few hours trying different configurations, but have been unable to get rid of the error

error[E0463]: can't find crate for `std`

Any help would be appreciated, I can share more information if someone has time to investigate further.

@ZuseZ4
Member

ZuseZ4 commented Nov 27, 2024

  1. The re-design of our autodiff middle/backend which I described in the last update has been implemented. This reduced the code remaining to be upstreamed from 2.5k to 1.1k LoC. I split the code into two PRs (Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle rust#133429 and Autodiff Upstreaming - rustc_codegen_llvm changes rust#130060). Both are now small enough to be reviewed and have received their first round of feedback, so they will hopefully land at the beginning of December. Afterwards, everything needed to run autodiff will be available on nightly (at least as an MVP), so we can discuss building and shipping it by default.

  2. The talks which I gave at LLVM Dev caused some interesting follow-up discussions. Most companies still use Rust "only" for classical SWE, but given that it's getting more and more common I also see more interest outside of Academia in using it for (scientific) computing, ML, HPC, etc, which I find exciting. I also got some offers from people in industry to help with the GPU work.

  3. The preprint of the first paper making use of std::autodiff is available on arXiv! https://arxiv.org/abs/2411.17011v1
    The code is also available here: https://github.com/ChemAI-Lab/molpipx/. It includes both Python/JAX and Rust implementations, because JAX JIT times are unbearably slow here. In certain configurations it takes more than a day to JIT, but only 30 minutes to compile in Rust.

  4. Once autodiff is upstreamed (especially including some small follow-up PRs which are needed to achieve the best performance), I will also publish some very promising runtime results that we have on a larger set of benchmarks.

  5. Last month I asked for help with our compiler explorer, and I'm happy that we have since received the needed support (fix rustc installation, EnzymeAD/enzyme-explorer#15). Thank you! Our compiler explorer for Rust with std::autodiff support is now available under https://enzyme.mit.edu/explorer/ (just select Rust).

@ZuseZ4
Member

ZuseZ4 commented Jan 3, 2025

Happy New Year everyone! After a few more rounds of feedback, the next autodiff PR recently got merged: rust-lang/rust#130060
With that, I only have one last PR open to have a fully working autodiff MVP upstream. A few features had to be removed during upstreaming to simplify the reviewing process, but they should be easier to bring back as single PRs.

Beginning next week, I will also work on an MVP for the batching feature of LLVM/Enzyme, which enables some AoS and SoA vectorization. It mostly re-uses the existing autodiff infrastructure, so I expect the PRs for it to be much smaller.
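
For readers unfamiliar with the terms, here is a small hand-written sketch of what "batching" and the AoS/SoA distinction mean, under the assumption that batching refers to evaluating one function for several inputs in a single call. It is plain Rust, not the Enzyme feature itself, and all names (`square_plus_one`, `Point`, `Points`, and so on) are hypothetical.

```rust
/// Scalar function that we would like to evaluate for many inputs.
fn square_plus_one(x: f64) -> f64 {
    x * x + 1.0
}

/// Hand-written "batched" variant: one call handles N inputs at once.
fn square_plus_one_batched<const N: usize>(xs: [f64; N]) -> [f64; N] {
    xs.map(square_plus_one)
}

/// Array of Structs (AoS): each element keeps its fields together.
struct Point {
    x: f64,
    y: f64,
}

/// Struct of Arrays (SoA): each field is stored contiguously.
struct Points<const N: usize> {
    x: [f64; N],
    y: [f64; N],
}

/// Squared norm over an AoS layout.
fn norm_sq_aos(points: &[Point]) -> Vec<f64> {
    points.iter().map(|p| p.x * p.x + p.y * p.y).collect()
}

/// Squared norm over an SoA layout; the inner loop reads two dense arrays,
/// which is typically the friendlier shape for SIMD vectorization.
fn norm_sq_soa<const N: usize>(points: &Points<N>) -> [f64; N] {
    let mut out = [0.0; N];
    for i in 0..N {
        out[i] = points.x[i] * points.x[i] + points.y[i] * points.y[i];
    }
    out
}

fn main() {
    println!("{:?}", square_plus_one_batched([1.0, 2.0, 3.0]));

    let aos = [Point { x: 3.0, y: 4.0 }, Point { x: 1.0, y: 2.0 }];
    let soa = Points { x: [3.0, 1.0], y: [4.0, 2.0] };
    println!("{:?}", norm_sq_aos(&aos));
    println!("{:?}", norm_sq_soa(&soa));
}
```

Both layouts hold the same data and produce the same results. The difference is only in how the memory is arranged, which is what makes one or the other easier to vectorize.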

On the GPU side, there has been a recent push by another developer to add a new AMD GPU target to the Rust compiler. This is something that I would have needed for the llvm offload project anyway, so I'm very happy to see movement here: rust-lang/compiler-team#823

@nikomatsakis nikomatsakis modified the milestones: 2024h2, 2025h1 Feb 18, 2025
@nikomatsakis
Contributor Author

This is a continuing project goal, and the updates below this comment will be for the new period 2025h1

@nikomatsakis nikomatsakis moved this to Project goals in Lang team features Feb 21, 2025
@nikomatsakis nikomatsakis changed the title from "Expose experimental LLVM features for automatic differentiation and GPU offloading" to "Expose experimental LLVM features for GPU offloading" Feb 26, 2025
@nikomatsakis nikomatsakis moved this to Project goal in Lang team features Mar 4, 2025
@ZuseZ4
Member

ZuseZ4 commented Mar 25, 2025

I just noticed that I missed my February update, so I'll keep this update a bit more high-level, to not make it too long.

Key developments:

  1. All key autodiff PRs got merged. After building rust-lang/rust with the autodiff feature enabled, users can now use it without the need for any custom fork.
  2. std::autodiff received its first PRs from new contributors who had not previously been involved in rustc development! My plan is to grow a team to maintain this feature, so that's a great start. The PRs are here, here and here. Over time I hope to hand over increasingly larger issues.
  3. I received an offer to join the Rust compiler team, so now I can also officially review and approve PRs! For now I'll focus on reviewing PRs in the fields I'm most comfortable with, so autodiff, batching, and soon GPU offload.
  4. I implemented a standalone batching feature. It was a bit larger (~2k LoC) and needed some (back then unmerged) autodiff PRs, since they both use the same underlying Enzyme infrastructure. I therefore did not push for merging it.
  5. I recently implemented batching as part of the autodiff macro, for people who want to use both together. I subsequently split out a first set of code improvements and refactorings, which already got merged. The remaining autodiff feature PR is only 600 LoC, so I'm currently cleaning it up for review.
  6. I spent time preparing an MCP to enable autodiff in CI (and therefore nightly). I also spent a lot of time discussing a potential MLIR backend for rustc. Please reach out if you want to be involved!

Help wanted:
We want to support autodiff in lib builds, instead of only binaries. oli-obk and I recently figured out the underlying bug, and I started a PR in rust-lang/rust#137570. The problem is that autodiff assumes fat-lto builds, but lib builds compile some of the library code using thin-lto, even if users specify lto=fat in their Cargo.toml. As a temporary solution, we'd want to move everything to fat-lto when autodiff is enabled, and later move towards embed-bc as a longer-term solution. If you have some time to help, please reach out! Some of us have already looked into it a little but got side-tracked, so it's better to talk first about which code to re-use, rather than starting from scratch.

I also booked my RustWeek ticket, so I'm happy to talk about all types of scientific computing, HPC, ML, or cursed Rust(c) and LLVM internals! Please feel free to DM me if you're also going and want to meet.
