Add torchao mps ops #1415
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1415
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 1 Pending, 2 Unrelated Failures as of commit c164f88 with merge base 4dc2f89 (broken trunk: the following jobs failed but were present on the merge base). 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from e0c7ced to 0f1825c
Force-pushed from 0f1825c to 1cf4a7f
docs/quantization.md (Outdated)

### Use

#### linear:fpaxw
The quantization scheme linear:fpaxw quantizes only the weights in a groupwise manner with a specified bitwidth and groupsize.
Should we keep the naming convention that torchchat uses ("a" followed by the activation type and "w" followed by the weight type)? This started with a8w4dq before I added any kernels. In your case this would be something like afpwx?
Ok, this makes sense.
docs/quantization.md (Outdated)

#### Eager mode
```
python3 torchchat.py generate stories110M --device mps --dtype float32 --quantize '{"linear:fpaxw": {"bitwidth": 4, "groupsize": 256}}' --prompt "Once upon a time," --num-samples 5
```
Do these only work with eager? If so, explicitly say that in the set-up section?
The metal lowbit kernels run with ExecuTorch as well (the llama runner can use them). However, my aim in this torchchat PR was only to enable eager mode. I plan to have a follow-up PR to enable them via the torchchat ET path as well, but I prefer to keep it modular.
I added a sentence in the setup section clarifying that torchchat can currently only use them in eager mode.
Can you add a CI test for the MPS kernels to make sure they install and run?
See https://github.com/pytorch/torchchat/blob/main/.github/workflows/pull.yml#L1060 as an example.
done!
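For context, here is a minimal sketch of what such a macOS smoke test might run. The install helper path and the download step below are assumptions for illustration only; the actual job added in this PR lives in .github/workflows/pull.yml, and the real kernel install step is the one described in the setup section of docs/quantization.md.

```
# Sketch of a CI smoke test for the torchao MPS lowbit kernels (requires Apple Silicon).
set -eux

./install/install_requirements.sh            # assumed torchchat install helper; adjust to the repo's actual script
python3 torchchat.py download stories110M    # small test model used in the docs example
python3 torchchat.py generate stories110M \
  --device mps --dtype float32 \
  --quantize '{"linear:afpwx": {"bitwidth": 4, "groupsize": 256}}' \
  --prompt "Once upon a time," --num-samples 5
```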
Force-pushed from 270044b to 75804d8
Force-pushed from 75804d8 to c164f88
This PR adds the quantization scheme linear:afpwx. It quantizes only the weights in a groupwise manner with a specified bitwidth and groupsize. It takes arguments bitwidth (1, 2, 3, 4, 5, 6, 7) and groupsize (32, 64, 128, 256).
To use linear:afpwx, you must first set up the torchao mps experimental kernels. These will only work on a device with Apple Silicon.
From the torchchat root directory, run
Note that this quantization scheme is currently implemented only for the mps device.
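As an illustrative usage sketch (following the docs example earlier in this thread, with the renamed scheme), any of the supported bitwidth/groupsize combinations can be passed through the --quantize JSON config, e.g.:

```
# Eager-mode generation on mps with 3-bit weights and groupsize 32,
# both within the supported ranges listed above.
python3 torchchat.py generate stories110M \
  --device mps --dtype float32 \
  --quantize '{"linear:afpwx": {"bitwidth": 3, "groupsize": 32}}' \
  --prompt "Once upon a time," --num-samples 5
```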