Arm_inductor_quantizer for Pt2e quantization #2139
Open
Title: Enable PyTorch 2 Export Quantization path for ARM CPUs.
Description:
Key Changes:
Introduces ARM-specific support by leveraging oneDNN kernels for matmul and convolution operations.
Integrates predefined configuration selection that automatically chooses appropriate quantization settings for the selected quantization method.
Provides customization options via two flags.
These flags let users tailor the quantization process to their specific workload requirements (e.g., QAT for fine-tuning or PTQ for calibration-based quantization).
Testing and Validation:
The new ARM flow has been tested across a range of models under all combinations of the supported quantization settings:
NLP: Models such as BERT and T5.
Vision: Models like ResNet and ViT.
Custom models: user-defined models covering a variety of operators.
Example script:
cc: @jerryzh168, @fadara01, @Xia-Weiwen