
[PT2E] observers do not handle inputs with different shapes correctly #2112


Open
Xia-Weiwen opened this issue Apr 23, 2025 · 10 comments

@Xia-Weiwen
Collaborator

Summary
The observers in Torchao, such as AffineQuantizedMinMaxObserver, support different quantization granularities by keeping block sizes. For example, if the granularity is PerTensor and the input shape is (16, 3, 224, 224), the observer keeps block sizes = (16, 3, 224, 224).
However, the block sizes become wrong if inputs with different shapes are passed in. For example, if another input with shape = (16, 3, 56, 56) comes, the block sizes are updated to (16, 3, 56, 56), which is wrong for inputs with shape = (16, 3, 224, 224).
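A simplified sketch of the behavior (illustrative only; not the actual torchao observer code, and get_block_size here is a stand-in for the real granularity-to-block-size mapping):

    # Simplified illustration of the problem, not the real torchao implementation.
    import torch

    def get_block_size(shape, granularity):
        # For PerTensor granularity the block covers the whole tensor,
        # so the block size equals the input shape.
        return tuple(shape)

    class SimplifiedMinMaxObserver:
        def __init__(self, granularity="PerTensor"):
            self.granularity = granularity
            self.block_size = None

        def forward(self, x):
            # The block size is overwritten on every call, so it only
            # reflects the shape of the last input seen.
            self.block_size = get_block_size(x.shape, self.granularity)
            return x

    obs = SimplifiedMinMaxObserver()
    obs.forward(torch.randn(16, 3, 224, 224))
    print(obs.block_size)  # (16, 3, 224, 224)
    obs.forward(torch.randn(16, 3, 56, 56))
    print(obs.block_size)  # (16, 3, 56, 56) -- now wrong for the 224x224 inputs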

How to reproduce
The code to reproduce is similar to the reproducer in #2094.
One needs to bypass the #2094 issue by changing the source code manually and also run the converted_model after convert_pt2e:

    converted_model = convert_pt2e(prepared_model)
    move_exported_model_to_eval(converted_model)
    print("[info] converted_model =\n", converted_model)
    converted_model(*example_inputs)  # add this line to actually run the converted model
Xia-Weiwen changed the title from "[PT2E] observers does not handle inputs with different shapes correctly" to "[PT2E] observers do not handle inputs with different shapes correctly" on Apr 23, 2025
@jerryzh168
Contributor

yeah we haven't tested per tensor that much I think, we'd need to fix the per tensor use case

@jerryzh168
Contributor

I have thought about this scenario a bit before. I think the fix for this one would be to use granularity instead of block_size in the observer. Please feel free to take this if you have time @Xia-Weiwen

@Xia-Weiwen
Collaborator Author

Do you mean using granularity in all cases or only for per-tensor?

@Xia-Weiwen
Collaborator Author

I think using block size = -1 for per-tensor also makes sense. What do you think?

@jerryzh168
Contributor

Do you mean using granularity in all cases or only for per-tensor?

just for per tensor for now, we can assert that only one of granularity and block_size can be valid, and use the one that's available
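A minimal sketch of what that check could look like (hypothetical constructor, not the existing observer API):

    # Hypothetical sketch: exactly one of granularity / block_size may be passed,
    # and whichever one is given is used downstream.
    class ObserverSketch:
        def __init__(self, granularity=None, block_size=None):
            assert (granularity is None) != (block_size is None), (
                "specify exactly one of granularity and block_size"
            )
            self.granularity = granularity
            self.block_size = block_size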

@jerryzh168
Contributor

I think using block size = -1 for per-tensor also makes sense. What do you think?

yeah this could make sense, but then the question is: can we express all dynamic shapes with block_size?

@Xia-Weiwen
Collaborator Author

I think so. For a certain dimension:

  • If group size > 1 is specified, we just use that group size as block size
  • If group size == 1, such as per-channel and per token, we use block size = 1
  • If it's not group-wise quantization, such as per-tensor and the other dimension of per-channel, we use block size = -1

All cases can be covered. How does that sound?
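A minimal sketch of that mapping (hypothetical helper and granularity names, for illustration only):

    # Hypothetical helper: map a quantization granularity to a per-dimension
    # block size, using -1 to mean "match the runtime size of this dimension".
    def block_size_for(granularity, ndim, axis=None, group_size=None):
        if granularity == "per_tensor":
            # every dimension follows the input shape
            return (-1,) * ndim
        if granularity == "per_channel":
            # size 1 along the channel axis, full size elsewhere
            return tuple(1 if d == axis else -1 for d in range(ndim))
        if granularity == "per_group":
            # group_size along the grouped axis, full size elsewhere
            return tuple(group_size if d == axis else -1 for d in range(ndim))
        raise ValueError(f"unsupported granularity: {granularity}")

    print(block_size_for("per_tensor", ndim=4))                        # (-1, -1, -1, -1)
    print(block_size_for("per_channel", ndim=2, axis=0))               # (1, -1)
    print(block_size_for("per_group", ndim=2, axis=1, group_size=32))  # (-1, 32)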

@jerryzh168
Contributor

I think so. For a certain dimension:

  • If group size > 1 is specified, we just use that group size as block size
  • If group size == 1, such as per-channel and per token, we use block size = 1
  • If it's not group-wise quantization, such as per-tensor and the other dimension of per-channel, we use block size = -1

All cases can be covered. How does that sound?

I remember discussing this with @drisspg and @vkuzo before. There is another dimension to this, which is the number of dims (the rank of the tensor), e.g. for per token/per row, all the preceding dims should use size 1, except for the last dim:

elif isinstance(granularity, PerRow):

E.g. for a 2-dim tensor, block_size will be (1, shape[-1]); for a 3-dim tensor, block_size will be (1, 1, shape[-1]).

I'm not sure if we will have an input tensor that has variable rank though

So in theory this is not enough, but maybe it's enough in practice?
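For reference, a small sketch of the rank-dependent behavior described above (illustrative only):

    # For per row / per token, all preceding dims use block size 1 and only the
    # last dim uses the full size, so the block size depends on the tensor rank.
    def per_row_block_size(shape):
        return (1,) * (len(shape) - 1) + (shape[-1],)

    print(per_row_block_size((8, 16)))     # (1, 16)
    print(per_row_block_size((2, 8, 16)))  # (1, 1, 16)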

@Xia-Weiwen
Collaborator Author

I see. Yeah, maybe an observer would not see multiple ranks in practice, but I am not sure. Could you have more discussions and decide what to do? Thanks.

@jerryzh168
Copy link
Contributor

jerryzh168 commented Apr 24, 2025

yeah, we'll discuss this in our team meeting tomorrow and decide what to do
