
[PT2E] observers do not handle inputs with different shapes correctly #2112


Open
Xia-Weiwen opened this issue Apr 23, 2025 · 10 comments

@Xia-Weiwen
Collaborator

Summary
The observers in Torchao, such as AffineQuantizedMinMaxObserver, support different quantization granularities by keeping block sizes. For example, if the granularity is PerTensor and the input shape is (16, 3, 224, 224), the observer keeps block sizes = (16, 3, 224, 224).
However, the block sizes become wrong if inputs with different shapes are passed in. For example, if another input with shape = (16, 3, 56, 56) comes, the block sizes are updated to (16, 3, 56, 56), which is wrong for inputs with shape = (16, 3, 224, 224).
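A simplified sketch of the behavior (illustrative only; not the actual torchao observer code, and get_block_size here is a stand-in for the real granularity-to-block-size mapping):

    # Simplified illustration of the problem, not the real torchao implementation.
    import torch

    def get_block_size(shape, granularity):
        # For PerTensor granularity the block covers the whole tensor,
        # so the block size equals the input shape.
        return tuple(shape)

    class SimplifiedMinMaxObserver:
        def __init__(self, granularity="PerTensor"):
            self.granularity = granularity
            self.block_size = None

        def forward(self, x):
            # The block size is overwritten on every call, so it only
            # reflects the shape of the last input seen.
            self.block_size = get_block_size(x.shape, self.granularity)
            return x

    obs = SimplifiedMinMaxObserver()
    obs.forward(torch.randn(16, 3, 224, 224))
    print(obs.block_size)  # (16, 3, 224, 224)
    obs.forward(torch.randn(16, 3, 56, 56))
    print(obs.block_size)  # (16, 3, 56, 56) -- now wrong for the 224x224 inputs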

How to reproduce
The code to reproduce is similar to the reproducer in #2094.
One needs to bypass the #2094 issue by changing the source code manually and also run the converted_model after convert_pt2e:

    converted_model = convert_pt2e(prepared_model)
    move_exported_model_to_eval(converted_model)
    print("[info] converted_model =\n", converted_model)
    converted_model(*example_inputs)  # add this line to actually run the converted model
Xia-Weiwen changed the title from "[PT2E] observers does not handle inputs with different shapes correctly" to "[PT2E] observers do not handle inputs with different shapes correctly" on Apr 23, 2025
@jerryzh168
Contributor

yeah we haven't tested per tensor that much I think, we'd need to fix the per tensor use case

@jerryzh168
Contributor

I have thought about this scenario a bit before. I think the fix for this one would be to use granularity instead of block_size in the observer. Please feel free to take this if you have time @Xia-Weiwen

@Xia-Weiwen
Collaborator Author

Do you mean using granularity in all cases or only for per-tensor?

@Xia-Weiwen
Collaborator Author

I think using block size = -1 for per-tensor also makes sense. What do you think?

@jerryzh168
Contributor

Do you mean using granularity in all cases or only for per-tensor?

just for per tensor for now, we can assert that only one of granularity and block_size can be valid, and use the one that's available
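A minimal sketch of what that check could look like (hypothetical constructor, not the existing observer API):

    # Hypothetical sketch: exactly one of granularity / block_size may be passed,
    # and whichever one is given is used downstream.
    class ObserverSketch:
        def __init__(self, granularity=None, block_size=None):
            assert (granularity is None) != (block_size is None), (
                "specify exactly one of granularity and block_size"
            )
            self.granularity = granularity
            self.block_size = block_size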

@jerryzh168
Contributor

I think using block size = -1 for per-tensor also makes sense. What do you think?

yeah this could make sense, but then the question is: can we express all dynamic shapes with block_size?

@Xia-Weiwen
Collaborator Author

I think so. For a certain dimension:

  • If group size > 1 is specified, we just use that group size as block size
  • If group size == 1, such as per-channel and per token, we use block size = 1
  • If it's not group-wise quantization, such as per-tensor and the other dimension of per-channel, we use block size = -1

All cases can be covered. How does that sound?
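A minimal sketch of that mapping (hypothetical helper and granularity names, for illustration only):

    # Hypothetical helper: map a quantization granularity to a per-dimension
    # block size, using -1 to mean "match the runtime size of this dimension".
    def block_size_for(granularity, ndim, axis=None, group_size=None):
        if granularity == "per_tensor":
            # every dimension follows the input shape
            return (-1,) * ndim
        if granularity == "per_channel":
            # size 1 along the channel axis, full size elsewhere
            return tuple(1 if d == axis else -1 for d in range(ndim))
        if granularity == "per_group":
            # group_size along the grouped axis, full size elsewhere
            return tuple(group_size if d == axis else -1 for d in range(ndim))
        raise ValueError(f"unsupported granularity: {granularity}")

    print(block_size_for("per_tensor", ndim=4))                        # (-1, -1, -1, -1)
    print(block_size_for("per_channel", ndim=2, axis=0))               # (1, -1)
    print(block_size_for("per_group", ndim=2, axis=1, group_size=32))  # (-1, 32)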

@jerryzh168
Contributor

I think so. For a certain dimension:

  • If group size > 1 is specified, we just use that group size as block size
  • If group size == 1, such as per-channel and per token, we use block size = 1
  • If it's not group-wise quantization, such as per-tensor and the other dimension of per-channel, we use block size = -1

All cases can be covered. How does that sound?

I remember discussing this with @drisspg and @vkuzo before. There is another dimension to this, which is the number of dims (the rank of the tensor), e.g. for per token/per row, all the preceding dims should use size 1, except for the last dim:

elif isinstance(granularity, PerRow):

E.g. for a 2-dim tensor, block_size will be (1, shape[-1]); for a 3-dim tensor, block_size will be (1, 1, shape[-1]).

I'm not sure if we will have an input tensor that has variable rank though

So in theory this is not enough, but maybe it's enough in practice?
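For reference, a small sketch of the rank-dependent behavior described above (illustrative only):

    # For per row / per token, all preceding dims use block size 1 and only the
    # last dim uses the full size, so the block size depends on the tensor rank.
    def per_row_block_size(shape):
        return (1,) * (len(shape) - 1) + (shape[-1],)

    print(per_row_block_size((8, 16)))     # (1, 16)
    print(per_row_block_size((2, 8, 16)))  # (1, 1, 16)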

@Xia-Weiwen
Collaborator Author

I see. Yeah, maybe an observer would not see multiple ranks in practice, but I am not sure. Could you have more discussions and decide what to do? Thanks.

@jerryzh168
Copy link
Contributor

jerryzh168 commented Apr 24, 2025

yeah, we'll discuss this in our team meeting tomorrow and decide what to do
