
A first sample version of FloatQuant #159

Open · wants to merge 5 commits into base: feature/float_quant
Conversation

@nghielme commented Dec 9, 2024

Please consider that the function should be tested more extensively.
Sample `FloatQuant` function implemented. A sample use of the function can be found in the `Examples`. `±inf` are clipped to `±max_val`, `±NaN` are mapped to `±NaN`, and zero is always representable. I tested with subnormals (meaning subnormals of the output representation) and the quantizer represented them with no loss, though I didn't test this part extensively. I also tested the function against the Brevitas `FloatQuant` implementation: they do not always match. For example, I think `0.3125` should be representable (`x == xq`) by a float quantizer with 4 bits for the mantissa, 4 bits for the exponent, 0 bias and 1 bit for the sign, but the Brevitas `FloatQuant` implementation quantizes it to `0.25`. I'm not sure which behaviour should be considered correct in this case.
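As a side note on the `0.3125` example: independent of either implementation, the binary decomposition can be checked directly with NumPy (a minimal sketch; whether the exponent `-2` is actually reachable depends on the bias convention, which is exactly the point in question):

```python
import numpy as np

# 0.3125 = 0.0101b = 1.01b * 2**-2: the normalized mantissa needs only
# 2 fractional bits, well within a 4-bit mantissa field
m, e = np.frexp(0.3125)  # frexp returns m in [0.5, 1) with x = m * 2**e
print(m * 2, e - 1)      # -> 1.25 -2, i.e. 0.3125 == 1.25 * 2**-2
assert (m * 2) * 2.0 ** (e - 1) == 0.3125
```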

@nghielme requested a review from maltanar December 9, 2024 20:51
@nickfraser

Brevitas developer here, thanks for this concrete example - I will look into it ASAP!

@nghielme (Author)

Hi @nickfraser,
I spotted some potential bugs in Brevitas `FloatQuant`. Should I open an issue in the corresponding GitHub repo describing them and proposing my fixes?

@nickfraser

Yes, please do - if you can provide minimal examples too, that will make it much easier for us 🙏. Note: `dev` is our staging branch, so please make sure any issues still exist on `dev`.

If you have proposed solutions, feel free to make PRs as well (pointing to `dev`), so long as you follow our contributor guidelines.

```python
    exponent_bias=None,
    max_val=None,
    rounding_mode="ROUND",
    lt_subnorm_to_zero=False,
```
@nghielme (Author)

@maltanar, this name is terrible! Please help me find a better one 🤦‍♂️

… quantization logic. Now QONNX and Brevitas float quantisers match.
@maltanar (Collaborator) left a comment

Thanks @nghielme! Looking good, but it needs a few fixes before merging.



```
#### Sample Implementation
TODO
```
@maltanar (Collaborator)
Let's add a comment that links this back to the source file, in case it changes in the future but we forget to update it here, e.g.:
`# see src/qonnx/custom_op/general/floatquant.py for up-to-date implementation`


```python
import numpy as np

from qonnx.custom_op.general.floatquant import compute_max_val, float_quantize
```
@maltanar (Collaborator)

Wrong name imported? The impl seems to have been renamed to `float_quant` (it's no longer `float_quantize`).
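If the rename is confirmed, the example's import would presumably become:

```python
from qonnx.custom_op.general.floatquant import compute_max_val, float_quant
```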

Comment on lines +55 to +56
```python
    exponent_bias,
    signed,
```
@maltanar (Collaborator)

Since we have a notion of a default exponent bias (as implemented by the `compute_default_exponent_bias` function), I suggest enabling the default by setting `exponent_bias=None`. In that case, `signed` should either be moved up in the parameter list, before the params with default values, or be given some default value itself (`True`?).
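A minimal sketch of how that could look (the parameter order and the IEEE-style default bias 2^(e-1) - 1 are assumptions; the PR's actual `compute_default_exponent_bias` may differ):

```python
def compute_default_exponent_bias(exponent_bitwidth):
    # IEEE-754-style default: bias = 2**(e-1) - 1 (1 for E2, 3 for E3, ...)
    return (2.0 ** (exponent_bitwidth - 1)) - 1


def float_quant(X, scale, exponent_bitwidth, mantissa_bitwidth,
                exponent_bias=None, signed=True, max_val=None,
                rounding_mode="ROUND", lt_subnorm_to_zero=False):
    # exponent_bias=None selects the default; signed carries a default value
    # so it can stay among the keyword arguments
    if exponent_bias is None:
        exponent_bias = compute_default_exponent_bias(exponent_bitwidth)
    ...  # quantization logic as in the PR
```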

Comment on lines +52 to +57
```python
assert np.all(float_quantize(testcase_a, unit_scale, 2, 3) == testcase_a)
assert np.all(float_quantize(testcase_b, unit_scale, 2, 3) == testcase_b)
assert np.all(float_quantize(testcase_c, unit_scale, 2, 3) == compute_max_val(2, 3))
assert np.all(float_quantize(testcase_d, unit_scale, 3, 2) == compute_max_val(3, 2))
assert np.all(float_quantize(testcase_e, unit_scale, 2, 1) == compute_max_val(2, 1))
assert np.all(float_quantize(testcase_f, unit_scale, 2, 3, lt_subnorm_to_zero=True) == 0.0)
```
@maltanar (Collaborator)

Missing `signed` and perhaps `exponent_bias` args?
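For illustration, one of the calls with the extra args spelled out might look like this (the explicit values are hypothetical, matching the E2M3 defaults):

```python
assert np.all(float_quantize(testcase_a, unit_scale, 2, 3,
                             exponent_bias=1, signed=True) == testcase_a)
```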

```python
assert compute_max_val(2, 3) == 7.5  # FP6 E2M3
assert compute_max_val(3, 2) == 28.0  # FP6 E3M2
assert compute_max_val(2, 1) == 6.0  # FP4 E2M1
```
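For reference, the asserted maxima follow from the closed form max_val = (2 - 2^-m) * 2^(2^e - 1 - bias), with the default bias 2^(e-1) - 1 and no exponent codes reserved for inf/NaN. A sketch consistent with these values (the PR's actual `compute_max_val` may be organized differently):

```python
def compute_max_val(exponent_bitwidth, mantissa_bitwidth, exponent_bias=None):
    if exponent_bias is None:
        exponent_bias = 2 ** (exponent_bitwidth - 1) - 1  # assumed IEEE-style default
    max_exponent = 2**exponent_bitwidth - 1 - exponent_bias  # all-ones exponent field
    max_mantissa = 2.0 - 2.0**-mantissa_bitwidth             # 1.11...1 in binary
    return max_mantissa * 2.0**max_exponent

assert compute_max_val(2, 3) == 7.5   # (2 - 1/8) * 2**2
assert compute_max_val(3, 2) == 28.0  # (2 - 1/4) * 2**4
assert compute_max_val(2, 1) == 6.0   # (2 - 1/2) * 2**2
```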

@maltanar (Collaborator)

Please also add the test executing the QONNX model exported from Brevitas and comparing against the (pre-generated) reference values from Brevitas, which you shared with me previously.
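A rough shape for such a test, assuming `execute_onnx` from `qonnx.core.onnx_exec`; the model and `.npz` fixture file names are hypothetical placeholders for the artifacts mentioned above:

```python
import numpy as np

from qonnx.core.modelwrapper import ModelWrapper
import qonnx.core.onnx_exec as oxe


def test_float_quant_brevitas_reference():
    # pre-generated inputs and expected outputs from Brevitas (hypothetical file)
    ref = np.load("brevitas_floatquant_ref.npz")
    # QONNX model exported from Brevitas (hypothetical file)
    model = ModelWrapper("floatquant_from_brevitas.onnx")
    idict = {model.graph.input[0].name: ref["input"]}
    odict = oxe.execute_onnx(model, idict)
    produced = odict[model.graph.output[0].name]
    assert np.allclose(produced, ref["expected_output"])
```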
