
Be more strict about converting float to double #458

Merged: 5 commits, Mar 28, 2023

Conversation

@sw (Contributor) commented Mar 24, 2023

This enables -Wdouble-promotion and syncs the Makefile and CMakeLists.txt with regard to warnings.

Reasoning:
The llama.cpp codebase depends on the correct use of number types, whether those are float, double, or one of the specialized types such as q4_0. Converting from one type to another should be a conscious decision and not happen by accident.

To avoid any type-promotion warnings, I have updated the code base, sometimes by making an implicit cast explicit, but in some places I did change the actual semantics. I'm not confident at this point that all changes are good.
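
For illustration, here is a minimal sketch (my own, not code from this PR) of the kind of implicit promotion that -Wdouble-promotion flags:

// sketch.c: compile with `cc -Wdouble-promotion -c sketch.c`
float scale_example(float x) {
    float a = x * 2.0;   // 2.0 is a double literal: x is promoted to double,
                         // then the result is truncated back to float (warned)
    float b = x * 2.0f;  // all-float arithmetic, no promotion, no warning
    return a + b;
}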

Consequences:

  • Inference output changes, though I wouldn't say for the worse. Perplexity has an ETA of 20 hours on my machine, so I haven't run that yet.
  • q4_0 quantization is identical.

Further steps if and when this PR is merged:

  • Enable -Werror?

@sw (Contributor, Author) commented Mar 24, 2023

clang on macOS is apparently stricter; I'll clean this up using the warnings from the CI run.

I'm not sure if the double precision is needed in ggml_compute_forward_rope_f32/_f16.

@Green-Sky (Collaborator) commented

I assume inference speed changes will be minimal, and only really a thing with SIMD disabled?

@sw (Contributor, Author) commented Mar 24, 2023

> I assume inference speed changes will be minimal, and only really a thing with SIMD disabled?

I believe master got a bit slower recently, but I can't detect a regression with this PR and AVX2 optimizations enabled.

@anzz1 (Contributor) commented Mar 25, 2023

I somewhat agree with:

  • Type casts should in most cases be explicit to denote conscious choices. That said, I understand the preference for omitting them when you know what you are doing: explicit casts can be easier for others to read and understand, but in longer formulas they can also make the code harder to follow, and they always require more typing.
  • 1.0D and 1.0f are better than a bare 1.0

However, I am not qualified to comment on the math itself. I can only say that changes like these require extra scrutiny, since correct rounding is integral to the models working properly: round != roundf, sqrt != sqrtf, etc. It is important to acknowledge the differences between number types, rounding, and truncation, including their effects on the results and on the speed of the calculations at the bare-metal level.


One example:

3afbe43#diff-6d9ce99fcb6f51ff76f59e479f6e6fc0bb62edef7442805d7a5bb15b23996b5dL482

To me it seems that calculations like this are very deliberate:

const float v0 = x[i*QK + l + 0]*id;
const float v1 = x[i*QK + l + 1]*id;
const uint8_t vi0 = ((int8_t) (round(v0))) + 8;
const uint8_t vi1 = ((int8_t) (round(v1))) + 8;

and a fully explicit cast would be

const uint8_t vi0 = (uint8_t)(((int8_t)(round((double)v0))) + 8);

This already shows why explicit type casts can be more trouble than they're worth in many cases.

However, the way you changed it to

const uint8_t vi0 = (int8_t)roundf(v0) + 8;

completely changes how the rounding and truncation work (and the typecast still isn't correct), and such changes require extra scrutiny and dialogue between whoever originally wrote those calculations and anyone changing them.


Like I said, I am not qualified to comment on the math itself. I am just advising against merging any changes to the calculations until the changes being made, and the effects they have, are completely understood.

I am also not 100% certain whether this is a change that should go forward, but I'm not the one writing the formulas. As shown in the example above, explicit type casting can also make longer formulas more obtuse by introducing long chains of extra types and parentheses.

This is just my two cents. In any case, this should be a dialogue between the people writing the formulas; it should be their choice how to proceed.

In any case, no decision should be based on something as superficial as a compiler 'warning' you not to do something; you should understand what you are doing better than the compiler does. Therefore, on adding -Werror, my vote is a hard and resounding no.

Unfortunately there isn't a perfect solution here: not only does support for #pragma warning(push)/pop and suppression vary between compilers, but such pragmas also make the code unnecessarily noisy. Since there are plenty of valid use cases for code that generates warnings, and some warnings are even downright silly (like unused-result), the best approach is simply to be mindful about the warnings in your code.
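
(To illustrate the pragma point, here is a sketch of what per-site suppression looks like across compilers; generic boilerplate, not anything from llama.cpp:)

#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 4244)   // MSVC: possible loss of data in conversion
#elif defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdouble-promotion"
#endif
float f = 1.0f * 2.0;             // would otherwise trigger the warning
#if defined(_MSC_VER)
#pragma warning(pop)
#elif defined(__GNUC__)
#pragma GCC diagnostic pop
#endif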

@sw (Contributor, Author) commented Mar 25, 2023

@anzz1 Thanks for your comments.

> However, I am not qualified to comment on the math itself. I can only say that changes like these require extra scrutiny, since correct rounding is integral to the models working properly: round != roundf, sqrt != sqrtf, etc. It is important to acknowledge the differences between number types, rounding, and truncation, including their effects on the results and on the speed of the calculations at the bare-metal level.

I absolutely agree in principle...

> To me it seems that calculations like this are very deliberate
> and a fully explicit cast would be
> This already shows why explicit type casts can be more trouble than they're worth in many cases.
> However, the way you changed it to
> completely changes how the rounding and truncation work (and the typecast still isn't correct)

... but not in this specific case. To prove it, I added a test that loops through all possible floats (except NaNs and infinities) and verifies that the result is the same.

This should take several seconds; I'll have to check why it isn't properly run on the CI machines. Only the Windows machine seems to run the test: 1/3 Test #1: test-double-float ................ Passed 96.65 sec. Maybe the other compilers optimize the test away because they recognize the equivalence?

I agree that explicit casts can make the code a bit longer, but ggml.c is already written in a style where a lot of simple assignments get their own line, sometimes just for type conversions. So I don't think it would make things much harder to read in this case.
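
(For context, a sketch of what such an exhaustive test can look like; the CI log above names the real test test-double-float, but the code here is my own illustration:)

#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    // Iterate over every 32-bit pattern; a 64-bit counter ensures the loop terminates.
    for (uint64_t bits = 0; bits <= UINT32_MAX; ++bits) {
        uint32_t u = (uint32_t) bits;
        float v;
        memcpy(&v, &u, sizeof v);   // reinterpret the bits as a float
        if (!isfinite(v)) {
            continue;               // skip NaNs and infinities
        }
        // Old path (implicit promotion to double) vs. new float-only path:
        assert(round((double) v) == (double) roundf(v));
    }
    return 0;
}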

@anzz1 (Contributor) commented Mar 25, 2023

> ... but not in this specific case. To prove it, I added a test that loops through all possible floats (except NaNs and infinities) and verifies that the result is the same.

When you're right, you're right. In this case of just 256 possible values, any rounding errors do not materialize. Yeah, I shouldn't comment on the math stuff. 😄

To be honest there is room for improvement; there are some questionable choices like typedef double ggml_float; which are a bit detrimental to readability. But then again, maybe there is a reason for it, like future expansion in mind, or just something I don't get.

Anyway, good work, and I hope this will get attention from the maths wizards.

@ggerganov (Member) left a comment

This is a good change. Should have merged it before the refactoring.
If you resolve the conflicts, we can merge it.

-Werror is a great idea, but too early to add it. Want to get to a more stable state first.

@sw (Contributor, Author) commented Mar 26, 2023

I have resolved the conflicts and looked over the changes again.

I added a test for SILU, but I have disabled the test module to avoid long CI times and high load on the machines.

For GELU there is some difference, but as I understand it, this is an approximation anyway.
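
(For reference, this is the tanh-based GELU approximation that, as far as I know, ggml.c implements; treat the exact constants here as illustrative rather than authoritative:)

#include <math.h>

// Tanh approximation of GELU: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))
static float gelu_approx_f32(float x) {
    const float sqrt_2_over_pi = 0.79788456080286535588f;
    return 0.5f*x*(1.0f + tanhf(sqrt_2_over_pi*(x + 0.044715f*x*x*x)));
}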

@anzz1 made some good points, but you, @ggerganov, seemed to like it, so I'll consider it ready. It certainly warrants another close look.

@sw marked this pull request as ready for review March 26, 2023 10:46
@sw requested a review from ggerganov March 26, 2023 10:46
const int top_k = params.top_k;
const double top_p = (double)params.top_p;
const double temp = (double)params.temp;
const double repeat_penalty = (double)params.repeat_penalty;
A Collaborator commented on this diff:

Why not make the params struct have doubles?

@sw (Contributor, Author) replied:

I didn't want to change it too much, but we could alternatively make everything involved with the logits a float instead, except maybe for sum and cumsum in llama_sample_top_p_top_k.

After all, these three parameters are set by the user with 2 decimal places or so...
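
(A hypothetical sketch of that alternative: keep the logits themselves as float and promote only at the accumulation point. The names here are illustrative, not PR code:)

#include <stddef.h>

// Accumulate float logits in double for stability; everything else stays float.
static double sum_logits(const float * logits, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double) logits[i];  // promote only here, at the accumulation point
    }
    return sum;
}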

@anzz1 added the documentation, enhancement, and build labels and removed the documentation label Mar 27, 2023
sw added 3 commits March 28, 2023 19:11

Commit note: the test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes.
@ggerganov (Member) commented

I disabled -Wdouble-promotion for C++; it's too bad that printf does not support printing float.
The code becomes too cumbersome when every printf needs a cast, so it's not worth it.
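
(To illustrate the printf point with my own minimal example, not code from the PR: printf has no float conversion specifier, so a float argument is always promoted to double through the varargs call, and -Wdouble-promotion reports every such call site.)

#include <stdio.h>

int main(void) {
    float top_p = 0.95f;
    printf("top_p = %f\n", top_p);           // implicit float -> double promotion, warned
    printf("top_p = %f\n", (double) top_p);  // explicit cast, no warning
    return 0;
}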

There are some code paths that haven't been updated yet, so we have to clear the remaining warnings in ggml.c.

fprintf(stderr, "%s : calculating perplexity over %d chunks\n", __func__, seq_count);

for (int i = 0; i < seq_count; ++i) {
    int start = i * params.n_ctx;
-   int end   = start + params.n_ctx - 1;
+   int end   = start + params.n_ctx - 1; // TODO: this is not optimal, e.g. it makes the batch 511 instead of 512
+                                         //       it is better to always be a power of 2 for better performance
A Member commented on this diff:
@glinscott
Might want to fix this on master.
It's better to compute with powers of 2 for optimal performance
