Be more strict about converting float to double #458
Conversation
clang on macOS is apparently stricter; I'll clean this up using the warnings from the CI run. I'm not sure if the …
I assume inference speed changes will be minimal, and only really a thing with SIMD disabled?
I believe …
I somewhat agree with:

However, I am not qualified to comment on the math itself. I can only say that changes like these require extra scrutiny, since correct rounding is integral to the models working properly. One example: 3afbe43#diff-6d9ce99fcb6f51ff76f59e479f6e6fc0bb62edef7442805d7a5bb15b23996b5dL482

To me it seems that calculations like this are very deliberate:

```c
const float v0 = x[i*QK + l + 0]*id;
const float v1 = x[i*QK + l + 1]*id;

const uint8_t vi0 = ((int8_t) (round(v0))) + 8;
const uint8_t vi1 = ((int8_t) (round(v1))) + 8;
```

and to cast it correctly and explicitly, it would have to be

```c
const uint8_t vi0 = (uint8_t)(((int8_t)(round((double)v0))) + 8);
```

This already shows why explicit typecasts can be more trouble than they are worth in many cases. However, the way you changed it to

```c
const uint8_t vi0 = (int8_t)roundf(v0) + 8;
```

completely changes how the rounding and truncation work (and the typecast still isn't correct). Changes like these require extra scrutiny and dialog between whoever originally wrote the calculations and anyone changing them.

Like I said, I am not qualified enough to comment on the math itself. I am just advising against merging any changes to the calculations until the changes being made and the effects they have are completely understood. I am also not 100% certain whether this is a change which should go forward, but I'm not the one writing the formulas. As shown in the example above, explicit type-casting can also make longer formulas which include casts and truncations more obtuse by introducing long chains of extra types and parentheses.

This is just my few cents. In any case, this should be a dialog between the people writing the formulas, and it should be their choice how to proceed. No decision should be based on such a superfluous thing as a compiler 'warning' you not to do something. You (should) understand better what you are doing than the compiler does. Therefore, for adding …
@anzz1 Thanks for your comments.

I absolutely agree in principle...

... but not in this specific case. To prove it, I added a test that loops through all possible floats (except NaNs and infinities) and verifies that the result is the same. This should take several seconds; I'll have to check why it isn't properly run on the CI machines. Only the Windows machine seems to run the test.

I agree that putting in explicit casts can make the code a bit longer, but ggml.c is already in a style where a lot of simple assignments on their own line are used, sometimes for the purpose of type conversions. So I don't think it would make things a lot harder to read in this case.
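A minimal sketch of what such an exhaustive check over all 32-bit floats can look like (this is not the PR's actual test code; the compared expressions are illustrative):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    // Iterate over every 32-bit pattern; uint64_t avoids wrap-around at UINT32_MAX.
    for (uint64_t bits = 0; bits <= UINT32_MAX; ++bits) {
        const uint32_t u = (uint32_t) bits;
        float v;
        memcpy(&v, &u, sizeof(v)); // reinterpret the bit pattern as a float
        if (!isfinite(v)) {
            continue; // skip NaNs and infinities
        }
        // The double-promoting path and the float-only path must agree.
        assert(round((double) v) == (double) roundf(v));
    }
    return 0;
}
```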
When you're right, you're right. In this case of just 256 possible values any rounding errors do not materialize. Yeah, I shouldn't comment on the math stuff. 😄 To be honest there is room for improvement; there are some questionable choices like … Anyway, good work and I hope this will get attention from the maths wizards.
This is a good change. Should have merged it before the refactoring.
If you resolve the conflicts, we can merge it.

`-Werror` is a great idea, but too early to add it. Want to get to a more stable state first.
I have resolved the conflicts and looked over the changes again. I added a test for SILU, but I have disabled the test module to avoid long CI times and high load on the machines. For GELU there is some difference, but the way I understand it, this is an approximation anyway. @anzz1 made some good points, but you @ggerganov seemed to like it, so I'll consider it ready. It certainly requires a close look again.
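As a rough illustration of what such a SILU comparison involves (a hypothetical sketch, not the PR's test; function names and the sampling grid are made up): evaluate SiLU once in pure float and once through double, and check where they disagree.

```c
#include <math.h>
#include <stdio.h>

// SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x)), evaluated in pure float ...
static float silu_f32(float x) { return x / (1.0f + expf(-x)); }

// ... and via double, truncated back to float at the end.
static float silu_f64(float x) {
    return (float) ((double) x / (1.0 + exp(-(double) x)));
}

int main(void) {
    int diffs = 0;
    for (float x = -10.0f; x <= 10.0f; x += 1e-3f) {
        if (silu_f32(x) != silu_f64(x)) {
            diffs++; // any ULP-level disagreements land here
        }
    }
    printf("%d sampled values differ\n", diffs);
    return 0;
}
```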
examples/main/main.cpp (outdated)

```cpp
const int top_k = params.top_k;
const double top_p = (double)params.top_p;
const double temp = (double)params.temp;
const double repeat_penalty = (double)params.repeat_penalty;
```
Why not make the `params` struct have doubles?
I didn't want to change it too much, but we could alternatively make everything involved with the logits a `float` instead, except maybe for `sum` and `cumsum` in `llama_sample_top_p_top_k`.

After all, these three parameters are set by the user with 2 decimal places or so...
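A hedged sketch of that float-centric alternative, assuming a softmax-style normalization over the logits (the function and variable names are illustrative, not llama.cpp's actual code): per-logit math stays in `float`, and only the running sum is a `double`.

```c
#include <math.h>

// Per-logit math stays in float; only the accumulator is a double,
// which limits drift when summing over a large vocabulary.
static void softmax_f32(const float * logits, float * probs, int n, float max_logit) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        probs[i] = expf(logits[i] - max_logit); // float math per logit
        sum += (double) probs[i];               // double only for accumulation
    }
    for (int i = 0; i < n; ++i) {
        probs[i] = (float) (probs[i] / sum);    // divide in double, store as float
    }
}
```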
The test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes.

I disabled … There are some code paths that haven't been updated yet, so we have to clear the remaining warnings there in …
```diff
 fprintf(stderr, "%s : calculating perplexity over %d chunks\n", __func__, seq_count);

 for (int i = 0; i < seq_count; ++i) {
     int start = i * params.n_ctx;
-    int end = start + params.n_ctx - 1;
+    int end = start + params.n_ctx - 1; // TODO: this is not optimal, e.g. it makes the batch 511 instead of 512
+                                        //       it is better to always be power of 2 for better performance
```
@glinscott Might want to fix this on `master`. It's better to compute with powers of 2 for optimal performance.
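A hypothetical illustration of that fix, reusing the identifiers from the quoted loop (not an actual patch from the repository):

```c
// Exclusive end index: each chunk spans exactly n_ctx tokens (e.g. 512, not 511).
for (int i = 0; i < seq_count; ++i) {
    const int start = i * params.n_ctx;
    const int end   = start + params.n_ctx;
    // ... evaluate the tokens in [start, end) ...
}
```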
This enables `-Wdouble-promotion` and syncs the `Makefile` and `CMakeLists.txt` with regards to warnings.

Reasoning:

The llama.cpp codebase depends on the correct use of number types, whether those are `float`, `double`, or some of the specialized types such as q4_0. Converting from one type to another should be a conscious decision and not happen by accident. In order to avoid any type promotion warnings, I have updated the code base, sometimes by making an implicit cast explicit, but in some places I did change the actual semantics. I'm not confident at this point that all changes are good.
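For illustration only (this snippet is not from the PR), the kind of implicit promotion that `-Wdouble-promotion` flags, and the explicit alternative:

```c
// The literal 0.1 has type double, so x is implicitly promoted to double
// and the multiplication happens in double before being narrowed on return:
float scale(float x) {
    return x * 0.1; // clang/gcc: -Wdouble-promotion warns about the promotion
}

// Using a float literal keeps the whole expression in float:
float scale_f(float x) {
    return x * 0.1f; // no promotion, no warning
}
```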
Consequences:
Further steps if and when this PR is merged:

- `-Werror`?