
ggml : prevent builds with -ffinite-math-only #7726

Merged
3 commits merged into master from gg/error-on-finite-math-only on Jun 4, 2024

Conversation

ggerganov (Member) commented Jun 4, 2024

This enforces a check that -fno-finite-math-only is set, i.e. that the code is not being compiled in finite-math mode. During the rewrite of SiLU and softmax for CPU in #7154, an issue emerged where the results observed with >1 slot were nondeterministic, as found by @JohannesGaessler.

@LostRuins narrowed the problem down to -ffinite-math-only, theorised to cause SiLU to return NaN or other garbage instead of flushing small values to 0. @jart proposed a fix, which @ggerganov then implemented here.

ref #7154 (comment)
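
For reference, a minimal sketch of both the failure mode and the kind of guard this PR adds (the exact merged code may differ; silu_f32 below is illustrative, not ggml's implementation). GCC and Clang define __FINITE_MATH_ONLY__ to 1 under -ffinite-math-only (also implied by -ffast-math/-Ofast), which is what such a check can key on:

    #include <math.h>

    // silu(x) = x * sigmoid(x) = x / (1 + e^(-x)). For large negative x,
    // expf(-x) overflows to +Inf and the quotient correctly flushes to -0.0f.
    // Under -ffinite-math-only the compiler assumes Inf never occurs, so this
    // path can be miscompiled into NaN or other garbage.
    static inline float silu_f32(float x) {
        return x / (1.0f + expf(-x));
    }

    // Guard in the spirit of this PR: refuse to compile in finite-math mode.
    #if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    #error "ggml needs Inf/NaN semantics; rebuild with -fno-finite-math-only"
    #endif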

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Jun 4, 2024
mofosyne (Collaborator) left a comment

Confirmed that gg's changes match the intent of #7154

@mofosyne mofosyne added the Review Complexity : Low label (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix) and the merge ready label (indicates that this may be ready to merge soon and is just holding out in case of objections) Jun 4, 2024
@mofosyne mofosyne merged commit 6d16169 into master Jun 4, 2024
66 of 71 checks passed
@github-actions github-actions bot added the build label (compilation issues) Jun 4, 2024
github-actions bot (Contributor) commented Jun 4, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 552 iterations 🚀

Details (for performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8540.34ms p(95)=20501.86ms fails=, finish reason: stop=495 truncated=57
  • Prompt processing (pp): avg=99.34tk/s p(95)=453.53tk/s
  • Token generation (tg): avg=46.88tk/s p(95)=45.53tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/error-on-finite-math-only commit=771cc3a6b4425b02bcb88f07aba74d685cddb018

[chart] prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 552 iterations (y-axis: llamacpp:prompt_tokens_seconds)
[chart] predicted_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 552 iterations (y-axis: llamacpp:predicted_tokens_seconds)

Details

[chart] kv_cache_usage_ratio — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 552 iterations (y-axis: llamacpp:kv_cache_usage_ratio)
[chart] requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 552 iterations (y-axis: llamacpp:requests_processing)

JohannesGaessler (Collaborator) commented Jun 4, 2024

The issue that I reported in #7154 (comment) has not been fixed by this PR. For the minimal reproduction I was not using LLAMA_FAST, and therefore not -ffinite-math-only, anyway.

ggerganov (Member, Author)

Yes, I didn't expect it to be fixed. We need a non-unified KV cache implementation to have deterministic results for n_slots > 1.

jart (Contributor) commented Jun 4, 2024

Could it be a memory barrier issue? x86's acquire/release guarantees change when you switch to xmm/ymm ops.

@mofosyne mofosyne deleted the gg/error-on-finite-math-only branch June 5, 2024 01:38
ggerganov (Member, Author)

The problem is not a data race or a race condition. Rather, the same set of tokens can produce slightly different floating-point results depending on the position they get assigned in the unified KV cache, due to the reduce operations in the attention.
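
To illustrate the reduce-order sensitivity (a generic example, not ggml code): floating-point addition is not associative, so summing the same values in a different order can change the result. With a unified KV cache the same tokens may land in different cache positions across runs, changing the order in which the attention reductions visit them:

    #include <stdio.h>

    int main(void) {
        // The same three addends, reduced in two different orders:
        float a = 1e8f, b = -1e8f, c = 1.0f;
        printf("%g\n", (a + b) + c); // prints 1: a and b cancel exactly first
        printf("%g\n", a + (b + c)); // prints 0: c is absorbed into b (|b| >> c)
        return 0;
    }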

yunginnanet
btw, this breaks LLAMA_FAST=1
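
That is plausibly because LLAMA_FAST builds with -Ofast, which implies -ffast-math and therefore -ffinite-math-only, so the new guard rejects the build. A hypothetical stand-alone probe (not part of the repo) for checking which mode the compiler is in:

    #include <stdio.h>

    int main(void) {
    #if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
        puts("finite-math-only is active: ggml would reject this build");
    #else
        puts("Inf/NaN semantics intact: build allowed");
    #endif
        return 0;
    }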
