ggml : prevent builds with -ffinite-math-only #7726
Confirmed that gg's changes match the intent of #7154
The issue that I reported in #7154 (comment) has not been fixed by this PR. For the minimal reproduction I was not using |
Yes, I didn't expect it to be fixed. We need a non-unified KV cache implementation to have deterministic results for |
Could it be a memory barrier issue? x86 guarantees about acquire / release semantics change when you switch to xmm/ymm ops. |
The problem is not a data race or a race condition. Rather the same set of tokens can produce slightly different floating point results, depending on the position they get assigned in the unified KV cache due to the reduce operations in the attention |
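To illustrate the point about reduce operations: floating point addition is not associative, so summing the same values in a different order can give a different result. The sketch below is not the attention kernel; the function names are made up for illustration, and the `volatile` temporaries are only there to force rounding to `float` after every step.

```c
#include <assert.h>

/* Hypothetical helper: (a + b) + c, rounded to float after each addition. */
static float sum_left_to_right(float a, float b, float c) {
    volatile float partial = a + b;
    volatile float total = partial + c;
    return total;
}

/* Hypothetical helper: a + (b + c), same inputs, different grouping. */
static float sum_right_to_left(float a, float b, float c) {
    volatile float partial = b + c;
    volatile float total = a + partial;
    return total;
}
```

With `a = 1e8f`, `b = -1e8f`, `c = 1.0f`, the first grouping yields `1.0f` and the second yields `0.0f`, because `-1e8f + 1.0f` rounds back to `-1e8f` in single precision. A reduction whose order depends on where tokens sit in the cache can differ in the same way, without any race being involved.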
btw, this breaks |
This enforces a check that -fno-finite-math-only is set, i.e. that the compiler
is not operating in finite-math mode. During the rewrite of silu and softmax for
CPU in #7154, an issue emerged where results were nondeterministic when more
than one slot was in use, as found by @JohannesGaessler.
@LostRuins narrowed the problem down to -ffinite-math-only. The theory is that
under this flag SiLU, instead of flushing small values to 0, returns NaN or some
other garbage. @jart proposed a fix, which @ggerganov then implemented in this PR.
ref #7154 (comment)
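A minimal sketch of what such a compile-time check can look like, assuming GCC or Clang, which predefine `__FINITE_MATH_ONLY__` as 1 under -ffinite-math-only and 0 otherwise. The error text and the helpers `silu_given_exp` and `overflowed_exp` are hypothetical and are not taken from ggml. The helpers only show how a SiLU-style expression can lean on infinity: e^{-x} is passed in by the caller so the sketch needs no libm.

```c
#include <assert.h>
#include <float.h>

/* Refuse to build when the compiler may assume that NaN and Inf never occur. */
#if defined(__FINITE_MATH_ONLY__) && (__FINITE_MATH_ONLY__ == 1)
#error "this code relies on non-finite arithmetic; build with -fno-finite-math-only"
#endif

/* Hypothetical helper: x * sigmoid(x), given a precomputed e^{-x}. */
static float silu_given_exp(float x, float exp_neg_x) {
    return x / (1.0f + exp_neg_x);
}

/* Hypothetical helper: produce +inf at runtime, as an overflowing exp would. */
static float overflowed_exp(void) {
    volatile float big = FLT_MAX;
    volatile float result = big * 2.0f; /* exceeds FLT_MAX, rounds to +inf */
    return result;
}
```

For a large negative input such as `-1000.0f`, e^{-x} overflows to infinity and `silu_given_exp(-1000.0f, overflowed_exp())` evaluates to zero only because dividing by infinity is well defined. Under -ffinite-math-only the compiler is allowed to assume that infinity never appears, so that result is no longer guaranteed, which is why the guard stops the build instead.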