dgemv_kernel_4x4(Haswell): add missing clobbers for xmm0,xmm1,xmm2,xmm3 #2018

Merged


bartoldeman
Contributor

This fixes a crash in dblat2 when OpenBLAS is compiled using
-march=znver1 -ftree-vectorize -O2

See also:
easybuilders/easybuild-easyconfigs#7180
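
For readers unfamiliar with this class of bug, here is a minimal sketch of what a clobber list does in GCC extended inline assembly. It is not the OpenBLAS kernel: `add_via_xmm` is a hypothetical helper, and the plain C fallback is included only so the snippet builds on non-x86-64 targets. Any register the asm body writes that is not listed as an operand has to be named after the third colon. Otherwise the compiler is free to keep a live value in it across the asm statement, which becomes more likely once auto-vectorization starts using the xmm registers.

```c
#include <stdio.h>

/* Hypothetical helper (not OpenBLAS code): adds *a and *b using %xmm0/%xmm1. */
static double add_via_xmm(const double *a, const double *b)
{
    double result;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__(
        "movsd (%1), %%xmm0      \n\t"  /* xmm0 = *a                  */
        "movsd (%2), %%xmm1      \n\t"  /* xmm1 = *b                  */
        "addsd %%xmm1, %%xmm0    \n\t"  /* xmm0 += xmm1               */
        "movsd %%xmm0, %0        \n\t"  /* result = xmm0              */
        : "=m"(result)                  /* output                     */
        : "r"(a), "r"(b)                /* inputs, only read          */
        : "xmm0", "xmm1", "memory");    /* scratch registers we wrote */
#else
    result = *a + *b;                   /* portable fallback          */
#endif
    return result;
}
```

Dropping `"xmm0", "xmm1"` from the last line would still assemble, and may even appear to work at low optimization levels, which is why an omission of this kind can go unnoticed for years.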

@martin-frbg
Collaborator

Thanks. Yet another error pattern to look for ... but at first glance it appears to be a unique oversight in #561. gcc >=8 is really bringing out the worst traits of the code it seems...

martin-frbg added this to the 0.3.6 milestone Feb 14, 2019
@martin-frbg
Collaborator

FYI compiling with -ftree-vectorize -O2 (on Kaby Lake, i.e. Haswell kernels) leads to a "less than half accurate" warning for SGEMV, so there appears to be more such niceness. (Possibly related to my misgivings about input arguments beyond %0,%1 getting modified, see #2009)

martin-frbg merged commit cd5a59b into OpenMathLib:develop Feb 14, 2019
@martin-frbg
Collaborator

The SGEMV mishandling does indeed appear to come from abusing one of the input parameters (%8 a.k.a. "lda4") of sgemv_kernel_4x8 in sgemv_n_microk_haswell-4.c. I did not manage to come up with a proper constraint, so my current workaround is to "movq" the value from %8 into the otherwise unused %xmm10 at the start of the asm block and restore it from there at the end. Not sure if that is a sane solution...
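
As a point of comparison for the input-operand problem described above, GCC's documented way to tell the compiler that an asm body changes an operand is the read-write `"+r"` constraint, usually applied to a local copy. The sketch below assumes a toy summation loop, not the real `sgemv_kernel_4x8`: `sum_doubles`, `p` and `count` are hypothetical names, and this is not the workaround or the fix used in OpenBLAS.

```c
/* Hypothetical kernel (not OpenBLAS code): sums n doubles starting at x.
 * The asm advances the pointer and decrements the counter, so both are
 * declared read-write ("+r") instead of being listed as plain inputs. */
static double sum_doubles(const double *x, long long n)
{
    double total = 0.0;
    if (n <= 0)
        return total;
#if defined(__x86_64__) && defined(__GNUC__)
    const double *p = x;     /* local copy: the asm may modify it freely */
    long long count = n;     /* local copy of the loop counter           */
    __asm__ __volatile__(
        "xorpd %%xmm0, %%xmm0    \n\t"  /* accumulator = 0.0        */
        "1:                      \n\t"
        "addsd (%0), %%xmm0      \n\t"  /* accumulator += *p        */
        "addq  $8, %0            \n\t"  /* p++ (modifies operand 0) */
        "subq  $1, %1            \n\t"  /* count-- (operand 1)      */
        "jnz   1b                \n\t"
        "movsd %%xmm0, %2        \n\t"  /* total = accumulator      */
        : "+r"(p), "+r"(count), "=m"(total)
        :                               /* no input-only operands   */
        : "xmm0", "cc", "memory");
#else
    for (long long i = 0; i < n; i++)   /* portable fallback        */
        total += x[i];
#endif
    return total;
}
```

Because `p` and `count` are marked as outputs, the compiler no longer assumes their registers still hold the original values after the asm statement, which is the assumption that goes wrong when an input-only operand is modified.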
