OpenBLAS ChangeLog
====================================================================
Version 0.3.29
12-Jan-2025

general:
 - fixed a potential NULL pointer dereference in multithreaded builds
 - added function aliases for GEMMT using its new name GEMMTR adopted by Reference-BLAS
 - fixed a build failure when building without LAPACK_DEPRECATED functions
 - the minimum required CMake version for CMake-based builds was raised to 3.16.0 in order
   to remove many compatibility and deprecation warnings
 - added more detailed CMake rules for OpenMP builds (mainly to support recent LLVM)
 - fixed the behavior of the recently added CBLAS_?GEMMT functions with row-major data
 - improved thread scaling of multithreaded SBGEMV
 - improved thread scaling of multithreaded TRTRI
 - fixed compilation of the CBLAS testsuite with gcc14 (and no Fortran compiler)
 - added support for option handling changes in flang-new from LLVM18 onwards
 - added support for recent calling-convention changes in Cray and NVIDIA compilers
 - added support for compilation with the NAG Fortran compiler
 - fixed placement of the -fopenmp flag and libsuffix in the generated pkgconfig file
 - improved the CMakeConfig file generated by the Makefile build
 - fixed const-correctness of cblas_?geadd in cblas.h
 - fixed a potential inaccuracy in multithreaded BLAS3 calls
 - fixed empty implementations of get/set_affinity that print a warning in OpenMP builds
 - fixed function signatures for TRTRS in the converted C version of LAPACK
 - fixed omission of several single-precision LAPACK symbols in the shared library
 - improved build instructions for the provided "pybench" benchmarks
 - improved documentation, including added build instructions for WoA and HarmonyOS
   as well as descriptions of environment variables that affect build and runtime behavior
 - added a separate "make install_tests" target for use with cross-compilations
 - integrated improvements and corrections from Reference-LAPACK:
    - removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062)
    - fixed the leading dimension for B in tests for GGEV (LAPACK PR 1064)
    - replaced the ?LARFT functions with a recursive implementation (LAPACK PR 1080)

arm:
 - fixed build with recent versions of the NDK (missing .type declaration of symbols)

arm64:
 - fixed a long-standing bug in the (generic) c/zgemm_beta kernel that could lead to
   reads and writes outside the array bounds in some circumstances
 - rewrote cpu autodetection to scan all cores and return the highest performing type
 - improved the DGEMM performance for SVE targets and small matrix sizes
 - improved dimension criteria for forwarding from GEMM to GEMV kernels
 - added SVE kernels for ROT and SWAP
 - improved SVE kernels for SGEMV and DGEMV on A64FX and NEOVERSEV1
 - added support for using the "small matrix" kernels with CMake as well
 - fixed compilation on Windows on Arm
 - improved compile-time detection of SVE capability
 - added cpu autodetection and initial support for Apple M4
 - added support for compilation on systems running iOS
 - added support for compilation on NetBSD ("evbarm" architecture)
 - fixed NRM2 implementations for generic SVE targets and the Neoverse N2
 - fixed compilation for SVE-capable targets with the NVIDIA compiler

x86_64:
 - fixed a wrong storage size in the SBGEMV kernel for Cooper Lake
 - added cpu autodetection for Intel Granite Rapids
 - added cpu autodetection for AMD Ryzen 5 series
 - added optimized SOMATCOPY_CT for AVX-capable targets
 - fixed the fallback implementation of GEMM3M in GENERIC builds
 - tentatively re-enabled builds with the EXPRECISION option
 - worked around a miscompilation of tests with mingw32-gfortran14
 - added support for compilation with the Intel oneAPI 2025.0 compiler on Windows

power:
 - fixed multithreaded SBGEMM
 - fixed a CMake build problem on POWER10
 - improved the performance of SGEMV
 - added vectorized implementations of SBGEMV and support for forwarding 1xN SBGEMM to them
 - fixed illegal instructions and potential memory overflow in SGEMM on PPCG4
 - fixed handling of NaN and Inf arguments in SSCAL and DSCAL on PPC440, G4 and 970
 - added improved CGEMM and ZGEMM kernels for POWER10
 - added Makefile logic to remove all optimization flags in DEBUG builds

mips64:
 - fixed compilation with gcc14
 - fixed GEMM parameter selection for the MIPS64_GENERIC target
 - fixed a potential build failure when compiling with OpenMP

loongarch64:
 - fixed compilation for Loongson3 with recent versions of gmake
 - fixed a potential loss of precision in Loongson3A GEMM
 - fixed a potential build failure when compiling with OpenMP
 - added optimized SOMATCOPY for LASX-capable targets
 - introduced a new cpu naming scheme while retaining compatibility
 - added support for cross-compiling LoongArch64 targets with CMake
 - added support for compilation with LLVM

riscv64:
 - removed thread yielding overhead caused by sched_yield
 - replaced some non-standard intrinsics with their official names
 - fixed and sped up the implementations of CGEMM/ZGEMM TCOPY for vector lengths 128 and 256
 - improved the performance of SNRM2/DNRM2 for RVV1.0 targets
 - added optimized ?OMATCOPY_CN kernels for RVV1.0 targets

====================================================================
Version 0.3.28
8-Aug-2024