Commit eb11949

gquintin, Lénaïc Bagnères, Lénaïc Bagnères, Paul Gannay, c authored Dec 11, 2020
Merge for v2 (#77)
* Start tet1d module
* Update tet1d module
* Add CUDA support for tet1d module
* Add scalar input support in tet1d module
* simplify command line arguments for hatch.py
* Add support for generation of modules
* Make tet1d a module into the new system
* Add scalar version of some function from libm + reinterpret
* Fix refactoring
* Before merging master
* Fixes after merge
* For backup
* Fix CUDA
* Add forgotten files
* Working => backup
* Fixes
* All tests are passing f16 included
* For backup
* COVID-19
* Fixes
* Fixes
* Fixes: all test compile with nvcc
* ROCm support, addition on f32 and f16 are compiling
* TET1D tests are compiling with both nvcc and hipcc
* Merge CUDA and ROCm when code is the same
* Forgot files
* Now we can list generated files
* Forgot to merge nsimd.h
* Forgot to push
* Update .gitignore with the new file generated by the tet1d module.
* Return allocated arrays for tests
* Increase the minimum size of the tests array
* Fix segfault
* Fix segfault
* Add mask[oz]_load[zu] and mask_store[au] operators for CPU
* For backup
* For backup
* Fix for SSE
* Fix fma for C89
* Remove warning from GCC when using long long in C98 and C++98
* Fix warnings for C98 and C++98 and AVX512
* Add set1l, iota, mask_for_loop_tail for ARM
* Before merging master
* Fix ARM mask[oz]_load[au]
* Fixes for ARM SVE
* Fix warning when using __f16's
* Add alignment-templated masked loads/stores
* Rewrite friendly_but_not_optimized stuff
* Forgot file
* Fix ARM
* Fix ARM
* Cosmetic
* Backup
* Backup
* Backup
* Backup
* Forgot file
* For backup
* For backup
* Refactoring of documentation
* Add build.nsconfig + fix warning in fixed_point exp
* Fix warning in SPMd module
* Add forgotten file
* Fixes for CUDA
* Fixes for CPU
* Fixes
* Add gather/scatter for cpu and x86
* Add gather/scatter for arm (not tested yet)
* Fix gather/scatter for arm
* Deactivate tet1d module
* Cleanup
* Add scripts for building
* Fix setup and build script for Linux
* Changing computer
* Backup
* Fix script/setup.sh
* Fixes for fixed size SVE
* Fix Windows scripts
* Fix scripts for Linux
* Fix Makefile.nix for md2html
* Fix Makefile.win for md2html
* Fix generation of documentation
* Add mask scatter for cpu
* Add mask_scatter for x86
* Forgot a file
* Add mask_scatter for arm
* Add masked gather for cpu
* Add masked gather for x86
* Add masked gather for arm
* Fix masked gather for f16's
* Adapt SVE typedefs to new GCC 10
* Fixes for x86
* Fix tet1d tests for CUDA
* Fixes for HIP
* Fix warning fr ROCm/HIP
* Various fixes
* Fix tests for rec11, rec8, rsqrt11 and rsqrt8
* Fix rec11, rec8, rsqrt11, rsqrt8 tests
* Improve gather/scatter for neon128 and aarch64
* Add gather_linear + scatter_linear and remove masked gather and scatter
* Add linear gather + scatter
* Fix gather_linear for neon128 + aarch64
* Improve gather on aarch64 + neon128
* Add documentation for module TET1d
* Update README
* Add documentation for module TET1d
* Improve README with nsconfig stuff
* Improve README
* Improve README
* Improve README
* Improve README
* Fix warning for armclang
* Fix warning when compiling with Clang and C++98/03
* Fix generation of benches
* For backup
* First version (not finished yet)
* Add support for non closed operators
* Improve doc
* Improve documentation
* More fixes
* Fix broken link in README
* Add CONTRIBUTING.md
* Improve documentation
* Improve documentation
* Improve documentation + simplify scoped_aligned_mem_for
* Fix scoped_aligned_mem
* Fixed errors in nsimd.h
* Improve documentation
* Improve documentation
* Improve documentation
* Replace some print left by common.myprint
* Fixed multiple declarations
* Let benches generate despite the new function set1l
* Add a module offering a vectorized random generator
* Only generate rand module if flags passed from hatch are correct
* Removed F-strings
* Fix build.nsconfig
* Fix generation of rand module
* Building the library does not require C++14 anymore, C++98 is more than sufficient
* Update README
* Update README
* Setup.sh clone nstools using the same protocol as nsimd
* Add possibility to ignore tests/benches/...
* Add C++20 concepts to nsimd.h
* Add C++20 concepts to cxx_adv_api.hpp
* Add C++20 concepts to Python-generated functions
* Fix C++20 concepts
* Prepare support for oneAPI
* Add C++20 concepts doc
* Modify the rand module to allow generation with python 3.5 and earlier
* Improve doc + rename module rand --> random
* Fix menu of doc of random module
* Fix availability of scoped_mem...
* Fix tests to_pack*
* Tests are dependant of the SIMD architecture
* Improvements for Intel + Fixes for KNL
* More fixes for KNL and C89
* More fixes
* Fix fms/fnms for aarch64
* Fixes for SVE
* Fix warning whe compiling for 32-bits targets
* Cleaning in tests generation
* Fix ULP bounds for some operators
* Almost all tests are passing on 32-bits platform
* No more warning for 32-bits compilations
* Forgot a file
* Fix last errors in philox
* First version of quick'n'dirty CI
* Fix warnings
* Fix more warnings
* Fix Pyhon generation for module/random
* Fix fnms for SSE2 and SSE42
* Try again to fix warnings for GCC
* Fix warnings for Clang
* Add variable to compile for a given CUDA GPU
* Fix warnings for ROCm/HIP
* Fix CUDA f16 implementation
* Fix CUDA f16 implementation
* Fix CUDA f16 implementation
* Reduce size of arrays for GPU testing
* Reduce size of arrays for GPU testing
* Compile .so with nvcc and hipcc for binary compatibility
* Fix build.nsconfig
* Fix build.nsconfig
* Fix build.nsconfig
* Fix build.nsconfig
* Improve CI script + add static in NSIMD_INLINE
* Fix build.nsconfig for HIP
* Last fixes
* Fix issue: __popcnt64 not available in 32-bits mode
* Fix DLL specifier of *logulps*
* Fix MSVC 32-bits related issues
* Cosmetic
* Add __vectorcall for MSVC 32-bits
* Update .gitignore

Co-authored-by: Lénaïc Bagnères <[email protected]>
Co-authored-by: Lénaïc Bagnères <[email protected]>
Co-authored-by: Paul Gannay <[email protected]>
Co-authored-by: c <[email protected]>
Co-authored-by: Adrien Arnaud <[email protected]>
Co-authored-by: Rodolphe Cargnello <[email protected]>
1 parent df84e57 commit eb11949

113 files changed, +17285 -9171 lines changed

Diff for: .clang-format (+2)

@@ -0,0 +1,2 @@
+Standard: Cpp03
+ColumnLimit: 79
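
To make the effect of the two `.clang-format` settings concrete, here is a small sketch of how C++ source tends to look once formatted with a C++03 standard setting and a 79-column limit. The names `table_t` and `count_entries` are hypothetical and do not come from the commit; every other formatting option is assumed to be left at the formatter's defaults.

```cpp
// Illustration only: hypothetical code laid out the way `Standard: Cpp03`
// and `ColumnLimit: 79` would be expected to leave it.
#include <map>
#include <string>
#include <vector>

// With a C++03 standard setting, consecutive closing angle brackets keep a
// space between them ("> >"), since ">>" lexes as a shift operator before
// C++11.
typedef std::map<std::string, std::vector<int> > table_t;

// No line is longer than 79 columns.
inline int count_entries(const table_t &table, const std::string &key) {
  table_t::const_iterator it = table.find(key);
  return it == table.end() ? 0 : int(it->second.size());
}
```

Keeping the `> >` spelling is consistent with the commit message's note that building the library only requires C++98.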

Diff for: .gitignore (+38 -7)

@@ -1,34 +1,65 @@
-## Build system
-build
+# Common build dirs
+build*/
 
-## Auto-generated
+# Dependencies
+nstools/
+
+# Binaries
 *.o
 *.so
 *.pyc
+*.exe
+*.dll
+*.dylib
+
+# Generated files
+## API
 src/api_*.cpp
+src/api_*
+
+## Plateform specific code
 include/nsimd/arm
 include/nsimd/cpu
 include/nsimd/cxx_adv_api_functions.hpp
 include/nsimd/friendly_but_not_optimized.hpp
 include/nsimd/functions.h
 include/nsimd/ppc
 include/nsimd/x86
-src/api_*
+
+## Tests
 tests/c_base
 tests/cxx_base
 tests/cxx_adv
+tests/modules/tet1d/
+tests/modules/fixed_point/
+tests/modules/rand/*.cpp
+tests/modules/spmd/
+tests/modules/random/
+
+## Benches
 benches/cxx_adv
-_deps
-_install
-doc/html
 
+## Modules
+include/nsimd/modules/tet1d/
+include/nsimd/modules/spmd/
+include/nsimd/modules/fixed_point/
+include/nsimd/scalar_utilities.h
+
+## Doc
+doc/html
 doc/markdown/overview.md
 doc/markdown/api.md
 doc/markdown/api_*.md
 doc/markdown/module_fixed_point_api*.md
 doc/markdown/module_fixed_point_overview.md
+doc/markdown/module_spmd_api*.md
+doc/markdown/module_spmd_overview.md
+doc/markdown/module_memory_management_overview.md
 doc/md2html
 doc/tmp.html
 
 ## Ulps
 ulps/
+
+## CI
+_ci/
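
The `.gitignore` above keeps both `src/api_*.cpp` and `src/api_*`. The sketch below suggests why the second pattern appears to cover the first. It approximates gitignore matching with POSIX `fnmatch(3)` and `FNM_PATHNAME`; Git uses its own matcher, so this is an assumption for illustration, not Git's implementation. The helper name `glob_matches` and the file names in the usage note are hypothetical.

```cpp
// Sketch only: approximates slash-containing gitignore patterns with
// POSIX fnmatch(3). Git's real matcher differs in some details.
#include <fnmatch.h>

// Returns true when `path` matches the glob `pattern`. FNM_PATHNAME keeps
// '*' from matching '/', so a pattern only matches within one directory
// level.
inline bool glob_matches(const char *pattern, const char *path) {
  return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

Under this approximation, `glob_matches("src/api_*", "src/api_sse2.cpp")` and `glob_matches("src/api_*", "src/api_sse2.o")` both hold, while `glob_matches("src/api_*.cpp", "src/api_sse2.o")` does not. The broader pattern therefore matches every path the narrower one does.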
