Sync portable-simd to 2023 May 10 #111429

Closed
91 commits
b5f9d43
rust-lang/portable-simd#289: Strengthen warnings about relying on Mas…
thomcc Jul 21, 2022
ddede9f
make some Miri backtraces more pretty
RalfJung Jul 22, 2022
3183afb
Fix interleave/deinterleave for vectors with only one lane
calebzulawski Jul 29, 2022
691c8b2
Merge pull request #295 from RalfJung/miri-backtraces
calebzulawski Jul 29, 2022
8742a86
add all_lane_counts feature to enable non-power-of-2 lane counts <= 64
programmerjake Jul 29, 2022
6bf5128
Simplify interleave/deinterleave and fix for odd-length vectors.
calebzulawski Aug 1, 2022
c739af3
Hide rustc unstable feature from docs
calebzulawski Aug 1, 2022
d030301
Remove special case for length-1 vectors
calebzulawski Aug 1, 2022
5f70664
Simplify expression
calebzulawski Aug 1, 2022
7145dc5
Merge pull request #299 from rust-lang/interleave-one
calebzulawski Aug 4, 2022
d5cd4a8
Merge pull request #300 from programmerjake/all_lane_counts
programmerjake Aug 15, 2022
2c5ebfb
add feature flag
miguelraz Oct 1, 2022
4491309
Mark more mask functions inline
calebzulawski Oct 16, 2022
ee9a23f
Update readme
calebzulawski Oct 16, 2022
f236f57
Update README.md
calebzulawski Oct 16, 2022
2f38f70
Merge pull request #309 from rust-lang/mask-inline
calebzulawski Oct 16, 2022
aad8f0a
Merge pull request #310 from rust-lang/readme
calebzulawski Oct 17, 2022
61a6f18
Specify aliases in one place, and make it more uniform which are defined
calebzulawski Oct 17, 2022
402b50a
Improve variable names
calebzulawski Oct 17, 2022
7c80b69
Merge pull request #311 from rust-lang/alias
calebzulawski Oct 29, 2022
d3cfd7c
Add vectors of pointers
calebzulawski Jun 22, 2022
7e96f5d
Use safe casts
calebzulawski Jun 22, 2022
4076ba8
Implement scatter/gather with new pointer vector and add tests
calebzulawski Jun 23, 2022
6b3c599
Add missing safety comment
calebzulawski Jun 23, 2022
f10e591
Fix wrapping pointer arithmetic
calebzulawski Jun 24, 2022
da25087
Test a more useful pointer
calebzulawski Jun 24, 2022
e7cc021
Fix casts
calebzulawski Jun 25, 2022
8a5a573
Clarify addr and with_addr implementations
calebzulawski Jun 26, 2022
176cc81
Update for new intrinsics
calebzulawski Aug 4, 2022
dadf98a
Remove duplicate intrinsic
calebzulawski Aug 4, 2022
e5db1ec
Fix documentation
calebzulawski Aug 4, 2022
0fcc406
Fix pointer mutability casts and safety lints
calebzulawski Aug 5, 2022
a79718f
Use new intrinsics
calebzulawski Sep 18, 2022
078cb58
Apply suggestions from code review
calebzulawski Sep 19, 2022
469c620
Account for pointer metadata in pointer bounds
calebzulawski Oct 22, 2022
87779ae
Merge pull request #287 from rust-lang/feature/pointer-vectors
calebzulawski Oct 29, 2022
de30820
Update README.md
calebzulawski Oct 30, 2022
572122a
Add missing pointer tests and rename pointer cast fns to match scalars
calebzulawski Nov 10, 2022
ecc2875
Merge pull request #313 from rust-lang/pointer-tests
calebzulawski Nov 10, 2022
7ac1fbb
impl TryFrom<&[T]> for Simd
calebzulawski Nov 11, 2022
9dc690c
Add TryFrom<&[T]> tests
calebzulawski Nov 11, 2022
fd53445
Add pointer scatter/gather
calebzulawski Nov 12, 2022
35c60ce
Merge pull request #314 from rust-lang/try-from-slice
calebzulawski Nov 12, 2022
bef4c41
Add test examples
calebzulawski Nov 12, 2022
c247915
Update crates/core_simd/src/vector.rs
calebzulawski Nov 13, 2022
7e614f0
Fix typo typo
calebzulawski Nov 13, 2022
6e30c6e
Merge pull request #315 from rust-lang/scatter-gather-ptr
calebzulawski Nov 20, 2022
db8b23c
Remove reexport of simd::*
calebzulawski Nov 28, 2022
645ab61
Merge pull request #317 from rust-lang/fix-exports
calebzulawski Nov 28, 2022
54b6f69
Avoid a scalar loop in `Simd::from_slice`
thomcc Nov 28, 2022
1547dd6
Merge pull request #318 from thomcc/simd_from_slice
calebzulawski Nov 28, 2022
df3a639
add dot_product example
miguelraz Jun 4, 2021
c08a4d1
add more basic dot products and comments, README
miguelraz Mar 26, 2022
4615805
add remainder dot_product and cleanup
miguelraz Mar 26, 2022
4ddfd2f
non allocating fold simd
miguelraz Mar 29, 2022
aeac9ed
proper mul_add arg order, added tests
miguelraz Mar 29, 2022
64247a3
add _scalar names for dot_product examples
miguelraz Mar 30, 2022
da3bd6d
Update dot_product example import
workingjubilee Dec 4, 2022
582239a
Merge pull request #128 from miguelraz/dotprodexample
workingjubilee Dec 4, 2022
e3ef226
Fix the typo
howjmay Jan 23, 2023
9bd30e7
Merge pull request #327 from howjmay/typo
calebzulawski Jan 23, 2023
0fd7c8e
Add copy_to_slice
calebzulawski Feb 19, 2023
36829dd
Check that vectors aren't padded
calebzulawski Feb 19, 2023
8dcb4d5
Merge pull request #331 from rust-lang/to_slice
calebzulawski Mar 11, 2023
65b5210
Skip building wasm-bindgen-test on non-wasm targets
bjorn3 Mar 26, 2023
e6bbf49
Merge pull request #336 from bjorn3/faster_tests
calebzulawski Mar 26, 2023
90f2af7
Fix lint
calebzulawski Mar 26, 2023
ceb2611
Remove formats `[T; N]` does not impl (rust-lang/portable-simd#337)
workingjubilee Apr 10, 2023
afad9c3
Don't use direct field access in `Simd` functions
Sp00ph Apr 22, 2023
52833cc
Add notes to avoid direct field accesses
Sp00ph Apr 22, 2023
f1b86ba
Use pointer reads for better codegen in debug mode
Sp00ph Apr 22, 2023
f916add
Don't use direct field access in `Simd` functions (rust-lang/portable…
workingjubilee Apr 23, 2023
71d4c36
lane -> element for core::simd::Simd
workingjubilee Mar 18, 2023
92259a4
Clarify elementwise cmp reduces
workingjubilee Apr 10, 2023
4064678
Explain why to use Simd early
workingjubilee Apr 11, 2023
2b32732
Do not construct Simd
workingjubilee Apr 23, 2023
4f0d822
Implement dynamic byte-swizzle prototype (rust-lang/portable-simd#334)
workingjubilee Apr 23, 2023
ad8afa8
lane -> element for `core::simd::Simd` (rust-lang/portable-simd#338)
workingjubilee Apr 23, 2023
394a884
Fix {to,from}_array UB when repr(simd) produces padding
calebzulawski Apr 23, 2023
c504f01
Use cast and improve comments
calebzulawski Apr 26, 2023
195d4ca
Merge pull request #342 from rust-lang/load-store
calebzulawski Apr 27, 2023
4967f25
Use the new `load`/`store` functions in `{from,to}_slice`
Sp00ph May 7, 2023
b246e45
Fix inaccurate safety comments
Sp00ph May 7, 2023
45413e4
Merge pull request #346 from Sp00ph/update_safety
calebzulawski May 7, 2023
50416fc
Merge pull request #345 from Sp00ph/from_to_slice
calebzulawski May 7, 2023
8f50a17
Fixups for sync
workingjubilee Apr 23, 2023
d361e43
Drop const_ptr_read feature gate
workingjubilee May 10, 2023
8527625
Temp fix for swizzle_dyn
workingjubilee May 10, 2023
d8448c7
Sync portable-simd to 2023 May 10
workingjubilee May 10, 2023
fb42fac
Bless tests for portable-simd sync
workingjubilee May 11, 2023
dac348e
miri: Move patterns for simd tests
workingjubilee May 11, 2023
4 changes: 4 additions & 0 deletions library/portable-simd/.github/workflows/ci.yml
@@ -241,6 +241,10 @@ jobs:
- "--features std"
- "--features generic_const_exprs"
- "--features std --features generic_const_exprs"
- "--features all_lane_counts"
- "--features all_lane_counts --features std"
- "--features all_lane_counts --features generic_const_exprs"
- "--features all_lane_counts --features std --features generic_const_exprs"

steps:
- uses: actions/checkout@v2
34 changes: 12 additions & 22 deletions library/portable-simd/README.md
@@ -24,44 +24,34 @@ or by setting up `rustup default nightly` or else with `cargo +nightly {build,te
```bash
cargo new hellosimd
```
to create a new crate. Edit `hellosimd/Cargo.toml` to be
```toml
[package]
name = "hellosimd"
version = "0.1.0"
edition = "2018"
[dependencies]
core_simd = { git = "https://github.com/rust-lang/portable-simd" }
```

and finally write this in `src/main.rs`:
to create a new crate. Finally write this in `src/main.rs`:
```rust
use core_simd::*;
#![feature(portable_simd)]
use std::simd::f32x4;
fn main() {
let a = f32x4::splat(10.0);
let b = f32x4::from_array([1.0, 2.0, 3.0, 4.0]);
println!("{:?}", a + b);
}
```

Explanation: We import all the bindings from the crate with the first line. Then, we construct our SIMD vectors with methods like `splat` or `from_array`. Finally, we can use operators on them like `+` and the appropriate SIMD instructions will be carried out. When we run `cargo run` you should get `[11.0, 12.0, 13.0, 14.0]`.

## Code Organization
Explanation: We construct our SIMD vectors with methods like `splat` or `from_array`. Next, we can use operators like `+` on them, and the appropriate SIMD instructions will be carried out. When we run `cargo run` you should get `[11.0, 12.0, 13.0, 14.0]`.

Currently the crate is organized so that each element type is a file, and then the 64-bit, 128-bit, 256-bit, and 512-bit vectors using those types are contained in said file.
## Supported vectors

All types are then exported as a single, flat module.
Currently, vectors may have up to 64 elements, but aliases are provided only up to 512-bit vectors.

Depending on the size of the primitive type, the number of lanes the vector will have varies. For example, 128-bit vectors have four `f32` lanes and two `f64` lanes.

The supported element types are as follows:
* **Floating Point:** `f32`, `f64`
* **Signed Integers:** `i8`, `i16`, `i32`, `i64`, `i128`, `isize`
* **Unsigned Integers:** `u8`, `u16`, `u32`, `u64`, `u128`, `usize`
* **Masks:** `mask8`, `mask16`, `mask32`, `mask64`, `mask128`, `masksize`
* **Signed Integers:** `i8`, `i16`, `i32`, `i64`, `isize` (`i128` excluded)
* **Unsigned Integers:** `u8`, `u16`, `u32`, `u64`, `usize` (`u128` excluded)
* **Pointers:** `*const T` and `*mut T` (zero-sized metadata only)
* **Masks:** 8-bit, 16-bit, 32-bit, 64-bit, and `usize`-sized masks

Floating point, signed integers, and unsigned integers are the [primitive types](https://doc.rust-lang.org/core/primitive/index.html) you're already used to.
The `mask` types are "truthy" values, but they use the number of bits in their name instead of just 1 bit like a normal `bool` uses.
Floating point, signed integers, unsigned integers, and pointers are the [primitive types](https://doc.rust-lang.org/core/primitive/index.html) you're already used to.
The mask types have elements that are "truthy" values, like `bool`, but have an unspecified layout because different architectures prefer different layouts for mask types.
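
The relationship between vector width and element count described above is plain arithmetic. As a minimal sketch on stable Rust, the hypothetical helper `lanes_in` below (not part of `core::simd`, introduced only for illustration) divides the vector width in bits by the element width in bits:

```rust
use std::mem::size_of;

/// Number of elements of type `T` that fit in a vector of `vector_bits` bits.
/// Hypothetical helper for illustration only; it is not a `core::simd` API.
fn lanes_in<T>(vector_bits: usize) -> usize {
    vector_bits / (size_of::<T>() * 8)
}

fn main() {
    // A 128-bit vector holds four `f32` elements or two `f64` elements.
    println!("f32 in 128 bits: {}", lanes_in::<f32>(128));
    println!("f64 in 128 bits: {}", lanes_in::<f64>(128));
    // A 512-bit vector of `u8` reaches the 64-element limit.
    println!("u8 in 512 bits: {}", lanes_in::<u8>(512));
}
```

This only models the counting; which widths map efficiently to hardware registers depends on the target.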

[simd-guide]: ./beginners-guide.md
[zulip-project-portable-simd]: https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd
9 changes: 4 additions & 5 deletions library/portable-simd/crates/core_simd/Cargo.toml
@@ -13,12 +13,11 @@ default = ["as_crate"]
as_crate = []
std = []
generic_const_exprs = []
all_lane_counts = []

[target.'cfg(target_arch = "wasm32")'.dev-dependencies.wasm-bindgen]
version = "0.2"

[dev-dependencies.wasm-bindgen-test]
version = "0.3"
[target.'cfg(target_arch = "wasm32")'.dev-dependencies]
wasm-bindgen = "0.2"
wasm-bindgen-test = "0.3"

[dev-dependencies.proptest]
version = "0.10"
13 changes: 13 additions & 0 deletions library/portable-simd/crates/core_simd/examples/README.md
@@ -0,0 +1,13 @@
### `stdsimd` examples

This crate is a port of example uses of `stdsimd`, mostly taken from the `packed_simd` crate.

The examples, such as `dot_product.rs`, contain multiple solutions to the same problem, in order to show idiomatic uses of SIMD and how a design can be iterated on for performance.

Run the tests with the command

```
cargo test --example dot_product
```

and verify the code for `dot_product.rs` on your machine.
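
To give a rough idea of the design iteration without requiring a nightly toolchain, the sketch below models a four-element SIMD accumulator with a plain `[f32; 4]` array on stable Rust. The names `dot_scalar` and `dot_lanes` are hypothetical and do not appear in the example file; the real examples use `f32x4` instead of an array, so this is an approximation of the idea rather than the code in `dot_product.rs`.

```rust
const LANES: usize = 4;

/// Scalar reference: multiply element-wise, then sum the products.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Models a four-element SIMD accumulator with a plain array.
/// Each array slot plays the role of one vector element.
fn dot_lanes(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Leftover elements when the length is not a multiple of LANES.
    let (a_rem, b_rem) = (a_chunks.remainder(), b_chunks.remainder());

    let mut acc = [0.0f32; LANES];
    for (x, y) in a_chunks.zip(b_chunks) {
        for lane in 0..LANES {
            acc[lane] += x[lane] * y[lane];
        }
    }

    // A single horizontal reduction at the end, plus the scalar tail.
    let tail: f32 = a_rem.iter().zip(b_rem).map(|(x, y)| x * y).sum();
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let x = vec![0.5f32; 1003];
    let y = vec![2.0f32; 1003];
    println!("scalar: {}", dot_scalar(&x, &y));
    println!("lanes:  {}", dot_lanes(&x, &y));
}
```

Keeping one accumulator per element and reducing only once at the end is the same idea the later SIMD versions rely on; the scalar tail covers lengths that are not a multiple of four.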
169 changes: 169 additions & 0 deletions library/portable-simd/crates/core_simd/examples/dot_product.rs
@@ -0,0 +1,169 @@
// Code taken from the `packed_simd` crate
// Run this code with `cargo test --example dot_product`
//use std::iter::zip;

#![feature(array_chunks)]
#![feature(slice_as_chunks)]
// Add these imports to use the stdsimd library
#![feature(portable_simd)]
use core_simd::simd::*;

// This is your barebones dot product implementation:
// Take 2 vectors, multiply them element wise and *then*
// go along the resulting array and add up the result.
// In the next example we will see if there
// is any difference to adding and multiplying in tandem.
pub fn dot_prod_scalar_0(a: &[f32], b: &[f32]) -> f32 {
assert_eq!(a.len(), b.len());

a.iter().zip(b.iter()).map(|(a, b)| a * b).sum()
}

// When dealing with SIMD, it is very important to think about the amount
// of data movement and when it happens. We're going over simple computation examples here, and yet
// it is not trivial to understand what may or may not contribute to performance
// changes. Eventually, you will need tools to inspect the generated assembly and confirm your
// hypothesis and benchmarks - we will mention them later on.
// With the use of `fold`, we're doing a multiplication,
// and then adding it to the sum, one element from both vectors at a time.
pub fn dot_prod_scalar_1(a: &[f32], b: &[f32]) -> f32 {
assert_eq!(a.len(), b.len());
a.iter()
.zip(b.iter())
.fold(0.0, |a, zipped| a + zipped.0 * zipped.1)
}

// We now move on to the SIMD implementations. Notice the following constructs:
// `array_chunks::<4>`: mapping over these chunks lets us construct SIMD vectors
// `f32x4::from_array`: construct the SIMD vector from an array
// `(a * b).reduce_sum()`: Multiply both f32x4 vectors together, and then reduce them.
// This approach essentially uses SIMD to produce a vector of length N/4 of all the products,
// and then add those with `sum()`. This is suboptimal.
// TODO: ASCII diagrams
pub fn dot_prod_simd_0(a: &[f32], b: &[f32]) -> f32 {
assert_eq!(a.len(), b.len());
// TODO handle remainder when a.len() % 4 != 0
a.array_chunks::<4>()
.map(|&a| f32x4::from_array(a))
.zip(b.array_chunks::<4>().map(|&b| f32x4::from_array(b)))
.map(|(a, b)| (a * b).reduce_sum())
.sum()
}

// There are some simple ways to improve the previous code:
// 1. Make a `zero` `f32x4` SIMD vector that we will be accumulating into
// so that there is only one `sum()` reduction when the last `f32x4` has been processed
// 2. Exploit Fused Multiply Add so that the multiplication, addition and sinking into the reduction
// happen in the same step.
// If the arrays are large, minimizing the data shuffling will lead to great perf.
// If the arrays are small, handling the remainder elements when the length isn't a multiple of 4
// can become a problem.
pub fn dot_prod_simd_1(a: &[f32], b: &[f32]) -> f32 {
assert_eq!(a.len(), b.len());
// TODO handle remainder when a.len() % 4 != 0
a.array_chunks::<4>()
.map(|&a| f32x4::from_array(a))
.zip(b.array_chunks::<4>().map(|&b| f32x4::from_array(b)))
.fold(f32x4::splat(0.0), |acc, zipped| acc + zipped.0 * zipped.1)
.reduce_sum()
}

// A lot of knowledgeable use of SIMD comes from knowing specific instructions that are
// available - let's try to use the `mul_add` instruction, which is the fused-multiply-add we were looking for.
use std_float::StdFloat;
pub fn dot_prod_simd_2(a: &[f32], b: &[f32]) -> f32 {
assert_eq!(a.len(), b.len());
// TODO handle remainder when a.len() % 4 != 0
let mut res = f32x4::splat(0.0);
a.array_chunks::<4>()
.map(|&a| f32x4::from_array(a))
.zip(b.array_chunks::<4>().map(|&b| f32x4::from_array(b)))
.for_each(|(a, b)| {
res = a.mul_add(b, res);
});
res.reduce_sum()
}

// Finally, we will write the same operation but handling the loop remainder.
const LANES: usize = 4;
pub fn dot_prod_simd_3(a: &[f32], b: &[f32]) -> f32 {
assert_eq!(a.len(), b.len());

let (a_extra, a_chunks) = a.as_rchunks();
let (b_extra, b_chunks) = b.as_rchunks();

// These are always true, but for emphasis:
assert_eq!(a_chunks.len(), b_chunks.len());
assert_eq!(a_extra.len(), b_extra.len());

let mut sums = [0.0; LANES];
for ((x, y), d) in std::iter::zip(a_extra, b_extra).zip(&mut sums) {
*d = x * y;
}

let mut sums = f32x4::from_array(sums);
std::iter::zip(a_chunks, b_chunks).for_each(|(x, y)| {
sums += f32x4::from_array(*x) * f32x4::from_array(*y);
});

sums.reduce_sum()
}

// Alternatively, here is an iterator version that handles the remainder in a scalar fashion at the end of the loop.
// Unfortunately, this is allocating 1 `XMM` register on the order of `~len(a)` - we'll see how we can get around it in the
// next example.
pub fn dot_prod_simd_4(a: &[f32], b: &[f32]) -> f32 {
let mut sum = a
.array_chunks::<4>()
.map(|&a| f32x4::from_array(a))
.zip(b.array_chunks::<4>().map(|&b| f32x4::from_array(b)))
.map(|(a, b)| a * b)
.fold(f32x4::splat(0.0), std::ops::Add::add)
.reduce_sum();
let remain = a.len() - (a.len() % 4);
sum += a[remain..]
.iter()
.zip(&b[remain..])
.map(|(a, b)| a * b)
.sum::<f32>();
sum
}

// This version allocates a single `XMM` register for accumulation, and the folds don't allocate on top of that.
// Notice the use of `mul_add`, which can do a multiply and an add operation per iteration.
pub fn dot_prod_simd_5(a: &[f32], b: &[f32]) -> f32 {
a.array_chunks::<4>()
.map(|&a| f32x4::from_array(a))
.zip(b.array_chunks::<4>().map(|&b| f32x4::from_array(b)))
.fold(f32x4::splat(0.), |acc, (a, b)| a.mul_add(b, acc))
.reduce_sum()
}

fn main() {
// Empty main to make cargo happy
}

#[cfg(test)]
mod tests {
#[test]
fn smoke_test() {
use super::*;
let a: Vec<f32> = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let b: Vec<f32> = vec![-8.0, -7.0, -6.0, -5.0, 4.0, 3.0, 2.0, 1.0];
let x: Vec<f32> = [0.5; 1003].to_vec();
let y: Vec<f32> = [2.0; 1003].to_vec();

// Basic check
assert_eq!(0.0, dot_prod_scalar_0(&a, &b));
assert_eq!(0.0, dot_prod_scalar_1(&a, &b));
assert_eq!(0.0, dot_prod_simd_0(&a, &b));
assert_eq!(0.0, dot_prod_simd_1(&a, &b));
assert_eq!(0.0, dot_prod_simd_2(&a, &b));
assert_eq!(0.0, dot_prod_simd_3(&a, &b));
assert_eq!(0.0, dot_prod_simd_4(&a, &b));
assert_eq!(0.0, dot_prod_simd_5(&a, &b));

// We can handle vectors that are non-multiples of 4
assert_eq!(1003.0, dot_prod_simd_3(&x, &y));
}
}
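
The argument order of `mul_add` used in the examples above can be checked with the scalar `f32::mul_add` from the standard library, which computes `self * a + b`. The sketch below runs on stable Rust; `dot_fma` is a hypothetical name, and it only mirrors the shape of the fold in `dot_prod_simd_5` with scalars instead of `f32x4`.

```rust
/// Dot product folded with scalar `f32::mul_add`.
/// `x.mul_add(y, acc)` computes `x * y + acc` with a single rounding step.
fn dot_fma(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).fold(0.0, |acc, (&x, &y)| x.mul_add(y, acc))
}

fn main() {
    // The accumulator is the last argument: 2 * 3 + 1.
    println!("{}", 2.0f32.mul_add(3.0, 1.0));
    println!("{}", dot_fma(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]));
}
```

Whether a call like this compiles to a hardware fused multiply-add instruction depends on the target; inspecting the generated assembly is the way to confirm it.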