Commit 0d0f6b1 (2 parents: 62dad45 + 2f23a0f)

Auto merge of rust-lang#70793 - the8472:in-place-iter-collect, r=Amanieu
specialize some collection and iterator operations to run in-place

This is a rebase and update of rust-lang#66383, which was closed due to inactivity. Recent rustc changes made the compile-time regressions disappear, at least for webrender-wrench. Running a stage2 compile and the rustc-perf suite takes hours on the hardware I have at the moment, so I can't do much more than that.

![Screenshot_2020-04-05 rustc performance data](https://user-images.githubusercontent.com/1065730/78462657-5d60f100-76d4-11ea-8a0b-4f3962707c38.png)

In the best case, the `vec::bench_in_place_recycle` synthetic microbenchmark shows a 15x speedup over the regular implementation, which allocates a new vec for every benchmark iteration. [Benchmark results](https://gist.github.com/the8472/6d999b2d08a2bedf3b93f12112f96e2f). In real code the speedups are tiny, and they also depend on the allocator used: a system allocator that uses a process-wide mutex will benefit more than one with thread-local pools.

## What was changed

* `SpecExtend`, which covered the `from_iter` and `extend` specializations, was split into separate traits.
* `extend` and `from_iter` now reuse `append_elements` if the passed iterators are from slices.
* A preexisting `vec.into_iter().collect::<Vec<_>>()` optimization that passed through the original vec has been generalized to also cover cases where the original has been partially drained.
* A chain of *Vec<T> / BinaryHeap<T> / Box<[T]>* `IntoIter`s through various iterator adapters collected into *Vec<U>* and *BinaryHeap<U>* will be performed in place as long as `T` and `U` have the same alignment and size and aren't ZSTs.
* To enable the above specialization, the unsafe, unstable `SourceIter` and `InPlaceIterable` traits have been added. The first allows reaching through the iterator pipeline to grab a pointer to the source memory. The latter is a marker that promises the read pointer will advance as fast as or faster than the write pointer, so in-place operation is possible in the first place.
* `vec::IntoIter` implements `TrustedRandomAccess` for `T: Copy` to allow in-place collection when there is a `Zip` adapter in the iterator. TRA had to be made an unstable public trait to support this.

## In-place collectible adapters

* `Map`
* `MapWhile`
* `Filter`
* `FilterMap`
* `Fuse`
* `Skip`
* `SkipWhile`
* `Take`
* `TakeWhile`
* `Enumerate`
* `Zip` (left-hand side only, `Copy` types only)
* `Peek`
* `Scan`
* `Inspect`

## Concerns

`vec.into_iter().filter(|_| false).collect()` will no longer return a vec with 0 capacity; instead it will return its original allocation. This avoids the cost of doing any allocation or deallocation, but could lead to large allocations living longer than expected. If that's not acceptable, some resizing policy at the end of the attempted in-place collect would be necessary, which in the worst case could result in one more memcopy than the non-specialized case.

## Possible followup work

* Split liballoc/vec.rs to remove `ignore-tidy-filelength`.
* Try to get trivial chains such as `vec.into_iter().skip(1).collect::<Vec<_>>()` to compile to a `memmove` (currently compiles to a pile of SIMD, see rust-lang#69187).
* Improve the traits so they can be reused by other crates, e.g. itertools. I think currently they're only good enough for internal use.
* Allow iterators sourced from a `HashSet` to be in-place collected into a `Vec`.
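As a user-facing illustration, here is a minimal sketch of what the specialization enables: the source is a `vec::IntoIter`, the element type does not change size or alignment, and every adapter is on the list above, so `collect` can hand back the original allocation. The pointer comparison observes the optimized behavior; it is not a documented guarantee.

```rust
fn main() {
    let v: Vec<usize> = (0..1000).collect();
    let ptr_before = v.as_ptr();

    // `usize -> usize` keeps size and alignment equal, and `enumerate`/`map`
    // are in-place collectible adapters, so the results are written back
    // into the source buffer instead of a fresh allocation.
    let w: Vec<usize> = v.into_iter().enumerate().map(|(i, e)| i ^ e).collect();

    // Observable effect of the specialization, not a stable API contract:
    assert_eq!(ptr_before, w.as_ptr());
}
```

Callers affected by the retained-capacity concern described above (e.g. after a highly selective `filter`) can call `shrink_to_fit` on the result to release the excess capacity.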

15 files changed (+1078 −49)
library/alloc/benches/vec.rs (+243 −2)
```diff
@@ -1,12 +1,14 @@
+use rand::prelude::*;
 use std::iter::{repeat, FromIterator};
-use test::Bencher;
+use test::{black_box, Bencher};
 
 #[bench]
 fn bench_new(b: &mut Bencher) {
     b.iter(|| {
         let v: Vec<u32> = Vec::new();
         assert_eq!(v.len(), 0);
         assert_eq!(v.capacity(), 0);
+        v
     })
 }
 
@@ -17,6 +19,7 @@ fn do_bench_with_capacity(b: &mut Bencher, src_len: usize) {
         let v: Vec<u32> = Vec::with_capacity(src_len);
         assert_eq!(v.len(), 0);
         assert_eq!(v.capacity(), src_len);
+        v
     })
 }
 
@@ -47,6 +50,7 @@ fn do_bench_from_fn(b: &mut Bencher, src_len: usize) {
         let dst = (0..src_len).collect::<Vec<_>>();
         assert_eq!(dst.len(), src_len);
         assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
+        dst
     })
 }
 
@@ -77,6 +81,7 @@ fn do_bench_from_elem(b: &mut Bencher, src_len: usize) {
         let dst: Vec<usize> = repeat(5).take(src_len).collect();
         assert_eq!(dst.len(), src_len);
         assert!(dst.iter().all(|x| *x == 5));
+        dst
     })
 }
 
@@ -109,6 +114,7 @@ fn do_bench_from_slice(b: &mut Bencher, src_len: usize) {
         let dst = src.clone()[..].to_vec();
         assert_eq!(dst.len(), src_len);
         assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
+        dst
     });
 }
 
@@ -141,6 +147,7 @@ fn do_bench_from_iter(b: &mut Bencher, src_len: usize) {
         let dst: Vec<_> = FromIterator::from_iter(src.clone());
         assert_eq!(dst.len(), src_len);
         assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
+        dst
     });
 }
 
@@ -175,6 +182,7 @@ fn do_bench_extend(b: &mut Bencher, dst_len: usize, src_len: usize) {
         dst.extend(src.clone());
         assert_eq!(dst.len(), dst_len + src_len);
         assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
+        dst
     });
 }
 
@@ -224,9 +232,24 @@ fn do_bench_extend_from_slice(b: &mut Bencher, dst_len: usize, src_len: usize) {
         dst.extend_from_slice(&src);
         assert_eq!(dst.len(), dst_len + src_len);
         assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
+        dst
     });
 }
 
+#[bench]
+fn bench_extend_recycle(b: &mut Bencher) {
+    let mut data = vec![0; 1000];
+
+    b.iter(|| {
+        let tmp = std::mem::replace(&mut data, Vec::new());
+        let mut to_extend = black_box(Vec::new());
+        to_extend.extend(tmp.into_iter());
+        data = black_box(to_extend);
+    });
+
+    black_box(data);
+}
+
 #[bench]
 fn bench_extend_from_slice_0000_0000(b: &mut Bencher) {
     do_bench_extend_from_slice(b, 0, 0)
@@ -271,6 +294,7 @@ fn do_bench_clone(b: &mut Bencher, src_len: usize) {
         let dst = src.clone();
         assert_eq!(dst.len(), src_len);
         assert!(dst.iter().enumerate().all(|(i, x)| i == *x));
+        dst
     });
 }
 
@@ -305,10 +329,10 @@ fn do_bench_clone_from(b: &mut Bencher, times: usize, dst_len: usize, src_len: usize) {
 
         for _ in 0..times {
             dst.clone_from(&src);
-
             assert_eq!(dst.len(), src_len);
             assert!(dst.iter().enumerate().all(|(i, x)| dst_len + i == *x));
         }
+        dst
     });
 }
 
@@ -431,3 +455,220 @@ fn bench_clone_from_10_0100_0010(b: &mut Bencher) {
 fn bench_clone_from_10_1000_0100(b: &mut Bencher) {
     do_bench_clone_from(b, 10, 1000, 100)
 }
+
+macro_rules! bench_in_place {
+    (
+        $($fname:ident, $type:ty , $count:expr, $init: expr);*
+    ) => {
+        $(
+            #[bench]
+            fn $fname(b: &mut Bencher) {
+                b.iter(|| {
+                    let src: Vec<$type> = black_box(vec![$init; $count]);
+                    let mut sink = src.into_iter()
+                        .enumerate()
+                        .map(|(idx, e)| { (idx as $type) ^ e }).collect::<Vec<$type>>();
+                    black_box(sink.as_mut_ptr())
+                });
+            }
+        )+
+    };
+}
+
+bench_in_place![
+    bench_in_place_xxu8_i0_0010, u8, 10, 0;
+    bench_in_place_xxu8_i0_0100, u8, 100, 0;
+    bench_in_place_xxu8_i0_1000, u8, 1000, 0;
+    bench_in_place_xxu8_i1_0010, u8, 10, 1;
+    bench_in_place_xxu8_i1_0100, u8, 100, 1;
+    bench_in_place_xxu8_i1_1000, u8, 1000, 1;
+    bench_in_place_xu32_i0_0010, u32, 10, 0;
+    bench_in_place_xu32_i0_0100, u32, 100, 0;
+    bench_in_place_xu32_i0_1000, u32, 1000, 0;
+    bench_in_place_xu32_i1_0010, u32, 10, 1;
+    bench_in_place_xu32_i1_0100, u32, 100, 1;
+    bench_in_place_xu32_i1_1000, u32, 1000, 1;
+    bench_in_place_u128_i0_0010, u128, 10, 0;
+    bench_in_place_u128_i0_0100, u128, 100, 0;
+    bench_in_place_u128_i0_1000, u128, 1000, 0;
+    bench_in_place_u128_i1_0010, u128, 10, 1;
+    bench_in_place_u128_i1_0100, u128, 100, 1;
+    bench_in_place_u128_i1_1000, u128, 1000, 1
+];
+
+#[bench]
+fn bench_in_place_recycle(b: &mut test::Bencher) {
+    let mut data = vec![0; 1000];
+
+    b.iter(|| {
+        let tmp = std::mem::replace(&mut data, Vec::new());
+        data = black_box(
+            tmp.into_iter()
+                .enumerate()
+                .map(|(idx, e)| idx.wrapping_add(e))
+                .fuse()
+                .peekable()
+                .collect::<Vec<usize>>(),
+        );
+    });
+}
+
+#[bench]
+fn bench_in_place_zip_recycle(b: &mut test::Bencher) {
+    let mut data = vec![0u8; 1000];
+    let mut rng = rand::thread_rng();
+    let mut subst = vec![0u8; 1000];
+    rng.fill_bytes(&mut subst[..]);
+
+    b.iter(|| {
+        let tmp = std::mem::replace(&mut data, Vec::new());
+        let mangled = tmp
+            .into_iter()
+            .zip(subst.iter().copied())
+            .enumerate()
+            .map(|(i, (d, s))| d.wrapping_add(i as u8) ^ s)
+            .collect::<Vec<_>>();
+        assert_eq!(mangled.len(), 1000);
+        data = black_box(mangled);
+    });
+}
+
+#[bench]
+fn bench_in_place_zip_iter_mut(b: &mut test::Bencher) {
+    let mut data = vec![0u8; 256];
+    let mut rng = rand::thread_rng();
+    let mut subst = vec![0u8; 1000];
+    rng.fill_bytes(&mut subst[..]);
+
+    b.iter(|| {
+        data.iter_mut().enumerate().for_each(|(i, d)| {
+            *d = d.wrapping_add(i as u8) ^ subst[i];
+        });
+    });
+
+    black_box(data);
+}
+
+#[derive(Clone)]
+struct Droppable(usize);
+
+impl Drop for Droppable {
+    fn drop(&mut self) {
+        black_box(self);
+    }
+}
+
+#[bench]
+fn bench_in_place_collect_droppable(b: &mut test::Bencher) {
+    let v: Vec<Droppable> = std::iter::repeat_with(|| Droppable(0)).take(1000).collect();
+    b.iter(|| {
+        v.clone()
+            .into_iter()
+            .skip(100)
+            .enumerate()
+            .map(|(i, e)| Droppable(i ^ e.0))
+            .collect::<Vec<_>>()
+    })
+}
+
+#[bench]
+fn bench_chain_collect(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| data.iter().cloned().chain([1].iter().cloned()).collect::<Vec<_>>());
+}
+
+#[bench]
+fn bench_chain_chain_collect(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| {
+        data.iter()
+            .cloned()
+            .chain([1].iter().cloned())
+            .chain([2].iter().cloned())
+            .collect::<Vec<_>>()
+    });
+}
+
+#[bench]
+fn bench_nest_chain_chain_collect(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| {
+        data.iter().cloned().chain([1].iter().chain([2].iter()).cloned()).collect::<Vec<_>>()
+    });
+}
+
+pub fn example_plain_slow(l: &[u32]) -> Vec<u32> {
+    let mut result = Vec::with_capacity(l.len());
+    result.extend(l.iter().rev());
+    result
+}
+
+pub fn map_fast(l: &[(u32, u32)]) -> Vec<u32> {
+    let mut result = Vec::with_capacity(l.len());
+    for i in 0..l.len() {
+        unsafe {
+            *result.get_unchecked_mut(i) = l[i].0;
+            result.set_len(i);
+        }
+    }
+    result
+}
+
+const LEN: usize = 16384;
+
+#[bench]
+fn bench_range_map_collect(b: &mut test::Bencher) {
+    b.iter(|| (0..LEN).map(|_| u32::default()).collect::<Vec<_>>());
+}
+
+#[bench]
+fn bench_chain_extend_ref(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| {
+        let mut v = Vec::<u32>::with_capacity(data.len() + 1);
+        v.extend(data.iter().chain([1].iter()));
+        v
+    });
+}
+
+#[bench]
+fn bench_chain_extend_value(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| {
+        let mut v = Vec::<u32>::with_capacity(data.len() + 1);
+        v.extend(data.iter().cloned().chain(Some(1)));
+        v
+    });
+}
+
+#[bench]
+fn bench_rev_1(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| {
+        let mut v = Vec::<u32>::new();
+        v.extend(data.iter().rev());
+        v
+    });
+}
+
+#[bench]
+fn bench_rev_2(b: &mut test::Bencher) {
+    let data = black_box([0; LEN]);
+    b.iter(|| example_plain_slow(&data));
+}
+
+#[bench]
+fn bench_map_regular(b: &mut test::Bencher) {
+    let data = black_box([(0, 0); LEN]);
+    b.iter(|| {
+        let mut v = Vec::<u32>::new();
+        v.extend(data.iter().map(|t| t.1));
+        v
+    });
+}
+
+#[bench]
+fn bench_map_fast(b: &mut test::Bencher) {
+    let data = black_box([(0, 0); LEN]);
+    b.iter(|| map_fast(&data));
+}
```
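The new `*_recycle` benchmarks above all follow the same idiom: a long-lived buffer is moved out with `mem::replace`, transformed by value, and stored back. A standalone sketch of that pattern (names are illustrative); with the in-place specialization, the loop below should perform no allocator traffic after the initial allocation:

```rust
use std::mem;

fn main() {
    let mut data = vec![0usize; 1000];
    for round in 0..16 {
        // Take ownership of the buffer, leaving an empty (non-allocating)
        // Vec behind, then collect the transformed values back into it.
        let tmp = mem::replace(&mut data, Vec::new());
        data = tmp.into_iter().map(|e| e.wrapping_add(round)).collect();
    }
    assert_eq!(data.len(), 1000);
}
```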

library/alloc/src/collections/binary_heap.rs (+23 −2)
```diff
@@ -145,13 +145,13 @@
 #![stable(feature = "rust1", since = "1.0.0")]
 
 use core::fmt;
-use core::iter::{FromIterator, FusedIterator, TrustedLen};
+use core::iter::{FromIterator, FusedIterator, InPlaceIterable, SourceIter, TrustedLen};
 use core::mem::{self, size_of, swap, ManuallyDrop};
 use core::ops::{Deref, DerefMut};
 use core::ptr;
 
 use crate::slice;
-use crate::vec::{self, Vec};
+use crate::vec::{self, AsIntoIter, Vec};
 
 use super::SpecExtend;
 
@@ -1173,6 +1173,27 @@ impl<T> ExactSizeIterator for IntoIter<T> {
 #[stable(feature = "fused", since = "1.26.0")]
 impl<T> FusedIterator for IntoIter<T> {}
 
+#[unstable(issue = "none", feature = "inplace_iteration")]
+unsafe impl<T> SourceIter for IntoIter<T> {
+    type Source = IntoIter<T>;
+
+    #[inline]
+    unsafe fn as_inner(&mut self) -> &mut Self::Source {
+        self
+    }
+}
+
+#[unstable(issue = "none", feature = "inplace_iteration")]
+unsafe impl<I> InPlaceIterable for IntoIter<I> {}
+
+impl<I> AsIntoIter for IntoIter<I> {
+    type Item = I;
+
+    fn as_into_iter(&mut self) -> &mut vec::IntoIter<Self::Item> {
+        &mut self.iter
+    }
+}
+
 #[unstable(feature = "binary_heap_into_iter_sorted", issue = "59278")]
 #[derive(Clone, Debug)]
 pub struct IntoIterSorted<T> {
```

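For orientation, here is a simplified sketch of how these traits compose; these are not the in-tree definitions (which live behind the `inplace_iteration` feature and carry stronger safety contracts), but they show how an adapter forwards `SourceIter` so that `collect` can reach through the whole pipeline to the source allocation:

```rust
// Simplified stand-ins for the unstable traits added by this PR.
pub unsafe trait SourceIter {
    type Source;
    // Unsafe: callers must not use the returned reference to invalidate
    // invariants the rest of the pipeline still relies on.
    unsafe fn as_inner(&mut self) -> &mut Self::Source;
}

// Marker: the iterator consumes its source at least as fast as it yields
// items, so results can be written back into the source buffer.
pub unsafe trait InPlaceIterable: Iterator {}

// A hypothetical adapter, analogous to `iter::Map`.
struct MapAdapter<I, F> {
    iter: I,
    f: F, // unused in this sketch
}

unsafe impl<I: SourceIter, F> SourceIter for MapAdapter<I, F> {
    type Source = I::Source;

    unsafe fn as_inner(&mut self) -> &mut I::Source {
        // Delegate to the wrapped iterator; the chain bottoms out at a
        // source such as `vec::IntoIter`, which returns itself (as in the
        // `BinaryHeap` impl above).
        self.iter.as_inner()
    }
}
```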
library/alloc/src/lib.rs (+4 −0)
```diff
@@ -99,13 +99,15 @@
 #![feature(fmt_internals)]
 #![feature(fn_traits)]
 #![feature(fundamental)]
+#![feature(inplace_iteration)]
 #![feature(internal_uninit_const)]
 #![feature(lang_items)]
 #![feature(layout_for_ptr)]
 #![feature(libc)]
 #![feature(map_first_last)]
 #![feature(map_into_keys_values)]
 #![feature(negative_impls)]
+#![feature(never_type)]
 #![feature(new_uninit)]
 #![feature(nll)]
 #![feature(nonnull_slice_from_raw_parts)]
@@ -133,7 +135,9 @@
 #![feature(slice_partition_dedup)]
 #![feature(maybe_uninit_extra, maybe_uninit_slice)]
 #![feature(alloc_layout_extra)]
+#![feature(trusted_random_access)]
 #![feature(try_trait)]
+#![feature(type_alias_impl_trait)]
 #![feature(associated_type_bounds)]
 // Allow testing this library
```