[libc++] Optimize ranges::move{,_backward} for vector<bool>::iterator #121109

Merged: 3 commits merged Feb 19, 2025

winner245 (Contributor) commented Dec 25, 2024

As a follow-up to #121013 (which optimized ranges::copy) and #121026 (which optimized ranges::copy_backward), this PR enhances the performance of std::ranges::{move, move_backward} for vector<bool>::iterator, addressing a subtask outlined in issue #64038.

The optimizations bring performance improvements analogous to those achieved for the {copy, copy_backward} algorithms: more than 2000x for aligned moves and about 60x for unaligned moves. Comprehensive tests covering up to 4 storage words (256 bits) with odd and even bit sizes validate the optimizations in this patch.
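
For reference, here is a minimal usage sketch of the affected operations on `vector<bool>`, mirroring the aligned and unaligned cases exercised by the new tests. It uses the classic `std::move`/`std::move_backward` so it builds in pre-C++20 modes; as the diff suggests (the dedicated `__bit_iterator` overloads are removed from `__bit_reference` and new ones are added to `__move_impl`/`__move_backward_impl`), these should reach the same libc++ code path as the `ranges::` forms. The function name `move_vector_bool_roundtrip` is hypothetical and not part of the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: moves an alternating bit pattern into aligned and
// unaligned destinations and reports whether every destination bit matches.
bool move_vector_bool_roundtrip(std::size_t n) {
  std::vector<bool> in(n, false);
  for (std::size_t i = 0; i < n; i += 2)
    in[i] = true;

  // Aligned: source and destination both begin at bit 0 of a storage word.
  std::vector<bool> aligned(n);
  if (std::move(in.begin(), in.end(), aligned.begin()) != aligned.end() || aligned != in)
    return false;

  // Unaligned: the destination begins 4 bits into its first storage word.
  std::vector<bool> unaligned(n + 8);
  std::move(in.begin(), in.end(), unaligned.begin() + 4);

  // move_backward: the third argument is the END of the destination range,
  // and the return value is the beginning of the written range.
  std::vector<bool> backward(n + 8);
  if (std::move_backward(in.begin(), in.end(), backward.end() - 4) != backward.begin() + 4)
    return false;

  for (std::size_t i = 0; i < n; ++i)
    if (bool(unaligned[i + 4]) != bool(in[i]) || bool(backward[i + 4]) != bool(in[i]))
      return false;
  return true;
}
```

Calling it with sizes such as 19, 64, 199, and 256 covers partial words, exactly one word, and several words.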

Benchmarks

Aligned move (up to 2111x)

---------------------------------------------------------------------------
Benchmark                                  Before        After  Improvement
---------------------------------------------------------------------------
bm_ranges_move_vb_aligned/8               11.4 ns      5.29 ns           2x
bm_ranges_move_vb_aligned/64              92.1 ns      5.73 ns          16x
bm_ranges_move_vb_aligned/512              729 ns      5.31 ns         137x
bm_ranges_move_vb_aligned/4096            5759 ns      10.4 ns         554x
bm_ranges_move_vb_aligned/32768          45092 ns      38.6 ns        1168x
bm_ranges_move_vb_aligned/65536          91119 ns      63.5 ns        1435x
bm_ranges_move_vb_aligned/102400        141736 ns       112 ns        1266x
bm_ranges_move_vb_aligned/106496        146472 ns       116 ns        1263x
bm_ranges_move_vb_aligned/110592        153312 ns       121 ns        1267x
bm_ranges_move_vb_aligned/114688        159366 ns       125 ns        1275x
bm_ranges_move_vb_aligned/118784        164790 ns       130 ns        1267x
bm_ranges_move_vb_aligned/122880        171524 ns       136 ns        1261x
bm_ranges_move_vb_aligned/126976        175769 ns       143 ns        1229x
bm_ranges_move_vb_aligned/131072        182248 ns       175 ns        1041x
bm_ranges_move_vb_aligned/135168        188863 ns       177 ns        1067x
bm_ranges_move_vb_aligned/139264        192792 ns       165 ns        1168x
bm_ranges_move_vb_aligned/143360        194506 ns       156 ns        1247x
bm_ranges_move_vb_aligned/147456        203244 ns       160 ns        1270x
bm_ranges_move_vb_aligned/151552        207067 ns      99.4 ns        2083x
bm_ranges_move_vb_aligned/155648        213459 ns       116 ns        1840x
bm_ranges_move_vb_aligned/159744        219885 ns       162 ns        1357x
bm_ranges_move_vb_aligned/163840        228873 ns       142 ns        1612x
bm_ranges_move_vb_aligned/167936        233786 ns       160 ns        1461x
bm_ranges_move_vb_aligned/172032        236231 ns       129 ns        1831x
bm_ranges_move_vb_aligned/176128        247542 ns       119 ns        2080x
bm_ranges_move_vb_aligned/180224        249135 ns       118 ns        2111x
bm_ranges_move_vb_aligned/184320        253485 ns       123 ns        2060x
bm_ranges_move_vb_aligned/188416        258189 ns       148 ns        1745x
bm_ranges_move_vb_aligned/192512        267000 ns       207 ns        1290x
bm_ranges_move_vb_aligned/196608        269285 ns       185 ns        1456x
bm_ranges_move_vb_aligned/200704        279332 ns       238 ns        1174x
bm_ranges_move_vb_aligned/204800        286305 ns       236 ns        1213x

Aligned move_backward (up to 2235x)

-----------------------------------------------------------------------------------
Benchmark                                           Before       After  Improvement
-----------------------------------------------------------------------------------
bm_ranges_move_backward_vb_aligned/8               11.3 ns     5.81 ns           2x
bm_ranges_move_backward_vb_aligned/64              82.2 ns     6.15 ns          13x
bm_ranges_move_backward_vb_aligned/512              651 ns     5.59 ns         116x
bm_ranges_move_backward_vb_aligned/4096            5234 ns     10.9 ns         480x
bm_ranges_move_backward_vb_aligned/32768          42212 ns     39.0 ns        1082x
bm_ranges_move_backward_vb_aligned/65536          84501 ns     63.0 ns        1341x
bm_ranges_move_backward_vb_aligned/102400        133306 ns     62.0 ns        2150x
bm_ranges_move_backward_vb_aligned/106496        140072 ns     63.6 ns        2202x
bm_ranges_move_backward_vb_aligned/110592        147510 ns     66.4 ns        2222x
bm_ranges_move_backward_vb_aligned/114688        147312 ns     68.0 ns        2166x
bm_ranges_move_backward_vb_aligned/118784        155653 ns     70.7 ns        2202x
bm_ranges_move_backward_vb_aligned/122880        161665 ns     72.4 ns        2233x
bm_ranges_move_backward_vb_aligned/126976        164021 ns     76.6 ns        2142x
bm_ranges_move_backward_vb_aligned/131072        171836 ns     76.9 ns        2235x
bm_ranges_move_backward_vb_aligned/135168        177469 ns      128 ns        1386x
bm_ranges_move_backward_vb_aligned/139264        180642 ns     99.6 ns        1814x
bm_ranges_move_backward_vb_aligned/143360        186952 ns     90.3 ns        2070x
bm_ranges_move_backward_vb_aligned/147456        192811 ns     90.7 ns        2126x
bm_ranges_move_backward_vb_aligned/151552        201549 ns     94.3 ns        2137x
bm_ranges_move_backward_vb_aligned/155648        205928 ns      111 ns        1855x
bm_ranges_move_backward_vb_aligned/159744        210829 ns      152 ns        1387x
bm_ranges_move_backward_vb_aligned/163840        213758 ns      137 ns        1560x
bm_ranges_move_backward_vb_aligned/167936        221136 ns      153 ns        1445x
bm_ranges_move_backward_vb_aligned/172032        225363 ns      124 ns        1817x
bm_ranges_move_backward_vb_aligned/176128        230920 ns      116 ns        1991x
bm_ranges_move_backward_vb_aligned/180224        237196 ns      117 ns        2027x
bm_ranges_move_backward_vb_aligned/184320        243265 ns      123 ns        1978x
bm_ranges_move_backward_vb_aligned/188416        244984 ns      143 ns        1713x
bm_ranges_move_backward_vb_aligned/192512        252805 ns      197 ns        1283x
bm_ranges_move_backward_vb_aligned/196608        259500 ns      185 ns        1403x
bm_ranges_move_backward_vb_aligned/200704        260624 ns      231 ns        1128x
bm_ranges_move_backward_vb_aligned/204800        268815 ns      239 ns        1125x
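
A simplified model of why the aligned numbers are so large: when source and destination share the same bit offset within a storage word, only the first and last words need a masked merge, and everything in between is plain word assignment, 64 bits at a time, rather than an element-by-element loop over proxy references. This is an illustrative sketch over a hypothetical `std::vector<std::uint64_t>` bit array (`copy_bits_aligned`, `range_mask`, `Word`, and `kBits` are made-up names); it is not the libc++ code that `std::__copy` dispatches to.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;        // hypothetical storage word type
constexpr std::size_t kBits = 64;  // bits per storage word

// Mask with bits [lo, hi) set; requires lo < hi <= 64.
constexpr Word range_mask(std::size_t lo, std::size_t hi) {
  Word upper = hi == kBits ? ~Word{0} : (Word{1} << hi) - 1;
  return upper & ~((Word{1} << lo) - 1);
}

// Copies n bits starting at absolute bit `pos` in src to the same position
// in dst. Both sides share the in-word offset (pos % 64), so only the first
// and last words need a masked merge.
void copy_bits_aligned(const std::vector<Word>& src, std::vector<Word>& dst,
                       std::size_t pos, std::size_t n) {
  if (n == 0)
    return;
  std::size_t w   = pos / kBits;
  std::size_t off = pos % kBits;
  if (off != 0) {  // leading partial word
    std::size_t take = std::min(n, kBits - off);
    Word m = range_mask(off, off + take);
    dst[w] = (dst[w] & ~m) | (src[w] & m);
    n -= take;
    ++w;
  }
  for (; n >= kBits; n -= kBits, ++w)  // whole words: 64 bits per assignment
    dst[w] = src[w];
  if (n > 0) {  // trailing partial word
    Word m = range_mask(0, n);
    dst[w] = (dst[w] & ~m) | (src[w] & m);
  }
}
```

The middle loop is a straight word copy, which a compiler can also turn into `memmove`-like code; that, rather than any per-bit work, dominates large aligned moves in this model.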

Unaligned move (up to 67x)

---------------------------------------------------------------------------
Benchmark                                  Before        After  Improvement
---------------------------------------------------------------------------
bm_ranges_move_vb_unaligned/8             6.23 ns      10.4 ns         0.6x
bm_ranges_move_vb_unaligned/64            89.3 ns      10.3 ns           9x
bm_ranges_move_vb_unaligned/512            741 ns      18.9 ns          39x
bm_ranges_move_vb_unaligned/4096          6059 ns      96.7 ns          63x
bm_ranges_move_vb_unaligned/32768        49585 ns       747 ns          66x
bm_ranges_move_vb_unaligned/262144      395276 ns      5895 ns          67x
bm_ranges_move_vb_unaligned/1048576    1567811 ns     23451 ns          67x

Unaligned move_backward (up to 60x)

-----------------------------------------------------------------------------------
Benchmark                                           Before       After  Improvement
-----------------------------------------------------------------------------------
bm_ranges_move_backward_vb_unaligned/8             5.63 ns     10.2 ns         0.6x
bm_ranges_move_backward_vb_unaligned/64            88.2 ns     9.43 ns           9x
bm_ranges_move_backward_vb_unaligned/512            727 ns     18.4 ns          40x
bm_ranges_move_backward_vb_unaligned/4096          5789 ns      103 ns          56x
bm_ranges_move_backward_vb_unaligned/32768        45744 ns      835 ns          55x
bm_ranges_move_backward_vb_unaligned/262144      377009 ns     6307 ns          60x
bm_ranges_move_backward_vb_unaligned/1048576    1507192 ns    25390 ns          59x
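
The unaligned case cannot use plain word assignment, because each destination word straddles two source words. A common technique is shift-and-merge: read a word's worth of bits across the source boundary, shift them into place, and blend them into the destination under a mask. The sketch below (hypothetical helpers `read_bits`, `write_bits`, and `copy_bits_unaligned` over a `std::vector<std::uint64_t>`, assuming non-overlapping source and destination) still touches roughly n/64 words but pays a few shifts and masks per word, which is consistent with unaligned gains in the tens rather than the thousands. It is not libc++'s implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uint64_t;        // hypothetical storage word type
constexpr std::size_t kBits = 64;  // bits per storage word

// Low `len` bits set; requires 1 <= len <= 64.
constexpr Word low_mask(std::size_t len) {
  return len == kBits ? ~Word{0} : (Word{1} << len) - 1;
}

// Reads `len` bits (1..64) starting at absolute bit `pos`; the field may
// straddle two source words, in which case the two halves are combined.
Word read_bits(const std::vector<Word>& v, std::size_t pos, std::size_t len) {
  std::size_t w = pos / kBits, off = pos % kBits;
  Word r = v[w] >> off;
  if (off + len > kBits)  // implies off > 0, so this shift is < 64
    r |= v[w + 1] << (kBits - off);
  return r & low_mask(len);
}

// Writes the low `len` bits of `bits` at `pos`; the field must fit in one word.
void write_bits(std::vector<Word>& v, std::size_t pos, std::size_t len, Word bits) {
  std::size_t w = pos / kBits, off = pos % kBits;
  Word m = low_mask(len) << off;
  v[w] = (v[w] & ~m) | ((bits << off) & m);
}

// Copies n bits from src[s, s + n) to dst[d, d + n) for arbitrary offsets.
// Each step fills the remainder of one destination word, so the loop runs
// about n / 64 times, but every step costs shifts and a masked merge.
void copy_bits_unaligned(const std::vector<Word>& src, std::size_t s,
                         std::vector<Word>& dst, std::size_t d, std::size_t n) {
  while (n > 0) {
    std::size_t chunk = std::min(n, kBits - d % kBits);
    write_bits(dst, d, chunk, read_bits(src, s, chunk));
    s += chunk;
    d += chunk;
    n -= chunk;
  }
}
```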

winner245 force-pushed the optimize-ranges-move branch 3 times, most recently from 544adcf to cd7a802, December 25, 2024 18:17
winner245 force-pushed the optimize-ranges-move branch 2 times, most recently from 6d84599 to dc159bd, January 26, 2025 19:49
winner245 force-pushed the optimize-ranges-move branch 3 times, most recently from 3200fa4 to 926cc41, February 3, 2025 04:47
winner245 marked this pull request as ready for review February 3, 2025 15:55
winner245 requested a review from a team as a code owner February 3, 2025 15:55
llvmbot added the libc++ label Feb 3, 2025
llvmbot (Member) commented Feb 3, 2025

@llvm/pr-subscribers-libcxx

Author: Peng Liu (winner245)

Changes

As a follow-up to #121013 (which optimized ranges::copy) and #121026 (which optimized ranges::copy_backward), this PR enhances the performance of std::ranges::{move, move_backward} for vector<bool>::iterator, addressing a subtask outlined in issue #64038.

The optimizations bring performance improvements analogous to those achieved for the {copy, copy_backward} algorithms: more than 2000x for aligned moves and about 60x for unaligned moves. Comprehensive tests covering up to 4 storage words (256 bits) with odd and even bit sizes validate the optimizations in this patch.

Benchmarks

Aligned move (up to 2511x)

-----------------------------------------------------------------------------
Benchmark                                  Before         After   Improvement
-----------------------------------------------------------------------------
bm_ranges_move_vb_aligned/8               11.3 ns       4.80 ns          2.4x
bm_ranges_move_vb_aligned/64              89.5 ns       5.33 ns         16.8x
bm_ranges_move_vb_aligned/512              716 ns       5.05 ns          142x
bm_ranges_move_vb_aligned/4096            5662 ns       9.84 ns          575x
bm_ranges_move_vb_aligned/32768          44880 ns       46.1 ns          973x
bm_ranges_move_vb_aligned/65536          90616 ns       79.3 ns         1143x
bm_ranges_move_vb_aligned/102400        144683 ns       67.0 ns         2159x
bm_ranges_move_vb_aligned/106496        150643 ns       68.5 ns         2199x
bm_ranges_move_vb_aligned/110592        158098 ns       70.3 ns         2249x
bm_ranges_move_vb_aligned/114688        168557 ns       74.9 ns         2250x
bm_ranges_move_vb_aligned/118784        169404 ns       77.6 ns         2183x
bm_ranges_move_vb_aligned/122880        173890 ns       80.0 ns         2174x
bm_ranges_move_vb_aligned/126976        179472 ns       81.2 ns         2210x
bm_ranges_move_vb_aligned/131072        182714 ns       86.5 ns         2112x
bm_ranges_move_vb_aligned/135168        190121 ns       87.8 ns         2165x
bm_ranges_move_vb_aligned/139264        197667 ns       90.1 ns         2193x
bm_ranges_move_vb_aligned/143360        211083 ns       87.7 ns         2406x
bm_ranges_move_vb_aligned/147456        225014 ns       89.6 ns         2511x
bm_ranges_move_vb_aligned/151552        219499 ns       91.4 ns         2402x
bm_ranges_move_vb_aligned/155648        224825 ns       99.0 ns         2271x
bm_ranges_move_vb_aligned/159744        227261 ns        103 ns         2206x
bm_ranges_move_vb_aligned/163840        229928 ns        182 ns         1263x
bm_ranges_move_vb_aligned/167936        226347 ns        212 ns         1068x
bm_ranges_move_vb_aligned/172032        238817 ns        142 ns         1682x
bm_ranges_move_vb_aligned/176128        257062 ns        121 ns         2124x
bm_ranges_move_vb_aligned/180224        262250 ns        122 ns         2150x
bm_ranges_move_vb_aligned/184320        261486 ns        118 ns         2216x
bm_ranges_move_vb_aligned/188416        269745 ns        127 ns         2124x
bm_ranges_move_vb_aligned/192512        271024 ns        128 ns         2117x
bm_ranges_move_vb_aligned/196608        275733 ns        228 ns         1209x
bm_ranges_move_vb_aligned/200704        278653 ns        285 ns          978x
bm_ranges_move_vb_aligned/204800        286797 ns        243 ns         1180x

Aligned move_backward (up to 2231x)

-------------------------------------------------------------------------------------
Benchmark                                           Before        After   Improvement
-------------------------------------------------------------------------------------
bm_ranges_move_backward_vb_aligned/8               9.72 ns       5.03 ns         1.9x
bm_ranges_move_backward_vb_aligned/64              89.8 ns       5.23 ns        17.2x
bm_ranges_move_backward_vb_aligned/512              643 ns       5.18 ns         124x
bm_ranges_move_backward_vb_aligned/4096            5037 ns       10.1 ns         499x
bm_ranges_move_backward_vb_aligned/32768          40126 ns       44.9 ns         894x
bm_ranges_move_backward_vb_aligned/65536          80262 ns       77.5 ns        1036x
bm_ranges_move_backward_vb_aligned/102400        125445 ns       64.0 ns        1960x
bm_ranges_move_backward_vb_aligned/106496        131869 ns       69.0 ns        1911x
bm_ranges_move_backward_vb_aligned/110592        142678 ns       64.1 ns        2226x
bm_ranges_move_backward_vb_aligned/114688        141358 ns       65.1 ns        2171x
bm_ranges_move_backward_vb_aligned/118784        148357 ns       66.5 ns        2231x
bm_ranges_move_backward_vb_aligned/122880        151052 ns       68.8 ns        2196x
bm_ranges_move_backward_vb_aligned/126976        158514 ns       71.4 ns        2220x
bm_ranges_move_backward_vb_aligned/131072        163450 ns       73.5 ns        2224x
bm_ranges_move_backward_vb_aligned/135168        170184 ns        158 ns        1077x
bm_ranges_move_backward_vb_aligned/139264        172592 ns        103 ns        1676x
bm_ranges_move_backward_vb_aligned/143360        179955 ns       93.9 ns        1916x
bm_ranges_move_backward_vb_aligned/147456        185436 ns       95.1 ns        1950x
bm_ranges_move_backward_vb_aligned/151552        194811 ns       94.4 ns        2064x
bm_ranges_move_backward_vb_aligned/155648        195126 ns       96.3 ns        2026x
bm_ranges_move_backward_vb_aligned/159744        203279 ns       99.6 ns        2041x
bm_ranges_move_backward_vb_aligned/163840        208438 ns        174 ns        1198x
bm_ranges_move_backward_vb_aligned/167936        211036 ns        209 ns        1010x
bm_ranges_move_backward_vb_aligned/172032        217071 ns        139 ns        1562x
bm_ranges_move_backward_vb_aligned/176128        226155 ns        117 ns        1933x
bm_ranges_move_backward_vb_aligned/180224        232424 ns        117 ns        1987x
bm_ranges_move_backward_vb_aligned/184320        236457 ns        117 ns        2021x
bm_ranges_move_backward_vb_aligned/188416        240821 ns        127 ns        1896x
bm_ranges_move_backward_vb_aligned/192512        247794 ns        127 ns        1951x
bm_ranges_move_backward_vb_aligned/196608        243881 ns        217 ns        1124x
bm_ranges_move_backward_vb_aligned/200704        255730 ns        286 ns         894x
bm_ranges_move_backward_vb_aligned/204800        245215 ns        237 ns        1035x

Unaligned move (up to 62x)

-----------------------------------------------------------------------------
Benchmark                                  Before         After   Improvement
-----------------------------------------------------------------------------
bm_ranges_move_vb_unaligned/8             11.4 ns       8.64 ns          1.3x
bm_ranges_move_vb_unaligned/64            96.5 ns       8.18 ns           12x
bm_ranges_move_vb_unaligned/512            755 ns       17.9 ns           42x
bm_ranges_move_vb_unaligned/4096          6013 ns       97.7 ns           62x
bm_ranges_move_vb_unaligned/32768        47906 ns        781 ns           61x
bm_ranges_move_vb_unaligned/262144      384167 ns       6201 ns           62x
bm_ranges_move_vb_unaligned/1048576    1521607 ns      25520 ns           60x

Unaligned move_backward (up to 64x)

-------------------------------------------------------------------------------------
Benchmark                                           Before        After   Improvement
-------------------------------------------------------------------------------------
bm_ranges_move_backward_vb_unaligned/8            10.00 ns       9.61 ns         1.0x
bm_ranges_move_backward_vb_unaligned/64            88.4 ns       8.53 ns          10x
bm_ranges_move_backward_vb_unaligned/512            656 ns       17.8 ns          37x
bm_ranges_move_backward_vb_unaligned/4096          5238 ns        102 ns          51x
bm_ranges_move_backward_vb_unaligned/32768        42843 ns        706 ns          61x
bm_ranges_move_backward_vb_unaligned/262144      339261 ns       5803 ns          58x
bm_ranges_move_backward_vb_unaligned/1048576    1484360 ns      23332 ns          64x
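
Both sets of benchmarks move between separate vectors. `move_backward` matters most when the ranges overlap and the destination lies to the right of the source, for example when shifting bits in place. Here is a small sketch of that use; the helper `shift_right_in_place` is hypothetical. Note the standard precondition that the destination end must not lie in `(first, last]`, which is why `k == 0` returns early.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: shifts the bits of v toward higher indices by k
// positions in place and fills the vacated low positions with false.
// move_backward is required because the ranges overlap and the destination
// lies to the right; a forward move would overwrite bits before reading them.
void shift_right_in_place(std::vector<bool>& v, std::size_t k) {
  if (k == 0)
    return;  // d_last == last would violate move_backward's precondition
  if (k >= v.size()) {
    std::fill(v.begin(), v.end(), false);
    return;
  }
  auto shift = static_cast<std::ptrdiff_t>(k);
  std::move_backward(v.begin(), v.end() - shift, v.end());
  std::fill(v.begin(), v.begin() + shift, false);
}
```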

Patch is 35.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121109.diff

10 Files Affected:

  • (modified) libcxx/docs/ReleaseNotes/21.rst (+2-3)
  • (modified) libcxx/include/__algorithm/move.h (+10)
  • (modified) libcxx/include/__algorithm/move_backward.h (+10)
  • (modified) libcxx/include/__bit_reference (-16)
  • (added) libcxx/test/benchmarks/algorithms/move.bench.cpp (+55)
  • (added) libcxx/test/benchmarks/algorithms/move_backward.bench.cpp (+55)
  • (modified) libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move.pass.cpp (+42-11)
  • (modified) libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move_backward.pass.cpp (+44-16)
  • (modified) libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp (+74-33)
  • (modified) libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move_backward.pass.cpp (+84-34)
diff --git a/libcxx/docs/ReleaseNotes/21.rst b/libcxx/docs/ReleaseNotes/21.rst
index 82f1de6bad3942..9b3e120e4bfd64 100644
--- a/libcxx/docs/ReleaseNotes/21.rst
+++ b/libcxx/docs/ReleaseNotes/21.rst
@@ -43,9 +43,8 @@ Implemented Papers
 Improvements and New Features
 -----------------------------
 
-- The ``std::ranges::{copy, copy_n, copy_backward}`` algorithms have been optimized for ``std::vector<bool>::iterator``\s,
-  resulting in a performance improvement of up to 2000x.
-
+- The ``std::ranges::{copy, copy_n, copy_backward, move, move_backward}`` algorithms have been optimized for
+  ``std::vector<bool>::iterator``, resulting in a performance improvement of up to 2000x.
 
 Deprecations and Removals
 -------------------------
diff --git a/libcxx/include/__algorithm/move.h b/libcxx/include/__algorithm/move.h
index 6f3b0eb5d2927c..a3320e9f1985d0 100644
--- a/libcxx/include/__algorithm/move.h
+++ b/libcxx/include/__algorithm/move.h
@@ -9,11 +9,13 @@
 #ifndef _LIBCPP___ALGORITHM_MOVE_H
 #define _LIBCPP___ALGORITHM_MOVE_H
 
+#include <__algorithm/copy.h>
 #include <__algorithm/copy_move_common.h>
 #include <__algorithm/for_each_segment.h>
 #include <__algorithm/iterator_operations.h>
 #include <__algorithm/min.h>
 #include <__config>
+#include <__fwd/bit_reference.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/segmented_iterator.h>
 #include <__type_traits/common_type.h>
@@ -98,6 +100,14 @@ struct __move_impl {
     }
   }
 
+  template <class _Cp, bool _IsConst>
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pair<__bit_iterator<_Cp, _IsConst>, __bit_iterator<_Cp, false> >
+  operator()(__bit_iterator<_Cp, _IsConst> __first,
+             __bit_iterator<_Cp, _IsConst> __last,
+             __bit_iterator<_Cp, false> __result) {
+    return std::__copy(__first, __last, __result);
+  }
+
   // At this point, the iterators have been unwrapped so any `contiguous_iterator` has been unwrapped to a pointer.
   template <class _In, class _Out, __enable_if_t<__can_lower_move_assignment_to_memmove<_In, _Out>::value, int> = 0>
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pair<_In*, _Out*>
diff --git a/libcxx/include/__algorithm/move_backward.h b/libcxx/include/__algorithm/move_backward.h
index 24a8d9b24527a7..14482fee181147 100644
--- a/libcxx/include/__algorithm/move_backward.h
+++ b/libcxx/include/__algorithm/move_backward.h
@@ -9,10 +9,12 @@
 #ifndef _LIBCPP___ALGORITHM_MOVE_BACKWARD_H
 #define _LIBCPP___ALGORITHM_MOVE_BACKWARD_H
 
+#include <__algorithm/copy_backward.h>
 #include <__algorithm/copy_move_common.h>
 #include <__algorithm/iterator_operations.h>
 #include <__algorithm/min.h>
 #include <__config>
+#include <__fwd/bit_reference.h>
 #include <__iterator/iterator_traits.h>
 #include <__iterator/segmented_iterator.h>
 #include <__type_traits/common_type.h>
@@ -107,6 +109,14 @@ struct __move_backward_impl {
     }
   }
 
+  template <class _Cp, bool _IsConst>
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 pair<__bit_iterator<_Cp, _IsConst>, __bit_iterator<_Cp, false> >
+  operator()(__bit_iterator<_Cp, _IsConst> __first,
+             __bit_iterator<_Cp, _IsConst> __last,
+             __bit_iterator<_Cp, false> __result) {
+    return std::__copy_backward<_ClassicAlgPolicy>(__first, __last, __result);
+  }
+
   // At this point, the iterators have been unwrapped so any `contiguous_iterator` has been unwrapped to a pointer.
   template <class _In, class _Out, __enable_if_t<__can_lower_move_assignment_to_memmove<_In, _Out>::value, int> = 0>
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pair<_In*, _Out*>
diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference
index bb8d4725c39805..c4e00769715479 100644
--- a/libcxx/include/__bit_reference
+++ b/libcxx/include/__bit_reference
@@ -186,22 +186,6 @@ private:
         __mask_(__m) {}
 };
 
-// move
-
-template <class _Cp, bool _IsConst>
-inline _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false>
-move(__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
-  return std::copy(__first, __last, __result);
-}
-
-// move_backward
-
-template <class _Cp, bool _IsConst>
-inline _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> move_backward(
-    __bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
-  return std::copy_backward(__first, __last, __result);
-}
-
 // swap_ranges
 
 template <class _Cl, class _Cr>
diff --git a/libcxx/test/benchmarks/algorithms/move.bench.cpp b/libcxx/test/benchmarks/algorithms/move.bench.cpp
new file mode 100644
index 00000000000000..909c0c4f1b4c58
--- /dev/null
+++ b/libcxx/test/benchmarks/algorithms/move.bench.cpp
@@ -0,0 +1,55 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
+
+#include <algorithm>
+#include <benchmark/benchmark.h>
+#include <vector>
+
+static void bm_ranges_move_vb(benchmark::State& state, bool aligned) {
+  auto n = state.range();
+  std::vector<bool> in(n, true);
+  std::vector<bool> out(aligned ? n : n + 8);
+  benchmark::DoNotOptimize(&in);
+  auto dst = aligned ? out.begin() : out.begin() + 4;
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(std::ranges::move(in, dst));
+    benchmark::DoNotOptimize(&out);
+  }
+}
+
+static void bm_move_vb(benchmark::State& state, bool aligned) {
+  auto n = state.range();
+  std::vector<bool> in(n, true);
+  std::vector<bool> out(aligned ? n : n + 8);
+  benchmark::DoNotOptimize(&in);
+  auto beg = in.begin();
+  auto end = in.end();
+  auto dst = aligned ? out.begin() : out.begin() + 4;
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(std::move(beg, end, dst));
+    benchmark::DoNotOptimize(&out);
+  }
+}
+
+static void bm_ranges_move_vb_aligned(benchmark::State& state) { bm_ranges_move_vb(state, true); }
+static void bm_ranges_move_vb_unaligned(benchmark::State& state) { bm_ranges_move_vb(state, false); }
+
+static void bm_move_vb_aligned(benchmark::State& state) { bm_move_vb(state, true); }
+static void bm_move_vb_unaligned(benchmark::State& state) { bm_move_vb(state, false); }
+
+// Test std::ranges::move for vector<bool>::iterator
+BENCHMARK(bm_ranges_move_vb_aligned)->Range(8, 1 << 16)->DenseRange(102400, 204800, 4096);
+BENCHMARK(bm_ranges_move_vb_unaligned)->Range(8, 1 << 20);
+
+// Test std::move for vector<bool>::iterator
+BENCHMARK(bm_move_vb_aligned)->Range(8, 1 << 20);
+BENCHMARK(bm_move_vb_unaligned)->Range(8, 1 << 20);
+
+BENCHMARK_MAIN();
diff --git a/libcxx/test/benchmarks/algorithms/move_backward.bench.cpp b/libcxx/test/benchmarks/algorithms/move_backward.bench.cpp
new file mode 100644
index 00000000000000..48b1a776bf4dd9
--- /dev/null
+++ b/libcxx/test/benchmarks/algorithms/move_backward.bench.cpp
@@ -0,0 +1,55 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
+
+#include <algorithm>
+#include <benchmark/benchmark.h>
+#include <vector>
+
+static void bm_ranges_move_backward_vb(benchmark::State& state, bool aligned) {
+  auto n = state.range();
+  std::vector<bool> in(n, true);
+  std::vector<bool> out(aligned ? n : n + 8);
+  benchmark::DoNotOptimize(&in);
+  auto dst = aligned ? out.end() : out.end() - 4;
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(std::ranges::move_backward(in, dst));
+    benchmark::DoNotOptimize(&out);
+  }
+}
+
+static void bm_move_backward(benchmark::State& state, bool aligned) {
+  auto n = state.range();
+  std::vector<bool> in(n, true);
+  std::vector<bool> out(aligned ? n : n + 8);
+  benchmark::DoNotOptimize(&in);
+  auto beg = in.begin();
+  auto end = in.end();
+  auto dst = aligned ? out.end() : out.end() - 4;
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(std::move_backward(beg, end, dst));
+    benchmark::DoNotOptimize(&out);
+  }
+}
+
+static void bm_ranges_move_backward_vb_aligned(benchmark::State& state) { bm_ranges_move_backward_vb(state, true); }
+static void bm_ranges_move_backward_vb_unaligned(benchmark::State& state) { bm_ranges_move_backward_vb(state, false); }
+
+static void bm_move_backward_vb_aligned(benchmark::State& state) { bm_move_backward(state, true); }
+static void bm_move_backward_vb_unaligned(benchmark::State& state) { bm_move_backward(state, false); }
+
+// Test std::ranges::move_backward for vector<bool>::iterator
+BENCHMARK(bm_ranges_move_backward_vb_aligned)->Range(8, 1 << 16)->DenseRange(102400, 204800, 4096);
+BENCHMARK(bm_ranges_move_backward_vb_unaligned)->Range(8, 1 << 20);
+
+// Test std::move_backward for vector<bool>::iterator
+BENCHMARK(bm_move_backward_vb_aligned)->Range(8, 1 << 20);
+BENCHMARK(bm_move_backward_vb_unaligned)->Range(8, 1 << 20);
+
+BENCHMARK_MAIN();
diff --git a/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move.pass.cpp b/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move.pass.cpp
index b1ad6873bc5e5a..8b414b061105f2 100644
--- a/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move.pass.cpp
@@ -20,6 +20,7 @@
 #include <cassert>
 #include <iterator>
 #include <memory>
+#include <vector>
 
 #include "MoveOnly.h"
 #include "test_iterators.h"
@@ -45,15 +46,15 @@ struct Test {
   template <class OutIter>
   TEST_CONSTEXPR_CXX20 void operator()() {
     const unsigned N = 1000;
-    int ia[N] = {};
+    int ia[N]        = {};
     for (unsigned i = 0; i < N; ++i)
-        ia[i] = i;
+      ia[i] = i;
     int ib[N] = {0};
 
-    OutIter r = std::move(InIter(ia), InIter(ia+N), OutIter(ib));
-    assert(base(r) == ib+N);
+    OutIter r = std::move(InIter(ia), InIter(ia + N), OutIter(ib));
+    assert(base(r) == ib + N);
     for (unsigned i = 0; i < N; ++i)
-        assert(ia[i] == ib[i]);
+      assert(ia[i] == ib[i]);
   }
 };
 
@@ -73,13 +74,13 @@ struct Test1 {
     const unsigned N = 100;
     std::unique_ptr<int> ia[N];
     for (unsigned i = 0; i < N; ++i)
-        ia[i].reset(new int(i));
+      ia[i].reset(new int(i));
     std::unique_ptr<int> ib[N];
 
-    OutIter r = std::move(InIter(ia), InIter(ia+N), OutIter(ib));
-    assert(base(r) == ib+N);
+    OutIter r = std::move(InIter(ia), InIter(ia + N), OutIter(ib));
+    assert(base(r) == ib + N);
     for (unsigned i = 0; i < N; ++i)
-        assert(*ib[i] == static_cast<int>(i));
+      assert(*ib[i] == static_cast<int>(i));
   }
 };
 
@@ -92,6 +93,26 @@ struct Test1OutIters {
   }
 };
 
+TEST_CONSTEXPR_CXX20 bool test_vector_bool(std::size_t N) {
+  std::vector<bool> in(N, false);
+  for (std::size_t i = 0; i < N; i += 2)
+    in[i] = true;
+
+  { // Test move with aligned bytes
+    std::vector<bool> out(N);
+    std::move(in.begin(), in.end(), out.begin());
+    assert(in == out);
+  }
+  { // Test move with unaligned bytes
+    std::vector<bool> out(N + 8);
+    std::move(in.begin(), in.end(), out.begin() + 4);
+    for (std::size_t i = 0; i < N; ++i)
+      assert(out[i + 4] == in[i]);
+  }
+
+  return true;
+}
+
 TEST_CONSTEXPR_CXX20 bool test() {
   types::for_each(types::cpp17_input_iterator_list<int*>(), TestOutIters());
   if (TEST_STD_AT_LEAST_23_OR_RUNTIME_EVALUATED)
@@ -118,7 +139,7 @@ TEST_CONSTEXPR_CXX20 bool test() {
     // When non-trivial
     {
       MoveOnly from[3] = {1, 2, 3};
-      MoveOnly to[3] = {};
+      MoveOnly to[3]   = {};
       std::move(std::begin(from), std::end(from), std::begin(to));
       assert(to[0] == MoveOnly(1));
       assert(to[1] == MoveOnly(2));
@@ -127,7 +148,7 @@ TEST_CONSTEXPR_CXX20 bool test() {
     // When trivial
     {
       TrivialMoveOnly from[3] = {1, 2, 3};
-      TrivialMoveOnly to[3] = {};
+      TrivialMoveOnly to[3]   = {};
       std::move(std::begin(from), std::end(from), std::begin(to));
       assert(to[0] == TrivialMoveOnly(1));
       assert(to[1] == TrivialMoveOnly(2));
@@ -135,6 +156,16 @@ TEST_CONSTEXPR_CXX20 bool test() {
     }
   }
 
+  { // Test vector<bool>::iterator optimization
+    assert(test_vector_bool(8));
+    assert(test_vector_bool(19));
+    assert(test_vector_bool(32));
+    assert(test_vector_bool(49));
+    assert(test_vector_bool(64));
+    assert(test_vector_bool(199));
+    assert(test_vector_bool(256));
+  }
+
   return true;
 }
 
diff --git a/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move_backward.pass.cpp b/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move_backward.pass.cpp
index 61dea47b510716..dfee9de2fa7687 100644
--- a/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move_backward.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/move_backward.pass.cpp
@@ -19,6 +19,7 @@
 #include <cassert>
 #include <iterator>
 #include <memory>
+#include <vector>
 
 #include "MoveOnly.h"
 #include "test_iterators.h"
@@ -44,24 +45,22 @@ struct Test {
   template <class OutIter>
   TEST_CONSTEXPR_CXX20 void operator()() {
     const unsigned N = 1000;
-    int ia[N] = {};
+    int ia[N]        = {};
     for (unsigned i = 0; i < N; ++i)
-        ia[i] = i;
+      ia[i] = i;
     int ib[N] = {0};
 
-    OutIter r = std::move_backward(InIter(ia), InIter(ia+N), OutIter(ib+N));
+    OutIter r = std::move_backward(InIter(ia), InIter(ia + N), OutIter(ib + N));
     assert(base(r) == ib);
     for (unsigned i = 0; i < N; ++i)
-        assert(ia[i] == ib[i]);
+      assert(ia[i] == ib[i]);
   }
 };
 
 struct TestOutIters {
   template <class InIter>
   TEST_CONSTEXPR_CXX20 void operator()() {
-    types::for_each(
-        types::concatenate_t<types::bidirectional_iterator_list<int*> >(),
-        Test<InIter>());
+    types::for_each(types::concatenate_t<types::bidirectional_iterator_list<int*> >(), Test<InIter>());
   }
 };
 
@@ -72,24 +71,44 @@ struct Test1 {
     const unsigned N = 100;
     std::unique_ptr<int> ia[N];
     for (unsigned i = 0; i < N; ++i)
-        ia[i].reset(new int(i));
+      ia[i].reset(new int(i));
     std::unique_ptr<int> ib[N];
 
-    OutIter r = std::move_backward(InIter(ia), InIter(ia+N), OutIter(ib+N));
+    OutIter r = std::move_backward(InIter(ia), InIter(ia + N), OutIter(ib + N));
     assert(base(r) == ib);
     for (unsigned i = 0; i < N; ++i)
-        assert(*ib[i] == static_cast<int>(i));
+      assert(*ib[i] == static_cast<int>(i));
   }
 };
 
 struct Test1OutIters {
   template <class InIter>
   TEST_CONSTEXPR_CXX23 void operator()() {
-    types::for_each(types::concatenate_t<types::bidirectional_iterator_list<std::unique_ptr<int>*> >(),
-                    Test1<InIter>());
+    types::for_each(
+        types::concatenate_t<types::bidirectional_iterator_list<std::unique_ptr<int>*> >(), Test1<InIter>());
   }
 };
 
+TEST_CONSTEXPR_CXX20 bool test_vector_bool(std::size_t N) {
+  std::vector<bool> in(N, false);
+  for (std::size_t i = 0; i < N; i += 2)
+    in[i] = true;
+
+  { // Test move_backward with aligned bytes
+    std::vector<bool> out(N);
+    std::move_backward(in.begin(), in.end(), out.end());
+    assert(in == out);
+  }
+  { // Test move_backward with unaligned bytes
+    std::vector<bool> out(N + 8);
+    std::move_backward(in.begin(), in.end(), out.end() - 4);
+    for (std::size_t i = 0; i < N; ++i)
+      assert(out[i + 4] == in[i]);
+  }
+
+  return true;
+}
+
 TEST_CONSTEXPR_CXX20 bool test() {
   types::for_each(types::bidirectional_iterator_list<int*>(), TestOutIters());
   if (TEST_STD_AT_LEAST_23_OR_RUNTIME_EVALUATED)
@@ -117,7 +136,7 @@ TEST_CONSTEXPR_CXX20 bool test() {
     // When non-trivial
     {
       MoveOnly from[3] = {1, 2, 3};
-      MoveOnly to[3] = {};
+      MoveOnly to[3]   = {};
       std::move_backward(std::begin(from), std::end(from), std::end(to));
       assert(to[0] == MoveOnly(1));
       assert(to[1] == MoveOnly(2));
@@ -126,7 +145,7 @@ TEST_CONSTEXPR_CXX20 bool test() {
     // When trivial
     {
       TrivialMoveOnly from[3] = {1, 2, 3};
-      TrivialMoveOnly to[3] = {};
+      TrivialMoveOnly to[3]   = {};
       std::move_backward(std::begin(from), std::end(from), std::end(to));
       assert(to[0] == TrivialMoveOnly(1));
       assert(to[1] == TrivialMoveOnly(2));
@@ -134,11 +153,20 @@ TEST_CONSTEXPR_CXX20 bool test() {
     }
   }
 
+  { // Test vector<bool>::iterator optimization
+    assert(test_vector_bool(8));
+    assert(test_vector_bool(19));
+    assert(test_vector_bool(32));
+    assert(test_vector_bool(49));
+    assert(test_vector_bool(64));
+    assert(test_vector_bool(199));
+    assert(test_vector_bool(256));
+  }
+
   return true;
 }
 
-int main(int, char**)
-{
+int main(int, char**) {
   test();
 #if TEST_STD_VER >= 20
   static_assert(test());
diff --git a/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp b/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp
index a0d1473360a14e..664631aea826b1 100644
--- a/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp
@@ -31,6 +31,7 @@
 #include "almost_satisfies_types.h"
 #include "MoveOnly.h"
 #include "test_iterators.h"
+#include "test_macros.h"
 
 template <class In, class Out = In, class Sent = sentinel_wrapper<In>>
 concept HasMoveIt = requires(In in, Sent sent, Out out) { std::ranges::move(in, sent, out); };
@@ -65,7 +66,7 @@ constexpr void test(std::array<int, N> in) {
   {
     std::array<int, N> out;
     std::same_as<std::ranges::in_out_result<In, Out>> decltype(auto) ret =
-      std::ranges::move(In(in.data()), Sent(In(in.data() + in.size())), Out(out.data()));
+        std::ranges::move(In(in.data()), Sent(In(in.data() + in.size())), Out(out.data()));
     assert(in == out);
     assert(base(ret.in) == in.data() + in.size());
     assert(base(ret.out) == out.data() + out.size());
@@ -73,8 +74,7 @@ constexpr void test(std::array<int, N> in) {
   {
     std::array<int, N> out;
     auto range = std::ranges::subrange(In(in.data()), Sent(In(in.data() + in.size())));
-    std::same_as<std::ranges::in_out_result<In, Out>> decltype(auto) ret =
-        std::ranges::move(range, Out(out.data()));
+    std::same_as<std::ranges::in_out_result<In, Out>> decltype(auto) ret = std::ranges::move(range, Out(out.data()));
     assert(in == out);
     assert(base(ret.in) == in.data() + in.size());
     assert(base(ret.out) == out.data() + out.size());
@@ -84,16 +84,16 @@ constexpr void test(std::array<int, N> in) {
 template <class InContainer, class OutContainer, class In, class Out, class Sent = In>
 constexpr void test_containers() {
   {
-    InContainer in {1, 2, 3, 4};
+    InContainer in{1, 2, 3, 4};
     OutContainer out(4);
     std::same_as<std::ranges::in_out_result<In, Out>> auto ret =
-      std::ranges::move(In(in.begin()), Sent(In(in.end())), Out(out.begin()));
+        std::ranges::move(In(in.begin()), Sent(In(in.end())), Out(out.begin()));
     assert(std::ranges::equal(in, out));
     assert(base(ret.in) == in.end());
     assert(base(ret.out) == out.end());
   }
   {
-    InContainer in {1, 2, 3, 4};
+    InContainer in{1, 2, 3, 4};
     OutContainer out(4);
     auto range = std::ranges::subrange(In(in.begin()), Sent(In(in.end())));
     std::same_as<std::ranges::in_out_result<In, Out>> auto ret = std::ranges::move(range, Out(out.begin()));
@@ -165,22 +165,51 @@ constexpr void test_proxy_in_iterators() {
 }
 
 struct IteratorWithMoveIter {
-  using value_type = int;
-  using difference_type = int;
+  using value_type                = int;
+  using difference_type           = int;
   explicit IteratorWithMoveIter() = default;
   int* ptr;
   constexpr IteratorWithMoveIter(int* ptr_) : ptr(ptr_) {}
 
   constexpr int& operator*() const; // iterator with iter_move should not be dereferenced
 
-  constexpr Iterato...
[truncated]
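The `test_vector_bool` helpers in the diff above exercise two destination layouts for each algorithm. In the aligned layout the destination starts at the same bit position as the source. In the unaligned layout it is shifted by 4 bits (`out.begin() + 4` for `std::move`, `out.end() - 4` for `std::move_backward`). The unaligned case is the harder one for a word-based optimization: each full source word then straddles two destination words and has to be split with shifts and masks. The sketch below illustrates that shift-and-merge idea on a plain array of words. It assumes 64-bit storage words (the real word type of `vector<bool>` is an implementation detail). `get_bit`, `write_bits`, `shifted_copy`, and `check_shifted_copy` are hypothetical helpers written for illustration. This is not the libc++ implementation from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 64-bit storage words; bit i lives in word i / 64 at position i % 64.
constexpr unsigned kWordBits = 64;

// Hypothetical helper: read bit `i` from a packed word array.
bool get_bit(const std::vector<std::uint64_t>& words, std::size_t i) {
  return ((words[i / kWordBits] >> (i % kWordBits)) & 1u) != 0;
}

// Hypothetical helper: write the low `n` bits of `v` (1 <= n <= 64) into `dst`
// starting at bit position `pos`, leaving every other destination bit unchanged.
void write_bits(std::vector<std::uint64_t>& dst, std::size_t pos, std::uint64_t v, unsigned n) {
  assert(n >= 1 && n <= kWordBits);
  const std::size_t w = pos / kWordBits;
  const unsigned s = static_cast<unsigned>(pos % kWordBits);
  const std::uint64_t mask = (n == kWordBits) ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
  v &= mask;
  // Low part: lands in word w starting at bit s (bits shifted past bit 63 are dropped here).
  dst[w] = (dst[w] & ~(mask << s)) | (v << s);
  // High part: if the chunk straddles a word boundary, the remaining bits go to word w + 1.
  if (s != 0 && s + n > kWordBits) {
    const unsigned spill = s + n - kWordBits;  // number of bits that land in word w + 1
    const std::uint64_t spill_mask = (std::uint64_t{1} << spill) - 1;
    dst[w + 1] = (dst[w + 1] & ~spill_mask) | (v >> (kWordBits - s));
  }
}

// Hypothetical helper: copy `nbits` bits from `src` (starting at bit 0) into `dst`
// starting at bit `offset`, one source word at a time. When offset % 64 == 0, each
// source word maps onto exactly one destination word (the aligned case).
void shifted_copy(const std::vector<std::uint64_t>& src, std::size_t nbits,
                  std::vector<std::uint64_t>& dst, std::size_t offset) {
  for (std::size_t k = 0; k * kWordBits < nbits; ++k) {
    const std::size_t remaining = nbits - k * kWordBits;
    const unsigned n = static_cast<unsigned>(remaining < kWordBits ? remaining : kWordBits);
    write_bits(dst, offset + k * kWordBits, src[k], n);
  }
}

// Mirrors the shape of the tests above: an alternating pattern (even bits set)
// is copied into a destination pre-filled with ones, then every destination bit
// inside and outside the target range is checked.
bool check_shifted_copy(std::size_t N, std::size_t offset) {
  std::vector<std::uint64_t> src((N + kWordBits - 1) / kWordBits, 0);
  for (std::size_t i = 0; i < N; i += 2)
    src[i / kWordBits] |= std::uint64_t{1} << (i % kWordBits);

  std::vector<std::uint64_t> dst((N + offset + kWordBits - 1) / kWordBits, ~std::uint64_t{0});
  shifted_copy(src, N, dst, offset);

  for (std::size_t i = 0; i < dst.size() * kWordBits; ++i) {
    const bool in_range = i >= offset && i < offset + N;
    const bool expected = in_range ? ((i - offset) % 2 == 0) : true;
    if (get_bit(dst, i) != expected)
      return false;
  }
  return true;
}
```

Under the 64-bit assumption, the test sizes 8, 19, 32, and 49 fit in a single partially filled word, and 64 and 256 are exact multiples of the word size. 199 is three full words plus a 7-bit tail, so a check such as `check_shifted_copy(199, 4)` covers both the straddling full words and the masked final chunk. Pre-filling the destination with ones makes any bit clobbered outside the target range show up as a failure.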

@ldionne (Member) left a comment:

LGTM with minor comments. If the benchmark comment is too difficult to address, it might not be worth it; I can live with the code as-is. Thanks for the optimization!

@ldionne added the pending-ci label (merging the PR is only pending completion of CI) on Feb 5, 2025
@winner245 force-pushed the optimize-ranges-move branch from 926cc41 to 77235c2 on February 6, 2025, 20:01
@winner245 force-pushed the optimize-ranges-move branch from 77235c2 to f472c55 on February 11, 2025, 21:58
@ldionne merged commit ab3d793 into llvm:main on Feb 19, 2025
77 checks passed
@winner245 deleted the optimize-ranges-move branch on February 20, 2025, 17:30
Labels: libc++, pending-ci, performance
3 participants