Commit 2e931b5

style-guide: Rework version-sorting algorithm
Treat numeric chunks with equal value but differing numbers of leading zeroes as equal, unless we get to the end of the entire string in which case we use "more leading zeroes in the earliest differing chunk" as a tiebreaker. Treat `_` as a word separator, sorting it before anything other than space. Give more examples.
1 parent f06df22 commit 2e931b5

1 file changed, +69 -22 lines changed

src/doc/style-guide/src/README.md

```diff
@@ -109,32 +109,79 @@ lexicographical.)
 
 For the purposes of the Rust style, to compare two strings for version-sorting:
 
-- Compare the strings by (Unicode) character lexicographically, finding the
-  index of the first differing character. (If the two strings do not have the
-  same length, this may be the end of the shorter string.)
-- For both strings, determine the longest sequence of ASCII digits that either
-  contains or ends at that index. (If either string doesn't have such a
-  sequence of ASCII digits, fall back to comparing the strings
-  lexicographically.)
-- Compare the numeric values of the number specified by the sequence of digits.
-  (Note that an implementation of this algorithm can easily check this without
-  accumulating copies of the digits or converting to a number: after skipping
-  leading zeroes, longer sequences of digits are larger numbers, and
-  equal-length sequences of digits can be sorted lexicographically.)
-- If the numbers have the same numeric value, the one with more leading zeroes
-  comes first.
-
-Note that there exist various algorithms called "version sorting", which differ
-most commonly in their handling of numbers with leading zeroes. This algorithm
+- Process both strings from beginning to end as two sequences of maximal-length
+  chunks, where each chunk consists either of a sequence of characters other
+  than ASCII digits, or a sequence of ASCII digits (a numeric chunk), and
+  compare corresponding chunks from the strings.
+- To compare two numeric chunks, compare them by numeric value, ignoring
+  leading zeroes. If the two chunks have equal numeric value, but different
+  numbers of leading digits, and this is the first time this has happened for
+  these strings, treat the chunks as equal (moving on to the next chunk) but
+  remember which string had more leading zeroes.
+- To compare two chunks if both are not numeric, compare them by Unicode
+  character lexicographically, except that `_` (underscore) sorts immediately
+  after ` ` (space) but before any other character. (This treats underscore as
+  a word separator, as commonly used in identifiers.)
+  - If the use of version sorting specifies further modifiers, such as sorting
+    non-lowercase before lowercase, apply those modifiers to the lexicographic
+    sort in this step.
+- If the comparison reaches the end of the string and considers each pair of
+  chunks equal:
+  - If one of the numeric comparisons noted the earliest point at which one
+    string had more leading zeroes than the other, sort the string with more
+    leading zeroes first.
+  - Otherwise, the strings are equal.
+
+Note that there exist various algorithms called "version sorting", which
+generally try to solve the same problem, but which differ in various ways (such
+as in their handling of numbers with leading zeroes). This algorithm
 does not purport to precisely match the behavior of any particular other
 algorithm, only to produce a simple and satisfying result for Rust formatting.
-(In particular, this algorithm aims to produce a satisfying result for a set of
+In particular, this algorithm aims to produce a satisfying result for a set of
 symbols that have the same number of leading zeroes, and an acceptable and
 easily understandable result for a set of symbols that has varying numbers of
-leading zeroes.)
-
-As an example, version-sorting will sort the following symbols in the order
-given: `x000`, `x00`, `x0`, `x01`, `x1`, `x09`, `x9`, `x010`, `x10`.
+leading zeroes.
+
+As an example, version-sorting will sort the following strings in the order
+given:
+- `_ZYWX`
+- `u_zzz`
+- `u8`
+- `u16`
+- `u32`
+- `u64`
+- `u128`
+- `u256`
+- `ua`
+- `usize`
+- `uz`
+- `v000`
+- `v00`
+- `v0`
+- `v0s`
+- `v00t`
+- `v0u`
+- `v001`
+- `v01`
+- `v1`
+- `v009`
+- `v09`
+- `v9`
+- `v010`
+- `v10`
+- `w005s09t`
+- `w5s009t`
+- `x64`
+- `x86`
+- `x86_32`
+- `x86_64`
+- `x86_128`
+- `x87`
+- `Z_YWX`
+- `ZY_WX`
+- `ZYW_X`
+- `ZYWX`
+- `ZYWX_`
 
 ### [Module-level items](items.md)
 
```
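
The comparison rules added by this commit can be sketched in Rust roughly as follows. This is a minimal illustration, not the implementation used by any Rust tool, and the names `version_cmp`, `char_key`, and `take_digits` are hypothetical. It rests on two assumptions. First, it walks the strings character by character and switches to numeric comparison only when both sides are at a digit, so a non-digit chunk that ends is compared against whatever character follows on the other side; this appears to be what the example order (`u_zzz` before `u8`) requires. Second, it applies no extra modifiers such as sorting non-lowercase before lowercase.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper: sort key for a single character. `_` is mapped to
/// sort immediately after space; every other character keeps its code point.
fn char_key(c: char) -> (char, u8) {
    if c == '_' { (' ', 1) } else { (c, 0) }
}

/// Hypothetical helper: consume and return one maximal run of ASCII digits.
fn take_digits(it: &mut Peekable<Chars<'_>>) -> String {
    let mut run = String::new();
    while let Some(c) = it.next_if(|c| c.is_ascii_digit()) {
        run.push(c);
    }
    run
}

/// Sketch of the version-sorting comparison described in the diff above.
pub fn version_cmp(a: &str, b: &str) -> Ordering {
    let mut xs = a.chars().peekable();
    let mut ys = b.chars().peekable();
    // Result of the earliest numeric chunk pair that had equal value but a
    // different number of leading zeroes; used only if all else is equal.
    let mut zero_tiebreak = Ordering::Equal;
    loop {
        match (xs.peek().copied(), ys.peek().copied()) {
            (None, None) => return zero_tiebreak,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let (da, db) = (take_digits(&mut xs), take_digits(&mut ys));
                // After skipping leading zeroes, a longer run is a larger
                // number, and equal-length runs compare lexicographically.
                let (ta, tb) = (da.trim_start_matches('0'), db.trim_start_matches('0'));
                let by_value = ta.len().cmp(&tb.len()).then_with(|| ta.cmp(tb));
                if by_value != Ordering::Equal {
                    return by_value;
                }
                if zero_tiebreak == Ordering::Equal {
                    // Equal value: the longer run has more leading zeroes
                    // and should sort first.
                    zero_tiebreak = db.len().cmp(&da.len());
                }
            }
            (Some(x), Some(y)) => {
                let ord = char_key(x).cmp(&char_key(y));
                if ord != Ordering::Equal {
                    return ord;
                }
                xs.next();
                ys.next();
            }
        }
    }
}
```

Passing this function to `sort_by`, as in `names.sort_by(|a, b| version_cmp(a, b))`, puts `_ZYWX` first and keeps the entries from `u_zzz` through `x87` in the order listed in the diff. Because uppercase ASCII letters have lower code points than lowercase ones, this sketch places the `Z…` entries directly after `_ZYWX` rather than at the end of the list.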
