Commit 752d365

Revert #1423: Document C string literal tokens.
This reverts commit 21a27e1, reversing changes made to 01a12f2. This is being reverted in rust-lang/rust#119528
1 parent f9f5b5b commit 752d365
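
For context, the documentation removed by this revert states that a C string is implicitly terminated by byte `0x00`, that `0x00` may not appear anywhere else in it, and that characters above `U+007F` are stored as their UTF-8 bytes. The sketch below illustrates those three rules with `CStr::from_bytes_with_nul` rather than the `c"..."` literal syntax, on the assumption that the literal syntax may not be accepted by every toolchain. `c_string_content` is a hypothetical helper introduced only for this illustration.

```rust
use std::ffi::CStr;

/// Hypothetical helper (not part of the Reference or of std): returns the
/// content bytes of a C string, without the terminator, if `bytes_with_nul`
/// ends in exactly one 0x00 byte and contains no interior 0x00.
fn c_string_content(bytes_with_nul: &[u8]) -> Option<&[u8]> {
    CStr::from_bytes_with_nul(bytes_with_nul)
        .ok()
        .map(|c| c.to_bytes())
}

fn main() {
    // The empty C string consists of the implicit terminator only.
    assert_eq!(c_string_content(b"\x00"), Some(&b""[..]));

    // An interior 0x00 byte is rejected.
    assert_eq!(c_string_content(b"a\x00b\x00"), None);

    // U+00E6 is stored as its UTF-8 encoding, the bytes 0xC3 0xA6.
    assert_eq!(c_string_content(b"\xC3\xA6\x00"), Some("\u{00E6}".as_bytes()));
}
```

This only mirrors the byte-level rules quoted in the removed text; it says nothing about how the lexer tokenizes the literal forms.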

3 files changed: +3 -128 lines changed
src/expressions/literal-expr.md

-10
@@ -8,8 +8,6 @@
 >    | [BYTE_LITERAL]\
 >    | [BYTE_STRING_LITERAL]\
 >    | [RAW_BYTE_STRING_LITERAL]\
->    | [C_STRING_LITERAL]\
->    | [RAW_C_STRING_LITERAL]\
 >    | [INTEGER_LITERAL]\
 >    | [FLOAT_LITERAL]\
 >    | `true` | `false`
@@ -50,12 +48,6 @@ A string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_B

 > **Note**: This section is incomplete.

-## C string literal expressions
-
-A C string literal expression consists of a single [C_STRING_LITERAL] or [RAW_C_STRING_LITERAL] token.
-
-> **Note**: This section is incomplete.
-
 ## Integer literal expressions

 An integer literal expression consists of a single [INTEGER_LITERAL] token.
@@ -190,7 +182,5 @@ The expression's type is the primitive [boolean type], and its value is:
 [BYTE_LITERAL]: ../tokens.md#byte-literals
 [BYTE_STRING_LITERAL]: ../tokens.md#byte-string-literals
 [RAW_BYTE_STRING_LITERAL]: ../tokens.md#raw-byte-string-literals
-[C_STRING_LITERAL]: ../tokens.md#c-string-literals
-[RAW_C_STRING_LITERAL]: ../tokens.md#raw-c-string-literals
 [INTEGER_LITERAL]: ../tokens.md#integer-literals
 [FLOAT_LITERAL]: ../tokens.md#floating-point-literals

src/patterns.md

-12
@@ -123,8 +123,6 @@ if let (a, 3) = (1, 2) { // "(a, 3)" is refutable, and will not match
 >    | [RAW_STRING_LITERAL]\
 >    | [BYTE_STRING_LITERAL]\
 >    | [RAW_BYTE_STRING_LITERAL]\
->    | [C_STRING_LITERAL]\
->    | [RAW_C_STRING_LITERAL]\
 > &nbsp;&nbsp; | `-`<sup>?</sup> [INTEGER_LITERAL]\
 > &nbsp;&nbsp; | `-`<sup>?</sup> [FLOAT_LITERAL]

@@ -134,8 +132,6 @@ if let (a, 3) = (1, 2) { // "(a, 3)" is refutable, and will not match
 [RAW_STRING_LITERAL]: tokens.md#raw-string-literals
 [BYTE_STRING_LITERAL]: tokens.md#byte-string-literals
 [RAW_BYTE_STRING_LITERAL]: tokens.md#raw-byte-string-literals
-[C_STRING_LITERAL]: tokens.md#c-string-literals
-[RAW_C_STRING_LITERAL]: tokens.md#raw-c-string-literals
 [INTEGER_LITERAL]: tokens.md#integer-literals
 [FLOAT_LITERAL]: tokens.md#floating-point-literals

@@ -148,14 +144,6 @@ Floating-point literals are currently accepted, but due to the complexity of com

 </div>

-<div class="warning">
-
-C string and raw C string literals are accepted in literal patterns, but `&CStr`
-doesn't implement structural equality (`#[derive(Eq, PartialEq)]`) and therefore
-any such `match` on a `&CStr` will be rejected with a type error.
-
-</div>
-
 Literal patterns are always refutable.

 Examples:

src/tokens.md

+3 -106
@@ -32,8 +32,6 @@ Literals are tokens used in [literal expressions].
 | [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
 | [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
 | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` |
-| [C string](#c-string-literals) | `c"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
-| [Raw C string](#raw-c-string-literals) | `cr#"hello"#` | <256 | All Unicode | `N/A` |

 \* The number of `#`s on each side of the same literal must be equivalent.

@@ -330,107 +328,6 @@ b"\x52"; b"R"; br"R"; // R
 b"\\x52"; br"\x52"; // \x52
 ```

-### C string and raw C string literals
-
-#### C string literals
-
-> **<sup>Lexer</sup>**\
-> C_STRING_LITERAL :\
-> &nbsp;&nbsp; `c"` (\
-> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
-> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
-> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
-> &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
-> &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
-
-A _C string literal_ is a sequence of Unicode characters and _escapes_,
-preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
-followed by the character `U+0022`. If the character `U+0022` is present within
-the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
-Alternatively, a C string literal can be a _raw C string literal_, defined
-below. The type of a C string literal is [`&core::ffi::CStr`][CStr].
-
-[CStr]: ../core/ffi/struct.CStr.html
-
-C strings are implicitly terminated by byte `0x00`, so the C string literal
-`c""` is equivalent to manually constructing a `&CStr` from the byte string
-literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
-permitted within a C string.
-
-Some additional _escapes_ are available in non-raw C string literals. An escape
-starts with a `U+005C` (`\`) and continues with one of the following forms:
-
-* A _byte escape_ escape starts with `U+0078` (`x`) and is followed by exactly
-two _hex digits_. It denotes the byte equal to the provided hex value.
-* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
-by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
-(`}`). It denotes the Unicode code point equal to the provided hex value,
-encoded as UTF-8.
-* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
-(`r`), or `U+0074` (`t`), denoting the bytes values `0x0A` (ASCII LF),
-`0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
-* The _backslash escape_ is the character `U+005C` (`\`) which must be
-escaped in order to denote its ASCII encoding `0x5C`.
-
-The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
-but will be rejected as invalid, as C strings may not contain byte `0x00` except
-as the implicit terminator.
-
-A C string represents bytes with no defined encoding, but a C string literal
-may contain Unicode characters above `U+007F`. Such characters will be replaced
-with the bytes of that character's UTF-8 representation.
-
-The following C string literals are equivalent:
-
-```rust
-c"æ"; // LATIN SMALL LETTER AE (U+00E6)
-c"\u{00E6}";
-c"\xC3\xA6";
-```
-
-> **Edition Differences**: C string literals are accepted in the 2021 edition or
-> later. In earlier additions the token `c""` is lexed as `c ""`.
-
-#### Raw C string literals
-
-> **<sup>Lexer</sup>**\
-> RAW_C_STRING_LITERAL :\
-> &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
->
-> RAW_C_STRING_CONTENT :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
-> &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`
-
-Raw C string literals do not process any escapes. They start with the
-character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
-of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
-
-All characters contained in the raw C string body represent themselves in UTF-8
-encoding. The characters `U+0022` (double-quote) (except when followed by at
-least as many `U+0023` (`#`) characters as were used to start the raw C string
-literal) or `U+005C` (`\`) do not have any special meaning.
-
-> **Edition Differences**: Raw C string literals are accepted in the 2021
-> edition or later. In earlier additions the token `cr""` is lexed as `cr ""`,
-> and `cr#""#` is lexed as `cr #""#` (which is non-grammatical).
-
-#### Examples for C string and raw C string literals
-
-```rust
-c"foo"; cr"foo"; // foo
-c"\"foo\""; cr#""foo""#; // "foo"
-
-c"foo #\"# bar";
-cr##"foo #"# bar"##; // foo #"# bar
-
-c"\x52"; c"R"; cr"R"; // R
-c"\\x52"; cr"\x52"; // \x52
-```
-
 ### Number literals

 A _number literal_ is either an _integer literal_ or a _floating-point
@@ -731,17 +628,17 @@ them are referred to as "token trees" in [macros]. The three types of brackets
 ## Reserved prefixes

 > **<sup>Lexer 2021+</sup>**\
-> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `"`\
+> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `r` or `br`_</sub> | `_` ) `"`\
 > RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
-> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br` or `cr`_</sub> | `_` ) `#`
+> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br`_</sub> | `_` ) `#`

 Some lexical forms known as _reserved prefixes_ are reserved for future use.

 Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.

 Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.

-Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes.
+Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.

 > **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
 >
