@@ -32,8 +32,6 @@ Literals are tokens used in [literal expressions].
| [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` |
- | [C string](#c-string-literals) | `c"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
- | [Raw C string](#raw-c-string-literals) | `cr#"hello"#` | <256 | All Unicode | `N/A` |

\* The number of `#`s on each side of the same literal must be equivalent.

@@ -330,107 +328,6 @@ b"\x52"; b"R"; br"R"; // R
b"\\x52"; br"\x52"; // \x52
```

- ### C string and raw C string literals
-
- #### C string literals
-
- > **<sup>Lexer</sup>**\
- > C_STRING_LITERAL :\
- > &nbsp;&nbsp; `c"` (\
- > &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
- > &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
- > &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
- > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
- > &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
-
- A _C string literal_ is a sequence of Unicode characters and _escapes_,
- preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
- followed by the character `U+0022`. If the character `U+0022` is present within
- the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
- Alternatively, a C string literal can be a _raw C string literal_, defined
- below. The type of a C string literal is [`&core::ffi::CStr`][CStr].
-
- [CStr]: ../core/ffi/struct.CStr.html
-
- C strings are implicitly terminated by byte `0x00`, so the C string literal
- `c""` is equivalent to manually constructing a `&CStr` from the byte string
- literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
- permitted within a C string.
-
- Some additional _escapes_ are available in non-raw C string literals. An escape
- starts with a `U+005C` (`\`) and continues with one of the following forms:
-
- * A _byte escape_ escape starts with `U+0078` (`x`) and is followed by exactly
-   two _hex digits_. It denotes the byte equal to the provided hex value.
- * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
-   by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
-   (`}`). It denotes the Unicode code point equal to the provided hex value,
-   encoded as UTF-8.
- * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
-   (`r`), or `U+0074` (`t`), denoting the bytes values `0x0A` (ASCII LF),
-   `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
- * The _backslash escape_ is the character `U+005C` (`\`) which must be
-   escaped in order to denote its ASCII encoding `0x5C`.
-
- The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
- but will be rejected as invalid, as C strings may not contain byte `0x00` except
- as the implicit terminator.
-
- A C string represents bytes with no defined encoding, but a C string literal
- may contain Unicode characters above `U+007F`. Such characters will be replaced
- with the bytes of that character's UTF-8 representation.
-
- The following C string literals are equivalent:
-
- ```rust
- c"æ"; // LATIN SMALL LETTER AE (U+00E6)
- c"\u{00E6}";
- c"\xC3\xA6";
- ```
-
- > **Edition Differences**: C string literals are accepted in the 2021 edition or
- > later. In earlier additions the token `c""` is lexed as `c ""`.
-
- #### Raw C string literals
-
- > **<sup>Lexer</sup>**\
- > RAW_C_STRING_LITERAL :\
- > &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
- >
- > RAW_C_STRING_CONTENT :\
- > &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
- > &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`
-
- Raw C string literals do not process any escapes. They start with the
- character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
- of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
- _raw C string body_ can contain any sequence of Unicode characters and is
- terminated only by another `U+0022` (double-quote) character, followed by the
- same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
- (double-quote) character.
-
- All characters contained in the raw C string body represent themselves in UTF-8
- encoding. The characters `U+0022` (double-quote) (except when followed by at
- least as many `U+0023` (`#`) characters as were used to start the raw C string
- literal) or `U+005C` (`\`) do not have any special meaning.
-
- > **Edition Differences**: Raw C string literals are accepted in the 2021
- > edition or later. In earlier additions the token `cr""` is lexed as `cr ""`,
- > and `cr#""#` is lexed as `cr #""#` (which is non-grammatical).
-
- #### Examples for C string and raw C string literals
-
- ```rust
- c"foo"; cr"foo"; // foo
- c"\"foo\""; cr#""foo""#; // "foo"
-
- c"foo #\"# bar";
- cr##"foo #"# bar"##; // foo #"# bar
-
- c"\x52"; c"R"; cr"R"; // R
- c"\\x52"; cr"\x52"; // \x52
- ```
-
### Number literals

A _number literal_ is either an _integer literal_ or a _floating-point
@@ -731,17 +628,17 @@ them are referred to as "token trees" in [macros]. The three types of brackets
## Reserved prefixes

> **<sup>Lexer 2021+</sup>**\
- > RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `"`\
+ > RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `r` or `br`_</sub> | `_` ) `"`\
> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
- > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br` or `cr`_</sub> | `_` ) `#`
+ > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br`_</sub> | `_` ) `#`

Some lexical forms known as _reserved prefixes_ are reserved for future use.

Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.

Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.

- Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes.
+ Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.

> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
>
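For reference, the terminator rule stated in the removed C string text (an implicit trailing `0x00`, and no other `0x00` bytes) can be checked without the `c"…"` syntax itself. The sketch below is an illustration only and not part of this commit: it relies on the standard library's `CStr::from_bytes_with_nul`, and the helper names `cstr_payload` and `is_valid_cstr_bytes` are hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical helper: given a buffer that is supposed to end in a single
/// nul terminator, return the bytes before the terminator, or `None` if the
/// buffer has no terminator or contains an interior `0x00`.
pub fn cstr_payload(bytes_with_nul: &[u8]) -> Option<&[u8]> {
    CStr::from_bytes_with_nul(bytes_with_nul)
        .ok()
        .map(|s| s.to_bytes())
}

/// Hypothetical helper: true when the buffer is a well-formed C string,
/// that is, exactly one `0x00` and it is the last byte.
pub fn is_valid_cstr_bytes(bytes_with_nul: &[u8]) -> bool {
    cstr_payload(bytes_with_nul).is_some()
}
```

Under these assumptions, `cstr_payload(b"\x00")` yields an empty slice, which mirrors the statement that `c""` corresponds to a `&CStr` built from `b"\x00"`, and `cstr_payload(b"\xC3\xA6\x00")` yields the UTF-8 bytes of `æ`. A buffer such as `b"a\x00b\x00"` is rejected because of the interior `0x00`.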