@@ -32,6 +32,8 @@ Literals are tokens used in [literal expressions].
| [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` |
+ | [C string](#c-string-literals) | `c"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
+ | [Raw C string](#raw-c-string-literals) | `cr#"hello"#` | <256 | All Unicode | `N/A` |

\* The number of `#`s on each side of the same literal must be equivalent.

@@ -328,6 +330,95 @@ b"\x52"; b"R"; br"R"; // R
b"\\x52"; br"\x52";                  // \x52
```

+ ### C string and raw C string literals
+
+ #### C string literals
+
+ > **<sup>Lexer</sup>**\
+ > C_STRING_LITERAL :\
+ > &nbsp;&nbsp; `c"` (\
+ > &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
+ > &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
+ > &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
+ > &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
+ > &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
+
+ A _C string literal_ is a sequence of Unicode characters and _escapes_,
+ preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
+ followed by the character `U+0022`. If the character `U+0022` is present within
+ the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
+ Alternatively, a C string literal can be a _raw C string literal_, defined
+ below. The type of a C string literal is [`&core::ffi::CStr`][CStr].
+
+ [CStr]: ../core/ffi/struct.CStr.html
+
+ C strings are implicitly terminated by byte `0x00`, so the C string literal
+ `c""` is equivalent to manually constructing a `&CStr` from the byte string
+ literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
+ permitted within a C string.
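
The terminator rule can be checked at run time with the standard `CStr::from_bytes_with_nul` constructor, which is the manual construction the paragraph above refers to. The sketch below deliberately avoids the `c"..."` literal itself so that it compiles on any edition; `cstr_from_terminated` is a hypothetical helper name introduced only for illustration.

```rust
use std::ffi::CStr;

/// Hypothetical helper: builds a `&CStr` from bytes that already end in a
/// single NUL terminator. Returns `None` if the terminator is missing or a
/// NUL byte appears before the end.
fn cstr_from_terminated(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}

fn main() {
    // The byte string the text pairs with `c""`: only the terminator.
    let empty = cstr_from_terminated(b"\x00").expect("a lone NUL is a valid C string");
    assert_eq!(empty.to_bytes(), b""); // contents exclude the terminator
    assert_eq!(empty.to_bytes_with_nul(), b"\0"); // the terminator is still stored

    // The bytes `c"hello"` is described as holding: five content bytes plus NUL.
    let hello = cstr_from_terminated(b"hello\0").unwrap();
    assert_eq!(hello.to_bytes().len(), 5);
    assert_eq!(hello.to_bytes_with_nul().len(), 6);

    // An interior NUL is rejected, matching the rule for C string literals.
    assert!(cstr_from_terminated(b"he\0llo\0").is_none());
}
```

The runtime constructor reports the interior NUL as an error value, whereas the literal form is described above as rejecting it when the source is compiled.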
+
+ Some additional _escapes_ are available in non-raw C string literals. An escape
+ starts with a `U+005C` (`\`) and continues with one of the following forms:
+
+ * A _byte escape_ starts with `U+0078` (`x`) and is followed by exactly
+   two _hex digits_. It denotes the byte equal to the provided hex value.
+ * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
+   by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
+   (`}`). It denotes the Unicode code point equal to the provided hex value,
+   encoded as UTF-8.
+ * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
+   (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
+   `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
+ * The _backslash escape_ is the character `U+005C` (`\`) which must be
+   escaped in order to denote its ASCII encoding `0x5C`.
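
The byte values that the escape forms above denote can be illustrated with byte string literals and `char::encode_utf8`, used here as stand-ins so the sketch compiles on any edition. `utf8_bytes` is a hypothetical helper, not part of the reference or the standard library.

```rust
/// Hypothetical helper: encodes one Unicode scalar value as UTF-8, which is
/// how the text says a `\u{...}` escape contributes bytes to a C string.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // Byte escape: `\x` plus exactly two hex digits denotes a single byte.
    assert_eq!(b"\x52", b"R");
    assert_eq!(b"\xFF".len(), 1);

    // Code point escape: U+00E9 encodes to two bytes, U+1F980 to four.
    assert_eq!(utf8_bytes('\u{E9}'), vec![0xC3u8, 0xA9]);
    assert_eq!(utf8_bytes('\u{1F980}'), vec![0xF0u8, 0x9F, 0xA6, 0x80]);

    // Whitespace escapes (LF, CR, HT) and the backslash escape.
    assert_eq!(b"\n\r\t\\", &[0x0Au8, 0x0D, 0x09, 0x5C]);
}
```

A byte escape always contributes one byte, while a code point escape contributes between one and four bytes depending on the code point.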
+
+ The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
+ but will be rejected as invalid, as C strings may not contain byte `0x00` except
+ as the implicit terminator.
+
+ > **Edition Differences**: C string literals are accepted in the 2021 edition or
+ > later. In earlier editions the token `c""` is lexed as `c ""`.
+
+ #### Raw C string literals
+
+ > **<sup>Lexer</sup>**\
+ > RAW_C_STRING_LITERAL :\
+ > &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
+ >
+ > RAW_C_STRING_CONTENT :\
+ > &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
+ > &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`
+
+ Raw C string literals do not process any escapes. They start with the
+ character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
+ of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
+ _raw C string body_ can contain any sequence of Unicode characters and is
+ terminated only by another `U+0022` (double-quote) character, followed by the
+ same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
+ (double-quote) character.
+
+ All characters contained in the raw C string body represent themselves in UTF-8
+ encoding. The characters `U+0022` (double-quote) (except when followed by at
+ least as many `U+0023` (`#`) characters as were used to start the raw C string
+ literal) or `U+005C` (`\`) do not have any special meaning.
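
The delimiter rule above can be sketched as a small scanner over the text of a single token. `raw_c_string_body` is a hypothetical helper written for this illustration only. It assumes the token carries no suffix, and it does not check for an isolated CR or a NUL byte in the body.

```rust
/// Hypothetical helper: extracts the body of a raw C string literal token
/// such as `cr#"..."#`. Returns `None` if `token` does not have that shape.
fn raw_c_string_body(token: &str) -> Option<&str> {
    let rest = token.strip_prefix("cr")?;
    // Count the opening `#` characters (the text requires fewer than 256).
    let hashes = rest.bytes().take_while(|&b| b == b'#').count();
    if hashes >= 256 {
        return None;
    }
    let rest = rest[hashes..].strip_prefix('"')?;
    // The body ends at the first `"` that is followed by `hashes` `#`s.
    let closer = format!("\"{}", "#".repeat(hashes));
    let end = rest.find(&closer)?;
    // In this sketch nothing may follow the closing delimiter.
    if end + closer.len() != rest.len() {
        return None;
    }
    Some(&rest[..end])
}

fn main() {
    assert_eq!(raw_c_string_body(r#"cr"foo""#), Some("foo"));
    // A `"` followed by only one `#` does not close a `##`-delimited literal.
    assert_eq!(
        raw_c_string_body(r###"cr##"foo #"# bar"##"###),
        Some(r##"foo #"# bar"##)
    );
    // Backslashes are ordinary characters: the body is the four characters \x52.
    assert_eq!(raw_c_string_body(r#"cr"\x52""#), Some(r"\x52"));
    // Mismatched `#` counts do not form a complete token.
    assert_eq!(raw_c_string_body(r##"cr#"foo""##), None);
}
```

Searching for the first matching closer mirrors the non-greedy repetition in the grammar.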
+
+ > **Edition Differences**: Raw C string literals are accepted in the 2021
+ > edition or later. In earlier editions the token `cr""` is lexed as `cr ""`,
+ > and `cr#""#` is lexed as `cr #""#` (which is non-grammatical).
+
+ #### Examples for C string and raw C string literals
+
+ ```rust
+ c"foo"; cr"foo";                     // foo
+ c"\"foo\""; cr#""foo""#;             // "foo"
+
+ c"foo #\"# bar";
+ cr##"foo #"# bar"##;                 // foo #"# bar
+
+ c"\x52"; c"R"; cr"R";                // R
+ c"\\x52"; cr"\x52";                  // \x52
+ ```
+
### Number literals

A _number literal_ is either an _integer literal_ or a _floating-point
@@ -628,17 +719,17 @@ them are referred to as "token trees" in [macros]. The three types of brackets
## Reserved prefixes

> **<sup>Lexer 2021+</sup>**\
- > RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `r` or `br`_</sub> | `_` ) `"`\
+ > RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `"`\
> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
- > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br`_</sub> | `_` ) `#`
+ > RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br` or `cr`_</sub> | `_` ) `#`

Some lexical forms known as _reserved prefixes_ are reserved for future use.

Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.

Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.

- Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.
+ Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes.
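
As a rough illustration, the three updated `RESERVED_TOKEN_*` rules can be written as a predicate over a word and the character that immediately follows it. `is_reserved_prefix` is a hypothetical function, not part of any compiler API. It assumes `word` is already known to be a non-raw identifier, a keyword, or `_`, and it does not model how the lexer recognizes whole literals.

```rust
/// Hypothetical predicate mirroring the updated grammar: `word` is an
/// identifier, keyword, or `_`, and `next` is the character right after it
/// with no intervening whitespace.
fn is_reserved_prefix(word: &str, next: char) -> bool {
    match next {
        // RESERVED_TOKEN_DOUBLE_QUOTE: every word except b, c, r, br, cr.
        '"' => !matches!(word, "b" | "c" | "r" | "br" | "cr"),
        // RESERVED_TOKEN_SINGLE_QUOTE: every word except b.
        '\'' => word != "b",
        // RESERVED_TOKEN_POUND: every word except r, br, cr.
        '#' => !matches!(word, "r" | "br" | "cr"),
        _ => false,
    }
}

fn main() {
    // The prefixes of C string and raw C string literals are not reserved.
    assert!(!is_reserved_prefix("c", '"'));
    assert!(!is_reserved_prefix("cr", '"'));
    assert!(!is_reserved_prefix("cr", '#'));
    // `c` directly followed by `#` is not excepted by the POUND rule.
    assert!(is_reserved_prefix("c", '#'));
    // Other words followed by a quote or `#` remain reserved.
    assert!(is_reserved_prefix("f", '"'));
    assert!(is_reserved_prefix("k", '#'));
    assert!(is_reserved_prefix("_", '\''));
    // Any other following character never forms a reserved prefix.
    assert!(!is_reserved_prefix("x", '+'));
}
```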

> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
>