fix consistency with is_ascii_whitespace #82152
r? @estebank (rust-highfive has picked a reviewer for you, use r? to override)
Btw, there is also a slightly different definition of ASCII whitespace in rust/compiler/rustc_lexer/src/lib.rs (lines 235 to 262 in d1206f9): six characters, with 0x0b added (leaving aside the non-ASCII part of that definition).
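To make the "six versus five" point concrete, here is a minimal sketch that compares the Unicode `Pattern_White_Space` set with `char::is_ascii_whitespace` over the ASCII range. The helper names `is_pattern_white_space` and `ascii_difference` are made up for this example, and the code-point list is written out by hand from the Unicode property rather than copied from rustc_lexer, so treat it as an illustration only.

```rust
/// Hand-written copy of the Unicode `Pattern_White_Space` property.
/// Hypothetical helper for illustration; not copied from rustc_lexer.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // tab, line feed, vertical tab, form feed, carriage return
            | '\u{0020}' // space
            | '\u{0085}' // next line
            | '\u{200E}' | '\u{200F}' // left-to-right mark, right-to-left mark
            | '\u{2028}' | '\u{2029}' // line separator, paragraph separator
    )
}

/// ASCII characters that are in `Pattern_White_Space` but are rejected by
/// `char::is_ascii_whitespace`.
fn ascii_difference() -> Vec<char> {
    (0u8..=0x7F)
        .map(char::from)
        .filter(|c| is_pattern_white_space(*c) && !c.is_ascii_whitespace())
        .collect()
}
```

Calling `ascii_difference()` yields only the vertical tab (0x0b), which matches the extra character mentioned above.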
Is |
cc @matklad on this
"foo\
bar" I don't think we have a good motivation to extend this set with form feed ( |
I suggest waiting for matklad's opinion and then probably closing this PR.
I don’t think consistency with str methods is a good motivation here: the character classes used in the lexer are deliberately different from those used by str methods. The lexer guarantees stability across Unicode versions, while str does not.
What is inconsistent here is that we use different definitions of whitespace inside string tokens and between tokens. One is a hard-coded list of “reasonable” whitespace characters, and the other is the standard-defined Pattern_White_Space.
IIRC, the reason for that inconsistency is that, originally, both used simple whitespace, but then, during the implementation of Unicode identifiers, one was changed while the other was not, and Unicode whitespace wasn’t gated properly and leaked to stable.
In practice, this probably doesn’t really matter, so I’d try to avoid changes here.
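The hard-coded list used inside string tokens can be pictured with a small sketch of skipping whitespace after a line continuation such as `"foo\` followed by `bar"`. Both `is_hardcoded_whitespace` and `skip_whitespace` are hypothetical names. The four-byte list (space, tab, line feed, carriage return) is inferred from this thread, not taken from the compiler source, so the real code may be structured differently.

```rust
/// Hard-coded whitespace list in the style discussed above: space, tab,
/// line feed and carriage return, but not form feed (0x0C).
/// Hypothetical helper; the real lexer code may be structured differently.
fn is_hardcoded_whitespace(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Returns the part of `tail` that remains after skipping every leading byte
/// accepted by `is_ws`. The predicate is expected to accept ASCII bytes only,
/// so the resulting slice always starts on a character boundary.
fn skip_whitespace(tail: &str, is_ws: impl Fn(u8) -> bool) -> &str {
    let first_kept = tail
        .bytes()
        .position(|b| !is_ws(b))
        .unwrap_or(tail.len());
    &tail[first_kept..]
}
```

With the hard-coded list, `skip_whitespace("\n\x0C bar", is_hardcoded_whitespace)` stops at the form feed, whereas passing `|b| b.is_ascii_whitespace()` skips it as well. That one byte is the only behavioral difference between the two predicates.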
Previously this method checked only 4 of the 5 ASCII whitespace bytes (as defined in https://doc.rust-lang.org/std/primitive.char.html#method.is_ascii_whitespace).
Checked bytes:
before:
after:
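For reference, the five bytes that `is_ascii_whitespace` accepts can be enumerated directly. This is a standalone sketch that uses only the standard library; it is not the code changed by this PR, and `ascii_whitespace_bytes` is a name made up for the example.

```rust
/// Collects every byte for which `u8::is_ascii_whitespace` returns true.
fn ascii_whitespace_bytes() -> Vec<u8> {
    (0u8..=u8::MAX)
        .filter(|b| b.is_ascii_whitespace())
        .collect()
}
// ascii_whitespace_bytes() == [0x09, 0x0A, 0x0C, 0x0D, 0x20]
// i.e. tab, line feed, form feed, carriage return, space
```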