unicode whitespace not recognized by lstrip() #27211
Comments
What do other languages do here? Note that the other chars ('\t','\n','\v','\f','\r') are control characters.
Ruby and Crystal leave it, Python and Rust strip it.
Stripping Unicode whitespace by default seems reasonable to me.
Related idea: Define
    lstrip(f, s::AbstractString) = # ...
    lstrip(s::AbstractString) = lstrip(c->isspace(c) || c in _default_delims, s)
It seems like this would allow us to more concisely express what should be skipped, rather than adding all Unicode space characters to
This is essentially equivalent to overriding
Yeah, but you don't need to enumerate all Unicode whitespace chars in
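To make the predicate-based proposal concrete, here is a minimal sketch rather than the actual Base implementation. `lstrip_by` and `ascii_delims` are hypothetical names used only for illustration; the delimiter list mirrors the characters quoted earlier in this thread. The sketch relies on `isspace` returning true for Unicode category Zs characters such as U+2009.

```julia
# Hypothetical helper (not part of Base): drop leading characters for which
# `pred` returns true and return the remainder as a SubString.
function lstrip_by(pred, s::AbstractString)
    for i in eachindex(s)
        pred(s[i]) || return SubString(s, i)
    end
    return SubString(s, 1, 0)  # every character matched, so nothing is left
end

# ASCII-only delimiter set, assumed here to stand in for the existing default.
ascii_delims = (' ', '\t', '\n', '\v', '\f', '\r')

s = "\u2009 hello"  # U+2009 THIN SPACE followed by an ASCII space

# Membership test only: the thin space is not in the list, so nothing is stripped.
ascii_only = lstrip_by(c -> c in ascii_delims, s)

# Predicate form from the proposal: `isspace` covers the Unicode separators,
# so no explicit enumeration is needed.
unicode_aware = lstrip_by(c -> isspace(c) || c in ascii_delims, s)

println(ascii_only == s)   # true
println(unicode_aware)     # hello
```

The point of the predicate form is that the set of skipped characters is described by a function rather than by a list that has to be kept in sync with Unicode.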
The existing
In summary, our choices are:
Given that it is a fairly minor function, my vague preference is option 4.
A big 👎 to 4, 👍 to 7.
Consider:
Note that `\U02009` is a Unicode Character 'THIN SPACE' character.
I don't know much about string processing or Unicode, but it seems to me that `lstrip()` is doing something that's perhaps too naive:
I realize you can override `chars`, but I would suggest that we expand the default character set for string trimming to include Unicode separators (ref https://www.fileformat.info/info/unicode/category/Zs/list.htm).