Buffer.toString('utf8') appears to use wtf-8 #23280
Comments
Actually, straight up
This technically might not be a bug, just very surprising behavior that needs better documentation.
FWIW even the
I would suggest using TextDecoder with the fatal option. Example:
new TextDecoder('utf-8', { fatal: true }).decode(Buffer.from([237, 166, 164]))
// throws an exception
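A minimal sketch of that suggestion as a reusable helper, assuming CommonJS and a Node.js build with ICU (the default for official binaries). decodeUtf8Strict and tryDecodeUtf8 are hypothetical names, not Node.js APIs. TextDecoder is taken from util so the same code also runs on v10 (the version in the report), where it is not yet a global.

```javascript
// util.TextDecoder exists since Node.js 8.3; it became a global in v11.
const { TextDecoder } = require('util');

// Hypothetical helper (not a Node.js API): strict UTF-8 decoding.
// fatal: true makes decode() throw a TypeError on ill-formed input instead of
// substituting U+FFFD. ignoreBOM: true keeps a leading BOM in the result,
// matching what buf.toString('utf8') returns for the same bytes.
function decodeUtf8Strict(buf) {
  return new TextDecoder('utf-8', { fatal: true, ignoreBOM: true }).decode(buf);
}

// Hypothetical wrapper that reports ill-formed input as null.
function tryDecodeUtf8(buf) {
  try {
    return decodeUtf8Strict(buf);
  } catch (err) {
    // Ill-formed input surfaces as a TypeError; anything else is rethrown.
    if (err instanceof TypeError) return null;
    throw err;
  }
}

console.log(tryDecodeUtf8(Buffer.from('h\u00e9llo', 'utf8'))); // héllo
console.log(tryDecodeUtf8(Buffer.from([237, 166, 164]))); // null
```

Returning null keeps the happy path free of try/catch at call sites; callers that prefer an exception can use decodeUtf8Strict directly.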
Thanks for the suggestion. I guess this mostly closes the issue, but it might make sense to specify that behavior (inserting U+FFFD in case of decoding errors) in the documentation of Buffer.toString.

And while I'm at it: the documentation explicitly links to the constant for the maximum string length, but does not specify what actually happens if the decoded data does not fit into a JS string.
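Until that is documented, one way to avoid depending on the unspecified overflow behavior is to check the size up front. This is a sketch under one assumption worth stating: each UTF-8 byte contributes at most one UTF-16 code unit to the decoded string, so the byte length bounds the string length. toStringWithinLimit is a hypothetical helper; buffer.constants.MAX_STRING_LENGTH is the constant the docs link to.

```javascript
const { constants } = require('buffer');

// Hypothetical helper (not a Node.js API): refuse to decode buffers whose
// UTF-8 decoding might not fit into a JS string.
// Each byte yields at most one UTF-16 code unit (a 4-byte sequence becomes a
// surrogate pair, and each U+FFFD replaces at least one ill-formed byte), so
// buf.length is an upper bound on the decoded length. The check is therefore
// conservative: some buffers slightly over the limit would still fit.
function toStringWithinLimit(buf) {
  if (buf.length > constants.MAX_STRING_LENGTH) {
    throw new RangeError(
      `${buf.length} bytes may exceed buffer.constants.MAX_STRING_LENGTH ` +
        `(${constants.MAX_STRING_LENGTH})`
    );
  }
  return buf.toString('utf8');
}

// The lenient decoder substitutes U+FFFD for the ill-formed byte 0xFF.
console.log(toStringWithinLimit(Buffer.from([0x61, 0xff, 0x62])) === 'a\ufffdb'); // true
```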
I am marking this as a good first issue, to add proper documentation for the described behavior.
@BridgeAR I'd like to give this a go! Can I proceed?
@rexagod please go for it!
@BridgeAR Other than the documentation, would it make sense for node to use a, say,
Added documentation on how legal code points are evaluated, and on the behavior that results when they are not legal. Fixes: nodejs#23280
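For reference, "legal" here means a Unicode scalar value: any code point in U+0000..U+10FFFF outside the surrogate range U+D800..U+DFFF. The sketch below (illustrative only, not the text of the docs change) assembles the 3-byte sequence from the original report the way a lenient, WTF-8 style decoder would, and shows that the result is a surrogate. Both function names are hypothetical.

```javascript
// A Unicode scalar value is any code point in U+0000..U+10FFFF outside the
// surrogate range U+D800..U+DFFF; well-formed UTF-8 may only encode these.
function isUnicodeScalarValue(cp) {
  return (
    Number.isInteger(cp) &&
    cp >= 0 &&
    cp <= 0x10ffff &&
    !(cp >= 0xd800 && cp <= 0xdfff)
  );
}

// Hypothetical helper: assemble the code point of a 3-byte sequence
// 1110xxxx 10yyyyyy 10zzzzzz WITHOUT validating it, i.e. the way a lenient
// (WTF-8 style) decoder would.
function assembleThreeByteSequence(b0, b1, b2) {
  return ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
}

const cp = assembleThreeByteSequence(237, 166, 164);
console.log(cp.toString(16)); // d9a4
console.log(isUnicodeScalarValue(cp)); // false (a high surrogate)
```

A strict UTF-8 decoder rejects this sequence already at the second byte: after a lead byte of 0xED, only continuation bytes 0x80..0x9F are allowed, which is exactly what excludes the surrogate range.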
I believe this can be closed. The
Which can be verified ...
The byte sequence 237, 166, 164 (0xED 0xA6 0xA4) is not valid UTF-8, since it encodes a surrogate code point (U+D9A4), which is not a valid Unicode scalar value. So Buffer.from([237, 166, 164]).toString('utf8') should error. But instead, it returns a string, effectively implementing WTF-8 rather than UTF-8.

Or does Buffer.toString simply not provide any validity guarantees at all, returning garbage strings if the buffer contains invalid input? In that case, please document this as expected behavior, since it makes the function completely useless for a bunch of use cases.

node -v: v10.11.0
uname -a: Linux aljoscha-laptop 4.18.10-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 26 09:48:22 UTC 2018 x86_64 GNU/Linux
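If Buffer.toString itself gives no validity guarantee, a caller can still get one from it with a decode/re-encode round trip. This is a sketch, not a documented Node.js idiom; isWellFormedUtf8 is a hypothetical name. It relies on one property: re-encoding any JS string yields well-formed UTF-8 (lone surrogates become EF BF BD), so the bytes can only match when the input was already well-formed, while well-formed input round-trips exactly.

```javascript
// Hypothetical helper (not a Node.js API): check UTF-8 well-formedness by
// decoding leniently and re-encoding. Any substitution (U+FFFD) or lone
// surrogate in the decoded string changes the re-encoded bytes.
function isWellFormedUtf8(buf) {
  const roundTripped = Buffer.from(buf.toString('utf8'), 'utf8');
  return roundTripped.equals(buf);
}

console.log(isWellFormedUtf8(Buffer.from([237, 166, 164]))); // false
console.log(isWellFormedUtf8(Buffer.from('na\u00efve \u{1F600}', 'utf8'))); // true
// A literal U+FFFD (EF BF BD) in the input is well-formed and is accepted:
console.log(isWellFormedUtf8(Buffer.from([0xef, 0xbf, 0xbd]))); // true
```

The trade-off is a temporary copy of the whole buffer. Newer Node.js releases (18.14+ and 19.4+) also provide buffer.isUtf8(), which avoids the copy; check the documentation of your version before relying on it.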
See also rust-lang/rust#54845
edit: This also leaks into JSON.parse, which can accept garbage strings even though ECMA-404 (the JSON standard that ECMAScript prescribes for JSON.parse) only allows valid UTF-8 input.
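One way to keep ill-formed bytes away from JSON.parse is to decode strictly first. A minimal sketch, assuming CommonJS and an ICU-enabled Node.js build; parseJsonFromUtf8 is a hypothetical helper, not a Node.js API.

```javascript
const { TextDecoder } = require('util');

// Hypothetical helper (not a Node.js API): parse JSON from raw bytes, but
// only if they are well-formed UTF-8. With fatal: true, decode() throws a
// TypeError on ill-formed input instead of silently inserting U+FFFD.
// Unlike buf.toString('utf8'), TextDecoder strips a leading BOM by default,
// which JSON.parse would otherwise reject as an unexpected token.
function parseJsonFromUtf8(buf) {
  const text = new TextDecoder('utf-8', { fatal: true }).decode(buf);
  return JSON.parse(text);
}

const bad = Buffer.from([0x22, 237, 166, 164, 0x22]); // a JSON string literal around ED A6 A4
// The lenient path parses without complaint:
console.log(typeof JSON.parse(bad.toString('utf8'))); // string
// The strict path rejects it:
try {
  parseJsonFromUtf8(bad);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```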