Clarify String::from_utf8_unchecked's invariants #98596
This is the same clarification as in b92cd1a. Strictly speaking, String's buffer being valid UTF-8 is a *safety* invariant, not a *validity* invariant. This means that String managing a non-UTF-8 buffer is not in itself an Abstract Machine (AM) violation / UB, but can result in safe API surface causing an AM violation / UB. Currently, *no* String functionality, including Drop::drop, says it is valid on invalid UTF-8. As such, the only thing possible to do with an unsafe String is forget to drop it. Everything else is library UB. Making valid UTF-8 a precondition of from_utf8_unchecked is then an API simplification. Additionally, this makes from_utf8(bytes).unwrap() a valid sanitizing implementation.
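As a minimal sketch of the last point: if valid UTF-8 is a precondition, then an implementation that validates and panics on bad input is a conforming way to implement the unchecked constructor. The function name `from_utf8_unchecked_sanitizing` is hypothetical and exists only for illustration; it is not part of the standard library or of this PR.

```rust
/// Hypothetical stand-in for `String::from_utf8_unchecked` (illustration only).
///
/// # Safety
/// `bytes` must be valid UTF-8.
unsafe fn from_utf8_unchecked_sanitizing(bytes: Vec<u8>) -> String {
    // Callers that uphold the precondition never reach the panic.
    // Callers that violate it get a panic here rather than a String
    // that holds invalid UTF-8.
    String::from_utf8(bytes).unwrap()
}

fn main() {
    // UTF-8 encoding of U+1F496.
    let bytes = vec![0xF0, 0x9F, 0x92, 0x96];

    // SAFETY: the bytes above are valid UTF-8.
    let sanitized = unsafe { from_utf8_unchecked_sanitizing(bytes.clone()) };
    // SAFETY: same bytes, still valid UTF-8.
    let unchecked = unsafe { String::from_utf8_unchecked(bytes) };

    // For inputs that satisfy the precondition the two are indistinguishable.
    assert_eq!(sanitized, unchecked);
    assert_eq!(sanitized, "\u{1F496}");

    // The checked constructor reports invalid input as an error instead.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());
}
```

Under a postcondition-style contract, where construction itself is allowed and only later use is a problem, a caller could not rely on this substitution, because the panic would be an observable change in behavior.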
@rustbot label +T-libs-api -T-libs
I have a preference towards the new phrasing as well, but it is probably worth a short discussion at a T-libs-api meeting or similar, just to confirm this is indeed what we want.
What is the status of this PR? No movement since June.
@joshtriplett ping from triage, no movement for 7 months. What is the status of this PR? Did the aforementioned discussion ever take place?
@CAD97 @rustbot label: +S-inactive
This is the same clarification as in #95895, and makes `str::from_utf8_unchecked` and `String::from_utf8_unchecked` agree again. Either `String` should be changed to match `str` (this PR), or `str` should be reverted to match `String`. (Or both are changed to a new, better description of what it means to violate the safety invariant.)

Strictly speaking, String's buffer being valid UTF-8 is a safety invariant, not a validity invariant. This means that String managing a non-UTF-8 buffer is not an AM violation / UB, but can result in safe API surface causing an AM violation / UB.

Currently, no String functionality, including Drop::drop, says it is valid on invalid UTF-8. As such, the only thing possible to do with an unsafe String is forget to drop it. Everything else is library UB.

Making valid UTF-8 a precondition of from_utf8_unchecked is then an API simplification. Additionally, this makes `from_utf8(bytes).unwrap()` a valid sanitizing implementation.

I opened a #t-lang/wg-unsafe-code-guidelines Zulip thread to discuss whether `from_unchecked` style functions should prefer documenting a safety precondition or a safety postcondition.
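To make the precondition versus postcondition question concrete, here is a rough sketch of the two documentation styles applied to otherwise identical wrappers. Both function names and both `# Safety` texts are hypothetical, written for illustration; they are not quoted from the standard library documentation.

```rust
/// Precondition style (hypothetical wording).
///
/// # Safety
/// `bytes` must be valid UTF-8. Calling this function with any other
/// input is undefined behavior.
unsafe fn precondition_style(bytes: Vec<u8>) -> String {
    // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
    unsafe { String::from_utf8_unchecked(bytes) }
}

/// Postcondition style (hypothetical wording).
///
/// # Safety
/// This function does not check that `bytes` is valid UTF-8. If it is
/// not, later use of the returned `String` may cause undefined behavior.
unsafe fn postcondition_style(bytes: Vec<u8>) -> String {
    // SAFETY: this sketch forwards to a constructor whose contract requires
    // valid UTF-8, so callers must in practice still supply valid UTF-8.
    unsafe { String::from_utf8_unchecked(bytes) }
}

fn main() {
    let bytes = b"hello".to_vec();

    // SAFETY: ASCII text is valid UTF-8.
    let a = unsafe { precondition_style(bytes.clone()) };
    // SAFETY: ASCII text is valid UTF-8.
    let b = unsafe { postcondition_style(bytes) };

    // With valid input the two contracts are observably the same; they
    // differ only in what a caller may assume about invalid input.
    assert_eq!(a, b);
    assert_eq!(a, "hello");
}
```

The precondition wording leaves the implementation free to validate, panic, or assume validity, while the postcondition wording promises that construction itself succeeds and defers the hazard to later operations.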