Clarify String::from_utf8_unchecked's invariants #98596

Closed
wants to merge 1 commit into from


CAD97
Contributor

@CAD97 CAD97 commented Jun 27, 2022

This is the same clarification as in #95895, and makes str::from_utf8_unchecked and String::from_utf8_unchecked agree again. Either String should be changed to match str (this PR), or str should be reverted to match String. (Or both are changed to a new, better description of what it means to violate the safety invariant.)

Strictly speaking, String's buffer being valid UTF-8 is a safety invariant, not a validity invariant. This means that String managing a non-UTF-8 buffer is not an AM violation / UB, but can result in safe API surface causing an AM violation / UB.

Currently, no String functionality, including Drop::drop, says it is valid on invalid UTF-8. As such, the only thing it is possible to do with an unsafe String is forget to drop it. Everything else is library UB.

Making valid UTF-8 a precondition of from_utf8_unchecked then is an API simplification. Additionally, this makes from_utf8(bytes).unwrap() a valid sanitizing implementation.
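As a rough illustration of the "sanitizing implementation" point, here is a minimal sketch. The function name `from_utf8_unchecked_sanitizing` is hypothetical and not part of the standard library; it only assumes that valid UTF-8 is a precondition of the call, in which case a checking build is free to validate and panic rather than construct a String over a non-UTF-8 buffer.

```rust
/// Hypothetical stand-in for `String::from_utf8_unchecked` in a
/// "sanitizing" build. This is an illustration, not a std API.
///
/// # Safety
///
/// `bytes` must be valid UTF-8. Because that is a precondition, this
/// variant is allowed to check it and panic on violation.
pub unsafe fn from_utf8_unchecked_sanitizing(bytes: Vec<u8>) -> String {
    // `String::from_utf8` validates the buffer and returns `Err` on
    // invalid UTF-8; unwrapping turns a violated precondition into a panic.
    String::from_utf8(bytes).expect("precondition violated: bytes are not valid UTF-8")
}
```

With valid input such as `b"hello".to_vec()` this behaves like the unchecked constructor; with invalid input such as `vec![0xff, 0xfe]` it panics instead of producing a String that breaks the safety invariant.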

I opened a #t-lang/wg-unsafe-code-guidelines Zulip thread to discuss whether from_unchecked style functions should prefer documenting a safety precondition or a safety postcondition.

This is the same clarification as in b92cd1a.
@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jun 27, 2022
@rust-highfive
Contributor

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Contributor

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 27, 2022
@CAD97
Contributor Author

CAD97 commented Jun 27, 2022

@rustbot label +T-libs-api -T-libs

@rustbot rustbot added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 27, 2022
@Mark-Simulacrum
Member

I have a preference towards the new phrasing as well, but probably worth a short discussion at a T-libs-api meeting or similar just to confirm this is indeed what we want.

r? @joshtriplett

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022
@JohnCSimon
Member

What is the status of this PR? There has been no movement since June.

@pitaj
Contributor

pitaj commented May 5, 2023

@joshtriplett ping from triage, no movement for 7 months. What is the status of this PR? Did the aforementioned discussion ever take place?

@JohnCSimon
Member

@CAD97
Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with this.
Note: if you are going to continue, please reopen the PR BEFORE you push to it, or else you won't be able to reopen it. This is a quirk of GitHub.
Thanks for your contribution.

@rustbot label: +S-inactive

@JohnCSimon JohnCSimon closed this May 28, 2023
@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label May 28, 2023
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.