Tracking issue for UnicodeVersion and UNICODE_VERSION #49726
Comments
If
Why?
This probably should not be stabilised as-is, but we definitely could have a stable type that can be compared against… something?
@BurntSushi It's possible to write code that depends on a specific value of the constant (even if it is unlikely). We have no automated way of enforcing that nothing fails to build as long as we only raise the Unicode version, unlike other semver things.
@Ericson2314 It's also possible to write code that depends on a specific version of Unicode, regardless of whether there is a constant or not. But we don't let that stop us from updating Unicode tables. So I'm not sure I buy your reasoning.
Sorry I buried it in the second sentence but I meant build-time failures in particular. Most compatibility rules needed to prevent build-time failure are readily enforceable, say by https://github.com/rust-lang-nursery/rust-semverver. This wouldn't be.
@Ericson2314 "doctor, it hurts when I do this"
I went back, and https://github.com/arcnmx/rust-rfcs/blob/master/text/1105-api-evolution.md does talk about a few things that technically break compatibility but are lumped under minor changes because the risk is so slim. This could be one of them too. That RFC doesn't mention constants at all but probably should be amended to do so.
Agreed, and I'd even go a step further: If I wrote code that only compiles if this value never changes, then I probably did so because I want my code to break when the value changes. Kinda sorta like
I have tried to explain this issue in different conversations a few times, but haven't seen it in written form anywhere so far, so here's my attempt to explain it... Many functions under The main property of this data is that it evolves, steadily. Many systems have tried freezing their Unicode version, all facing major problems eventually. Rust has a good model of freezing only the unchangeable parts of Unicode, and keeping the evolving parts alive. As an example, Unicode 11 is going to have this addition in
Therefore, right now we have: `assert_eq!('\u{0560}'.is_alphabetic(), false);` And in a couple of months we will have: `assert_eq!('\u{0560}'.is_alphabetic(), true);` From an API design perspective, this is generally bad, because the meaning of the function is changing. Or is it? Well, it depends on what we expect from this function, and in one view it could be "whether this character is an alphabetic character by the currently active/known/etc. Unicode version?" Now, since the version of the data used here is not a "choice" made by the user of the library (as part of the build system), there's no way to set the right configuration to get these assertions right. Therefore, if they have any automated tests around this data, the tests can break because of changes made inside Rust.
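As a minimal sketch of how a test could follow the Unicode version instead of hard-coding one answer: this assumes the tuple form of `std::char::UNICODE_VERSION` that this issue eventually stabilizes, and the helper name `turned_ayb_expected` is hypothetical, made up only for this illustration.

```rust
use std::char::UNICODE_VERSION;

/// Hypothetical helper: U+0560 is only assigned from Unicode 11.0.0 onwards,
/// so that is the first version in which it can be alphabetic.
fn turned_ayb_expected(version: (u8, u8, u8)) -> bool {
    // Tuples compare lexicographically: major, then minor, then update.
    version >= (11, 0, 0)
}

fn main() {
    // The expectation is derived from the version of the tables shipped
    // with the standard library, not fixed in the test itself.
    let expected = turned_ayb_expected(UNICODE_VERSION);
    assert_eq!('\u{0560}'.is_alphabetic(), expected);
}
```

With a check like this, the assertion keeps passing when the standard library moves to newer Unicode tables, which is the "programmable" access the comment argues for.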
When working with less-digitally-developed languages/writing-systems, it's not just the tests (and Unassigned characters) that become flaky; even the main logic in the code can be affected by these data changes. Looking at the problem again, we don't need this value because the API is unstable; we need this value because of the "lack of choice" for the user to control the Unicode version. So, unless we decide to make these functions available for all versions of Unicode (>= some x.y.z), or not provide them at all (too late for that already), we have to expose this pre-made choice in a programmable way. I'll write another comment soon explaining my findings and the progress on refining the type item mentioned here.
(back after a long delay...) So, about the type mentioned above: rust/src/libcore/unicode/version.rs Lines 11 to 28 in d26f9e4
Since Unicode 11.0.0, published in June 2018, the text of The Unicode Standard, under Section "3.1 Versions of the Unicode Standard" (page 75, https://www.unicode.org/versions/Unicode11.0.0/ch03.pdf#page=4) has a new paragraph with more details on the numbers:
(Here's the Consensus and Action Items from the meeting: https://www.unicode.org/L2/L2017/17222.htm#152-C3) Since the version numbers are limited to one (unsigned) byte, we can make some improvements in our implementation before setting it in stone. My proposal would be to update all three That's unless we decide that we want to use this type ( I believe that's not the case, hence making the suggestion above. Having a clear definition (in The Unicode Standard), I would also be fine if we decide to make the type instantiable.
I think that's all I have on the matter. What do you think?
FWIW I’m still not very convinced that this is worth having a dedicated
Right. That was based on the PR review in #42998. I'm neutral on this detail, as both types work for the purpose I described above.
Are there any other concerns? What would be the next step here?
I would be very supportive of seeing this stabilized. Rust's standard library has the unfortunate duty of having to ship Unicode tables for stuff like case conversions. If it's going to do that, then software should at least be able to know what version of Unicode is being used. In fact, this seems like a clear gap right now.
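For illustration only, a version type with one-byte components might look roughly like the sketch below. The type name `SketchVersion` and its field names are assumptions invented for this example; they are not the definition in `version.rs`.

```rust
/// Hypothetical version type; each component fits in one unsigned byte,
/// following the limit on version numbers mentioned above.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SketchVersion {
    pub major: u8,
    pub minor: u8,
    pub update: u8,
}

fn main() {
    let v11 = SketchVersion { major: 11, minor: 0, update: 0 };
    let v10 = SketchVersion { major: 10, minor: 255, update: 255 };

    // The derived ordering compares fields in declaration order,
    // so the major component dominates.
    assert!(v11 > v10);

    // Three one-byte fields: the whole value occupies three bytes.
    assert_eq!(std::mem::size_of::<SketchVersion>(), 3);
}
```

Because the fields are public, this sketch is instantiable by users; a type that should only be constructed by the standard library would need a private field or constructor instead.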
I've opened a PR #71020 which changes UNICODE_VERSION to a tuple as there is no apparent need for a dedicated struct. |
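A short sketch of what the tuple form allows, assuming `std::char::UNICODE_VERSION` is the `(u8, u8, u8)` tuple described here; the helper `unicode_at_least` is a hypothetical name used only for this example.

```rust
use std::char::UNICODE_VERSION;

/// Hypothetical helper: true if the standard library's Unicode tables
/// are at least the given version.
fn unicode_at_least(major: u8, minor: u8, update: u8) -> bool {
    // Tuple comparison is lexicographic, which matches version ordering.
    UNICODE_VERSION >= (major, minor, update)
}

fn main() {
    // A plain tuple can be destructured directly, with no accessor methods.
    let (major, minor, update) = UNICODE_VERSION;
    println!("std ships Unicode {}.{}.{}", major, minor, update);

    if unicode_at_least(11, 0, 0) {
        println!("U+0560 is assigned in these tables");
    }
}
```

A tuple gets comparison, destructuring, and formatting of its parts for free, which is one way to read the remark that there is no apparent need for a dedicated struct.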
…imonSapin Stabilize UNICODE_VERSION (feature unicode_version) Tracking issue: rust-lang#49726 r? @sfackler rust-lang#71020 changed the definition of `UNICODE_VERSION` just yesterday from a struct to a tuple. Maybe you want to wait some more before stabilizing this constant; on the other hand, this is a very small and simple addition. CC @behnam @SimonSapin @Serentty
The stabilization PR (#71068) has been merged, so this can be closed, right?
Yes.
`std::char` contains the following items:
The value of the constant can change as new versions of the standard library are updated to new versions of Unicode.
They have been unstable under the `unicode` feature with #27783 as the designated tracking issue since they’ve been added in #18002 and #42998. We should decide whether to stabilize them (possibly with changes) or deprecate and remove them.
I think it is important to have this information part of the standard library documentation somehow. I don’t know how useful it is to access this information programmatically.
CC @rust-lang/libs, @behnam