Correct code point format in Base/Char/show function #33291

Merged: 6 commits, Sep 18, 2019
2 changes: 1 addition & 1 deletion base/char.jl
@@ -304,7 +304,7 @@ function show(io::IO, ::MIME"text/plain", c::T) where {T<:AbstractChar}
else
u = codepoint(c)
end
-h = string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
+h = uppercase(string(u, base = 16, pad = 4))
print(io, (isascii(c) ? "ASCII/" : ""), "Unicode U+", h)
else
print(io, ": Malformed UTF-8")
2 changes: 1 addition & 1 deletion base/io.jl
@@ -135,7 +135,7 @@ Read the entirety of `io`, as a `String`.
julia> io = IOBuffer("JuliaLang is a GitHub organization");

julia> read(io, Char)
-'J': ASCII/Unicode U+004a (category Lu: Letter, uppercase)
+'J': ASCII/Unicode U+004A (category Lu: Letter, uppercase)

julia> io = IOBuffer("JuliaLang is a GitHub organization");

6 changes: 3 additions & 3 deletions base/iostream.jl
@@ -100,7 +100,7 @@ julia> io = IOBuffer("JuliaLang is a GitHub organization.");
julia> seek(io, 5);

julia> read(io, Char)
-'L': ASCII/Unicode U+004c (category Lu: Letter, uppercase)
+'L': ASCII/Unicode U+004C (category Lu: Letter, uppercase)
```
"""
function seek(s::IOStream, n::Integer)
@@ -122,12 +122,12 @@ julia> io = IOBuffer("JuliaLang is a GitHub organization.");
julia> seek(io, 5);

julia> read(io, Char)
-'L': ASCII/Unicode U+004c (category Lu: Letter, uppercase)
+'L': ASCII/Unicode U+004C (category Lu: Letter, uppercase)

julia> seekstart(io);

julia> read(io, Char)
-'J': ASCII/Unicode U+004a (category Lu: Letter, uppercase)
+'J': ASCII/Unicode U+004A (category Lu: Letter, uppercase)
```
"""
seekstart(s::IO) = seek(s,0)
2 changes: 1 addition & 1 deletion base/strings/basic.jl
@@ -107,7 +107,7 @@ julia> isvalid(str, 1)
true

julia> str[1]
-'α': Unicode U+03b1 (category Ll: Letter, lowercase)
+'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> isvalid(str, 2)
false
18 changes: 9 additions & 9 deletions doc/src/manual/strings.md
@@ -88,8 +88,8 @@ julia> isvalid(Char, 0x110000)
false
```

-As of this writing, the valid Unicode code points are `U+00` through `U+d7ff` and `U+e000` through
-`U+10ffff`. These have not all been assigned intelligible meanings yet, nor are they necessarily
+As of this writing, the valid Unicode code points are `U+0000` through `U+D7FF` and `U+E000` through
+`U+10FFFF`. These have not all been assigned intelligible meanings yet, nor are they necessarily
interpretable by applications, but all of these values are considered to be valid Unicode characters.

You can input any Unicode character in single quotes using `\u` followed by up to four hexadecimal
@@ -107,7 +107,7 @@ julia> '\u2200'
'∀': Unicode U+2200 (category Sm: Symbol, math)

julia> '\U10ffff'
-'\U10ffff': Unicode U+10ffff (category Cn: Other, not assigned)
+'\U10ffff': Unicode U+10FFFF (category Cn: Other, not assigned)
```

Julia uses your system's locale and language settings to determine which characters can be printed
@@ -173,10 +173,10 @@ julia> str[1]
'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)

julia> str[6]
-',': ASCII/Unicode U+002c (category Po: Punctuation, other)
+',': ASCII/Unicode U+002C (category Po: Punctuation, other)

julia> str[end]
-'\n': ASCII/Unicode U+000a (category Cc: Other, control)
+'\n': ASCII/Unicode U+000A (category Cc: Other, control)
```

Many Julia objects, including strings, can be indexed with integers. The index of the first
@@ -192,7 +192,7 @@ a normal value:

```jldoctest helloworldstring
julia> str[end-1]
-'.': ASCII/Unicode U+002e (category Po: Punctuation, other)
+'.': ASCII/Unicode U+002E (category Po: Punctuation, other)

julia> str[end÷2]
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
@@ -223,7 +223,7 @@ Notice that the expressions `str[k]` and `str[k:k]` do not give the same result:

```jldoctest helloworldstring
julia> str[6]
-',': ASCII/Unicode U+002c (category Po: Punctuation, other)
+',': ASCII/Unicode U+002C (category Po: Punctuation, other)

julia> str[6:6]
","
@@ -416,7 +416,7 @@ julia> foreach(display, s)
'\xc0\xa0': [overlong] ASCII/Unicode U+0020 (category Zs: Separator, space)
'\xe2\x88': Malformed UTF-8 (category Ma: Malformed, bad data)
'\xe2': Malformed UTF-8 (category Ma: Malformed, bad data)
-'|': ASCII/Unicode U+007c (category Sm: Symbol, math)
+'|': ASCII/Unicode U+007C (category Sm: Symbol, math)

julia> isvalid.(collect(s))
4-element BitArray{1}:
@@ -429,7 +429,7 @@ julia> s2 = "\xf7\xbf\xbf\xbf"
"\U1fffff"

julia> foreach(display, s2)
-'\U1fffff': Unicode U+1fffff (category In: Invalid, too high)
+'\U1fffff': Unicode U+1FFFFF (category In: Invalid, too high)
```

We can see that the first two code units in the string `s` form an overlong encoding of
9 changes: 9 additions & 0 deletions test/char.jl
@@ -290,3 +290,12 @@ end
@testset "broadcasting of Char" begin
@test identity.('a') == 'a'
end

+@testset "code point format of U+ syntax (PR 33291)" begin
+@test repr("text/plain", '\n') == "'\\n': ASCII/Unicode U+000A (category Cc: Other, control)"
+@test repr("text/plain", '/') == "'/': ASCII/Unicode U+002F (category Po: Punctuation, other)"
+@test repr("text/plain", '\u10e') == "'Ď': Unicode U+010E (category Lu: Letter, uppercase)"
+@test repr("text/plain", '\u3a2c') == "'㨬': Unicode U+3A2C (category Lo: Letter, other)"
+@test repr("text/plain", '\U001f428') == "'🐨': Unicode U+1F428 (category So: Symbol, other)"
+@test repr("text/plain", '\U010f321') == "'\\U10f321': Unicode U+10F321 (category Co: Other, private use)"
+end
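
For readers who want to try the new format outside of `show`, the one-line change in `base/char.jl` can be sketched in isolation. The helper name `format_codepoint` is hypothetical and not part of Base. The sketch also assumes a well-formed character, since `codepoint` throws on malformed input, whereas the real `show` method handles those cases separately.

```julia
# Hypothetical helper (not in Base): formats a character's code point the way
# the patched `show` does, as uppercase hex zero-padded to at least 4 digits.
# Assumes `c` is a well-formed character; `codepoint` throws otherwise.
format_codepoint(c::AbstractChar) =
    "U+" * uppercase(string(codepoint(c), base = 16, pad = 4))

println(format_codepoint('\n'))        # 4 digits, zero-padded
println(format_codepoint('α'))         # 4 digits
println(format_codepoint('\U1f428'))   # 5 digits, no extra padding
```

Because `pad = 4` is only a minimum width, code points above `U+FFFF` print with exactly as many digits as they need (5 or 6) instead of always being padded to 6, as the removed line did.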