export and document transcode #17323
Changes from all commits
8afd70d
7576748
aa371bc
66f9155
35e7b69
fc74630
bef4d19
c49099d
d10a1e8
@@ -128,20 +128,39 @@ function cwstring(s::AbstractString)
end
end

# transcoding between data in UTF-8 and UTF-16 for Windows APIs
# transcoding between data in UTF-8 and UTF-16 for Windows APIs,
# and also UTF-32 for APIs using Cwchar_t on other platforms.

"""
    Base.transcode(T,src::Vector{U})
    transcode(T, src)

Convert string data between Unicode encodings. `src` is either a
`String` or a `Vector{UIntXX}` of UTF-XX code units, where
`XX` is 8, 16, or 32. `T` indicates the encoding of the return value:
`String` to return a (UTF-8 encoded) `String` or `UIntXX`
to return a `Vector{UIntXX}` of UTF-`XX` data. (The alias `Cwchar_t`
can also be used as the integer type, for converting `wchar_t*` strings
used by external C libraries.)

Transcodes unicode data `src` to a different encoding, where `U` and `T` are the integers
denoting the input and output code units. Currently supported are UTF-8 and UTF-16, which
are denoted by integers `UInt8` and `UInt16`, respectively.
The `transcode` function succeeds as long as the input data can be
reasonably represented in the target encoding; it always succeeds for
conversions between UTF-XX encodings, even for invalid Unicode data.

NULs are handled like any other character (i.e. the output will be NUL-terminated if and
only if the `src` is).
Only conversion to/from UTF-8 is currently supported.
"""
function transcode end
transcode{T<:Union{UInt8,UInt16}}(::Type{T}, src::Vector{T}) = src
transcode(::Type{Int32}, src::Vector{UInt32}) = reinterpret(Int32, src)

transcode{T<:Union{UInt8,UInt16,UInt32,Int32}}(::Type{T}, src::Vector{T}) = src
transcode{T<:Union{Int32,UInt32}}(::Type{T}, src::String) = T[T(c) for c in src]
transcode{T<:Union{Int32,UInt32}}(::Type{T}, src::Vector{UInt8}) = transcode(T, String(src))
function transcode{S<:Union{Int32,UInt32}}(::Type{UInt8}, src::Vector{S})
What's the rationale for supporting signed 32-bit code units (
I suppose it wouldn't hurt to support the signed types for
Ah, I was arguing for not supporting
    buf = IOBuffer()
    for c in src; print(buf, Char(c)); end
    takebuf_array(buf)
end
transcode(::Type{String}, src::String) = src
transcode(T, src::String) = transcode(T, src.data)
transcode(::Type{String}, src) = String(transcode(UInt8, src))

function transcode(::Type{UInt16}, src::Vector{UInt8})
    dst = UInt16[]
@@ -874,6 +874,7 @@ export
    strip,
    strwidth,
    summary,
    transcode,
    ucfirst,
    unescape_string,
    uppercase,
needs rst signature so genstdlib can fill it in
Right, I was meaning to do that and forgot.
fixed.
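
For anyone trying out the documented behavior, here is a minimal sketch of calling `transcode` as the new docstring describes it. It is not part of the PR. It assumes a Julia build in which `transcode` is exported and accepts a `String` source, and the sample string is an arbitrary choice.

```julia
# Round-trip a string through UTF-16 and UTF-32 code units.
s = "héllo"   # arbitrary sample text, all characters in the BMP

u16 = transcode(UInt16, s)   # Vector{UInt16} of UTF-16 code units
u32 = transcode(UInt32, s)   # Vector{UInt32} of UTF-32 code units

# Passing String as T converts back to a UTF-8 encoded String.
@assert transcode(String, u16) == s
@assert transcode(String, u32) == s

# For BMP-only text, each character maps to exactly one UTF-16 code unit
# and one UTF-32 code unit.
@assert length(u16) == length(s)
@assert length(u32) == length(s)
```

Per the docstring, `Cwchar_t` could be passed in place of `UInt16` or `UInt32` when preparing data for a C library that takes `wchar_t*`.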