4 files changed (+9, -8 lines)

@@ -18,7 +18,8 @@ representable in a given `AbstractChar` type.
 Internally, an `AbstractChar` type may use a variety of encodings. Conversion
 to `UInt32` will not reveal this encoding because it always returns the
 Unicode value of the character. (Typically, the raw encoding can be obtained
-via [`reinterpret`](@ref).)
+via [`reinterpret`](@ref).) Character I/O uses UTF-8 by default for all
+character types, regardless of their internal encoding.
 """
 AbstractChar
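The added docstring sentence says that character I/O is UTF-8 whatever a type's internal encoding is. A minimal sketch of what that means, assuming Julia 1.x, where an `AbstractChar` subtype supplies `Base.codepoint` and a constructor from a `UInt32` code point. `Latin1Char` is a hypothetical type invented for illustration (not part of Base): it stores one byte internally, yet printing it goes through the generic fallback and produces UTF-8.

```julia
# Hypothetical AbstractChar subtype (not part of Base): stores a Latin-1
# character (U+0000 to U+00FF) in a single byte.
struct Latin1Char <: AbstractChar
    byte::UInt8
end

# Assumed Julia 1.x AbstractChar interface: construct from a UInt32 code point...
function Latin1Char(cp::UInt32)
    cp <= 0xff || throw(ArgumentError("code point $cp is outside Latin-1"))
    return Latin1Char(cp % UInt8)
end
# ...and report the code point back.
Base.codepoint(c::Latin1Char) = UInt32(c.byte)

c = Latin1Char(0xe9)              # 'é', stored internally as the single byte 0xe9
s = sprint(print, c)              # printed via the generic AbstractChar fallback
bytes = collect(codeunits(s))     # the printed form is UTF-8: UInt8[0xc3, 0xa9]
```

The one-byte internal representation is never visible in the output; only `codepoint` (or an explicit `reinterpret`) connects the two.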
@@ -148,8 +149,7 @@ hash(x::Char, h::UInt) =
 # fallbacks:
 isless(x::AbstractChar, y::AbstractChar) = isless(Char(x), Char(y))
 ==(x::AbstractChar, y::AbstractChar) = Char(x) == Char(y)
-hash(x::AbstractChar, h::UInt) =
-    hash_uint64(((UInt32(x) + UInt64(0xd060fad0)) << 32) ⊻ UInt64(h))
+hash(x::AbstractChar, h::UInt) = hash(Char(x), h)
 widen(::Type{T}) where {T<:AbstractChar} = T
 
 -(x::AbstractChar, y::AbstractChar) = Int(x) - Int(y)
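The removed fallback hashed `UInt32(x)` with its own formula, which need not agree with the dedicated `hash(x::Char, h::UInt)` method named in the hunk header, even though the `==` fallback treats any `AbstractChar` as equal to the `Char` with the same value. Delegating to `hash(Char(x), h)` keeps `==`/`isequal` and `hash` consistent. A sketch assuming Julia 1.x with this fallback in place; `CodePointChar` is a hypothetical type used only for illustration.

```julia
# Hypothetical AbstractChar subtype (not part of Base) that keeps the
# code point itself as a UInt32.
struct CodePointChar <: AbstractChar
    cp::UInt32
end
Base.codepoint(c::CodePointChar) = c.cp

a = CodePointChar(0x00002200)   # U+2200, the same character as '∀'
b = '∀'                         # built-in Char

same = a == b                   # generic fallback compares Char(a) == Char(b)
# With hash(x::AbstractChar, h) delegating to hash(Char(x), h), equal values
# hash alike, so `a` can find the entry stored under the Char key `b`.
same_hash = hash(a) == hash(b)
d = Dict(b => "for all")
found = d[a]
```

Hash-based containers rely on the rule that `isequal(x, y)` implies `hash(x) == hash(y)`; a custom fallback formula can silently break that lookup.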
@@ -14,8 +14,8 @@ about strings:
 * String indexing is done in terms of these code units:
 * Characters are extracted by `s[i]` with a valid string index `i`
 * Each `AbstractChar` in a string is encoded by one or more code units
-* Only the index of the first code unit of a `AbstractChar` is a valid index
-* The encoding of a `AbstractChar` is independent of what precedes or follows it
+* Only the index of the first code unit of an `AbstractChar` is a valid index
+* The encoding of an `AbstractChar` is independent of what precedes or follows it
 * String encodings are [self-synchronizing] – i.e. `isvalid(s, i)` is O(1)
 
 [self-synchronizing]: https://en.wikipedia.org/wiki/Self-synchronizing_code
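The indexing rules in this docstring can be checked on an ordinary `String`, whose code units are UTF-8 bytes. A short sketch assuming Julia 1.x; `s` is an arbitrary example string mixing 1-, 2-, and 3-byte characters.

```julia
s = "aé∀"   # UTF-8 code units: 'a' takes 1, 'é' takes 2, '∀' takes 3

n = ncodeunits(s)               # 6
valid = collect(eachindex(s))   # [1, 2, 4]: only first code units are valid indices
mid_ok = isvalid(s, 3)          # false: 3 is the second code unit of 'é'
ch = s[2]                       # 'é'
char_start = thisind(s, 5)      # 4: the character containing code unit 5 starts at 4
after = nextind(s, 2)           # 4: the next valid index after 2

# Indexing into the middle of a character is an error.
err = try
    s[3]
    nothing
catch e
    e
end
```

Because UTF-8 is self-synchronizing, `thisind` and `isvalid` only need to inspect a few neighboring bytes rather than scan from the start of the string.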
@@ -410,7 +410,7 @@ If `count` is provided, replace at most `count` occurrences.
 or a regular expression.
 If `r` is a function, each occurrence is replaced with `r(s)`
 where `s` is the matched substring (when `pat`is a `Regex` or `AbstractString`) or
-character (when `pat` is a `AbstractChar` or a collection of `AbstractChar`).
+character (when `pat` is an `AbstractChar` or a collection of `AbstractChar`).
 If `pat` is a regular expression and `r` is a `SubstitutionString`, then capture group
 references in `r` are replaced with the corresponding matched text.
 To remove instances of `pat` from `string`, set `r` to the empty `String` (`""`).
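A few calls illustrating the `replace` behavior described in this docstring, assuming the Julia 1.x `replace(s, pat => r; count)` form; the input strings are arbitrary examples.

```julia
s = "hello world"

# Collection of characters as `pat`, function as `r`: r is applied to each matched character.
upper_lo = replace(s, ['l', 'o'] => uppercase)      # "heLLO wOrLd"

# String as `pat`, function as `r`: r is applied to each matched substring.
upper_word = replace(s, "world" => uppercase)       # "hello WORLD"

# Empty-string replacement deletes matches; `count` caps how many are replaced.
fewer_l = replace(s, 'l' => ""; count = 2)          # "heo world"

# Regex `pat` with a SubstitutionString `r`: capture groups are referenced as \1, \2.
swapped = replace("John Smith", r"(\w+) (\w+)" => s"\2, \1")   # "Smith, John"
```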
@@ -28,8 +28,9 @@ There are a few noteworthy high-level features about Julia's strings:
 additional `AbstractString` subtypes (e.g. for other encodings). If you define a function expecting
 a string argument, you should declare the type as `AbstractString` in order to accept any string
 type.
-* Like C and Java, but unlike most dynamic languages, Julia has a first-class type representing
-a single character, called `AbstractChar`. This is just a special kind of 32-bit primitive type whose numeric value represents a Unicode code point.
+* Like C and Java, but unlike most dynamic languages, Julia has a first-class type for representing
+a single character, called `AbstractChar`. The built-in `Char` subtype of `AbstractChar`
+is a 32-bit primitive type that can represent any Unicode character.
 * As in Java, strings are immutable: the value of an `AbstractString` object cannot be changed.
 To construct a different string value, you construct a new string from parts of other strings.
 * Conceptually, a string is a *partial function* from indices to characters: for some index values,
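The revised manual wording separates the abstract type from the built-in `Char`. A quick check, assuming Julia 1.x, where a `Char` stores the character's UTF-8 bytes in its 32 bits rather than the code point itself; this is consistent with dropping the old claim that its numeric value is the code point. `reinterpret` exposes the raw bits, while `codepoint` returns the Unicode value.

```julia
is_sub = Char <: AbstractChar        # true: Char is the built-in subtype
bits = 8 * sizeof(Char)              # 32
plain = isbitstype(Char)             # true: an immutable primitive bits type

c = 'é'
cp = codepoint(c)                    # 0x000000e9, i.e. U+00E9
raw = reinterpret(UInt32, c)         # 0xc3a90000: UTF-8 bytes 0xc3 0xa9, left-aligned
```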