unexpected lowercase(Char) result #7847

Closed
catawbasam opened this issue Aug 5, 2014 · 8 comments
Labels
system:windows Affects only Windows unicode Related to unicode characters and encodings

Comments

@catawbasam
Contributor

Expected to see 'δ', but got 'Δ' instead:

julia> lowercase('Δ')
'Δ'
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-rc1+284 (2014-07-30 21:54 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit da17f92 (5 days old master)
|__/                   |  x86_64-w64-mingw32

@catawbasam
Contributor Author

Looks like it is Windows only:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-rc1+258 (2014-07-29 15:42 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 15f5aed* (6 days old master)
|__/                   |  x86_64-redhat-linux

julia> lowercase('Δ')
'δ'

@tkelman
Contributor

tkelman commented Aug 5, 2014

msvcrt.dll's towlower doesn't speak Greek, apparently. Neither does msvcr120.dll.

According to http://msdn.microsoft.com/en-us/library/8h19t214.aspx, it only works for ASCII.
I wonder whether utf8proc could give us upper/lowercase mappings?

@tkelman
Contributor

tkelman commented Aug 5, 2014

I do get the right answer if I use _towlower_l though (I think because we set the C locale somewhere - http://stackoverflow.com/questions/15889784/why-does-the-towlower-function-not-convert-the-%D0%AF-to-a-lower-case-%D1%8F):

julia> convert(Char, ccall(:_towlower_l, Cwchar_t, (Cwchar_t,), 'Δ'))
'δ'

I don't think that exists in Windows XP.

@jiahao
Member

jiahao commented Aug 5, 2014

> I wonder whether utf8proc could give us upper/lowercase mappings?

utf8proc does not expose Unicode case-mapping data, except through a case-folding function (for case-insensitive string comparisons).
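To illustrate the difference between case folding and lowercasing, here is a minimal sketch. It assumes a current Julia release, where utf8proc's case folding is reachable through `Unicode.normalize(s, casefold=true)`; that API did not exist in this form in 0.3. The helper names `casefold` and `equal_ignoring_case` are hypothetical, introduced only for this example.

```julia
using Unicode

# Case folding targets caseless comparison, not display:
# it may change a string's length (for example "ß" folds to "ss").
casefold(s::AbstractString) = Unicode.normalize(s, casefold=true)

# Hypothetical helper: compare two strings while ignoring case.
equal_ignoring_case(a::AbstractString, b::AbstractString) =
    casefold(a) == casefold(b)

println(equal_ignoring_case("ΔΈΛΤΑ", "δέλτα"))
println(equal_ignoring_case("STRASSE", "Straße"))
println(lowercase("Straße"))   # lowercasing keeps the "ß" as-is
```

Folded strings are suitable as comparison keys, but they should not be shown to users as the "lowercase" form of the input.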

Case mappings cannot be defined uniquely without a locale. The canonical example is small 'i', whose upper-cased image is capital 'I' in English but capital 'İ' (with overdot) in Turkish.

Since utf8proc is locale-agnostic, it wouldn't be possible to implement this without also implementing all the ancillary code for locale handling. This problem is better suited to a library like libicu.
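A small sketch of why a locale-free mapping cannot satisfy every language. The function `uppercase_for` and its language-tag argument are hypothetical, written only for illustration; they are not part of Julia, utf8proc, or libicu, and the sketch covers just the dotted/dotless i pair rather than full locale handling.

```julia
# Hypothetical helper: uppercase a character with a language tag.
# Only the Turkish/Azerbaijani dotted and dotless i are special-cased;
# everything else falls back to the locale-independent Base.uppercase.
function uppercase_for(c::Char, lang::AbstractString)
    if lang in ("tr", "az")
        c == 'i' && return 'İ'   # U+0130, capital I with dot above
        c == 'ı' && return 'I'   # U+0131 dotless i maps to plain I
    end
    return uppercase(c)
end

println(uppercase_for('i', "en"))
println(uppercase_for('i', "tr"))
```

The same input character yields two different correct answers, so any function of the form `uppercase(c)` alone has to pick one convention.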

@jiahao
Member

jiahao commented Aug 5, 2014

In fact, this manifestly returns the wrong result for terminal sigma in Greek:

julia> lowercase("ΛΌΓΟΣ") #logos; λόγος
"λόγοσ"

This is a separate issue, though.
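For reference, a rough sketch of a context-aware lowercase that handles the final sigma. This is a simplification of the Unicode Final_Sigma condition (it only looks at the immediately adjacent characters and ignores case-ignorable marks), and `lowercase_final_sigma` is a hypothetical name, not an existing Julia function.

```julia
# Hypothetical helper: lowercase a string, mapping a capital sigma that
# ends a word to the final form 'ς' instead of 'σ'.
# Simplified rule: 'Σ' is "final" when preceded by a letter and not
# followed by one.
function lowercase_final_sigma(s::AbstractString)
    chars = collect(s)
    out = Vector{Char}(undef, length(chars))
    for (i, c) in enumerate(chars)
        if c == 'Σ'
            prev_letter = i > 1 && isletter(chars[i - 1])
            next_letter = i < length(chars) && isletter(chars[i + 1])
            out[i] = (prev_letter && !next_letter) ? 'ς' : 'σ'
        else
            out[i] = lowercase(c)
        end
    end
    return join(out)
end

println(lowercase_final_sigma("ΣΟΦΟΣ"))
```

A per-character mapping cannot express this rule because the result for 'Σ' depends on its neighbours, which is why it is tracked separately from the Windows-specific bug here.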

@JeffBezanson
Member

The setlocale calls we make are:

        setlocale(LC_ALL, ""); // set to user locale
        setlocale(LC_NUMERIC, "C"); // use locale-independent numeric formats

@vtjnash
Member

vtjnash commented Aug 7, 2014

I think I succeeded in removing those calls for all of 2 days, then someone noticed that it broke printing of bignums.

@catawbasam
Contributor Author

Closed by PR #8233.


5 participants