The ö, ü and ç characters sometimes appear as � #495
Comments
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211202-3-6.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211202-4-9.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211202-4-12.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211203-3-12.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211203-3-17.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211203-3-18.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211203-4-8.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211203-4-11.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211203-4-13.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211204-4-7.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211205-3-5.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211206-4-13.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211206-4-17.pdf
https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/12/20211206-4-18.pdf
@k00ni This is an encoding-related issue, and it actually contains two separate problems. The first concerns characters that do not exist in the ToUnicode CMap for Identity-H fonts. It seems that those characters must be split into 1-byte characters and used as Unicode code points.
The second is related to TrueType fonts, and I don't understand how the decoding should be implemented correctly.
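To make the first point concrete, here is a minimal Python sketch of that fallback, not the library's actual code. The function name `decode_identity_h`, the `to_unicode` dictionary, and the choice to skip null bytes during the fallback are all assumptions made for illustration.

```python
def decode_identity_h(raw: bytes, to_unicode: dict) -> str:
    """Decode a string of 2-byte character codes (hypothetical helper).

    to_unicode stands in for a parsed ToUnicode CMap: it maps a
    2-byte code (as an int) to the Unicode text it represents.
    """
    out = []
    # Walk the string two bytes at a time; a trailing odd byte is ignored.
    for i in range(0, len(raw) - 1, 2):
        code = (raw[i] << 8) | raw[i + 1]
        if code in to_unicode:
            out.append(to_unicode[code])
        else:
            # Assumed fallback: the code is missing from the CMap, so split
            # it into single bytes and use each non-null byte as a Unicode
            # code point (0xF6 -> "ö", 0xFC -> "ü", 0xE7 -> "ç").
            out.extend(chr(b) for b in raw[i:i + 2] if b != 0)
    return "".join(out)


# Only code 0x0003 is mapped here; the Turkish letters use the fallback.
cmap = {0x0003: "G"}
print(decode_identity_h(b"\x00\x03\x00\xf6\x00\xfc\x00\xe7", cmap))
```

This only works when the unmapped code happens to equal the character's Unicode value, which appears to be the case for the characters reported here.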
I'm adding these tests in case this improves in the future. @likemusic explained how this problem arises in smalot#495.
Add one more test for font-fallback. This addition also resolves smalot#495. It catches situations where a null byte \x00 may not be found by preg_match in a Unicode context. Null bytes in the text string usually mean that a CIDMap-encoded string has been passed through as UTF-8 bytes without being translated by any matching CIDMap pairs.
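The null-byte check described above can be sketched in Python as follows. This is an illustration, not the change made in the library: the function name `fallback_decode` is hypothetical, and reinterpreting the bytes as UTF-16BE is an assumed fallback chosen only to show the idea.

```python
def fallback_decode(raw: bytes) -> str:
    """Decode extracted text bytes, with a null-byte fallback (hypothetical)."""
    # A plain byte search is used instead of a regex, so the null byte
    # cannot be missed by pattern-matching quirks.
    if b"\x00" not in raw:
        return raw.decode("utf-8", errors="replace")
    # Assumption: null bytes mean the 2-byte codes were never translated,
    # so reinterpret the bytes as big-endian 16-bit code units.
    return raw.decode("utf-16-be", errors="replace")


untranslated = b"\x00G\x00\xf6\x00z"
# Decoding the untranslated bytes as UTF-8 yields the replacement character.
print("\ufffd" in untranslated.decode("utf-8", errors="replace"))
print(fallback_decode(untranslated))
```

The first print shows where the � in this issue can come from: a byte such as 0xF6 is not valid UTF-8 on its own, so a lenient decoder substitutes U+FFFD.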
These are Turkish characters. They sometimes appear correctly in the extracted text and sometimes appear as �.
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-3-3.pdf
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-4-8.pdf
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-4-9.pdf
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-4-15.pdf
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-4-17.pdf
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-4-20.pdf
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211130-4-22.pdf