| 1 | +This file is Copyright (c) 2003, 2006 Lev Walkin <[email protected]>. All rights |
| 2 | +reserved. Redistribution and modifications are permitted subject to BSD license. |
| 3 | + |
| 4 | +Originally part of the asn1c source code, file TeletexString.c. -- Lorenz Bauer |
| 5 | + |
| 6 | +Here is a formal attempt at creating a mapping from TeletexString |
| 7 | +(T61String) of the latest ASN.1 standard (X.680:2002) into the Unicode |
| 8 | +character set. -- Lev Walkin <[email protected]> |
| 9 | + |
| 10 | +The first thing to keep in mind is that TeletexString (T61String) |
| 11 | +is defined in ASN.1, and is not really a T.61 string. |
| 12 | +The T.61 standard has been withdrawn by ITU-T and is no longer an authoritative
| 13 | +reference. See http://www.itu.int/rec/T-REC-T.61 |
| 14 | + |
| 15 | +X.680 specifies TeletexString (T61String) as a combination of the
| 16 | +character sets specified by the registration numbers listed in |
| 17 | +ISO International Register of Coded Character Sets to be used with |
| 18 | +Escape Sequences (ISO-2375): |
| 19 | +6, 87, 102, 103, 106, 107, 126, 144, 150, 153, 156, 164, 165, 168, |
| 20 | +plus SPACE and DELETE characters. |
| 21 | +In addition to that, X.680 Table 6 NOTE 2 allows using register entries
| 22 | +6 and 156 instead of 102 and 103. |
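The permitted repertoire can be written down as a small lookup. The following is a minimal Python sketch, not code from asn1c: the names `TELETEX_REGISTRATIONS`, `NOTE2_ALTERNATIVES` and `is_permitted` are hypothetical, and only the registration numbers come from the list above. Pairing 6 with 102 and 156 with 103 is an assumption based on the order in which NOTE 2 is quoted.

```python
# Registration numbers permitted in TeletexString, copied from the list above.
TELETEX_REGISTRATIONS = frozenset(
    {6, 87, 102, 103, 106, 107, 126, 144, 150, 153, 156, 164, 165, 168}
)

# Table 6 NOTE 2 as summarized above: 6 and 156 may be used instead of
# 102 and 103. The one-to-one pairing is this sketch's assumption.
NOTE2_ALTERNATIVES = {102: 6, 103: 156}


def is_permitted(registration: int) -> bool:
    """Return True if the ISO-IR registration number may appear in TeletexString."""
    return registration in TELETEX_REGISTRATIONS
```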
| 23 | + |
| 24 | +The ISO Register itself is available at http://www.itscj.ipsj.or.jp/ISO-IR/ |
| 25 | + |
| 26 | +#6 is ASCII. http://www.itscj.ipsj.or.jp/ISO-IR/006.pdf |
| 27 | + Escapes into: |
| 28 | + G0: ESC 2/8 4/2 ("(B") |
| 29 | + G1: ESC 2/9 4/2 (")B") |
| 30 | + The range is [0x21 .. 0x7e]. Conversion into Unicode |
| 31 | + is simple, because it has one-to-one correspondence. |
| 32 | +#87 is a "Japanese Graphic Character Set for Information Interchange". |
| 33 | + A multiple-byte set of 6877 characters.
| 34 | + The character set is JIS X 0208-1983 (originally JIS C 6226-1983). |
| 35 | + Escapes into: |
| 36 | + G0: ESC 2/4 4/2 ("$B") |
| 37 | + G1: ESC 2/4 2/9 4/2 ("$)B") |
| 38 | + G2: ESC 2/4 2/10 4/2 ("$*B") |
| 39 | + G3: ESC 2/4 2/11 4/2 ("$+B") |
| 40 | +#102 is "Teletex Primary Set of Graphic Characters" and is almost ASCII. |
| 41 | + Escapes into: |
| 42 | + G0: ESC 2/8 7/5 ("(u") |
| 43 | + G1: ESC 2/9 7/5 (")u") |
| 44 | + G2: ESC 2/10 7/5 ("*u") |
| 45 | + G3: ESC 2/11 7/5 ("+u") |
| 46 | + It is almost identical to ASCII, except that the ASCII position for '$'
| 47 | + (DOLLAR SIGN) is filled with '¤' (CURRENCY SIGN), which is U+00A4. |
| 48 | + Also, ASCII positions for '`', '\', '^', '{', '}', '~' are marked |
| 49 | + as "should not be used". |
| 50 | +#103 is a supplementary set of characters used in combination with #102. |
| 51 | + Escapes into: |
| 52 | + G0: ESC 2/8 7/6 ("(v") |
| 53 | + G1: ESC 2/9 7/6 (")v") |
| 54 | + G2: ESC 2/10 7/6 ("*v") |
| 55 | + G3: ESC 2/11 7/6 ("+v") |
| 56 | + Some characters in that character set are combining characters, |
| 57 | + which may be used only with certain basic Latin letters.
| 58 | + It can be thought of as a subset of #156 with the exception of 4/12 |
| 59 | + which is UNDERLINE in #103 and absent in #156. |
| 60 | +#106 is a primary set of control functions, used in combination with #107. |
| 61 | + Escapes into: |
| 62 | + C0: ESC 2/1 4/5 ("!E") |
| 63 | + This set is so short I can list it here: |
| 64 | + 0x08 BS BACKSPACE -- same as Unicode |
| 65 | + 0x0a LF LINE FEED -- same as Unicode |
| 66 | + 0x0c FF FORM FEED -- same as Unicode |
| 67 | + 0x0d CR CARRIAGE RETURN -- same as Unicode |
| 68 | + 0x0e LS1 LOCKING SHIFT ONE |
| 69 | + 0x0f LS0 LOCKING SHIFT ZERO |
| 70 | + 0x19 SS2 SINGLE SHIFT TWO |
| 71 | + 0x1a SUB SUBSTITUTE CHARACTER |
| 72 | + 0x1b ESC ESCAPE -- same as Unicode |
| 73 | + 0x1d SS3 SINGLE SHIFT THREE |
| 74 | + The LS1 and LS0 are two magical functions which, respectively, invoke |
| 75 | + the currently designated G1 or G0 set into positions 2/1 to 7/14.
| 76 | + The SS2 and SS3, respectively, invoke one character of the |
| 77 | + currently designated set G2 and G3. |
| 78 | + The SUB is wholly equivalent to U+001A (SUBSTITUTE).
| 79 | +#107 is a supplementary set of control functions, used with #106. |
| 80 | + Escapes into: |
| 81 | + C1: ESC 2/2 4/8 ('"H') |
| 82 | + This set contains three special control codes: |
| 83 | + 0x8b PLD PARTIAL LINE DOWN -- similar to <SUB> |
| 84 | + 0x8c PLU PARTIAL LINE UP -- similar to <SUP>
| 85 | + 0x9b CSI CONTROL SEQUENCE INTRODUCER |
| 86 | + This set is so exotic that we can probably ignore it safely.
| 87 | +#126 is a "Right-hand Part of the Latin/Greek Alphabet". |
| 88 | + Comprises 90 characters, including accented letters.
| 89 | + Escapes into: |
| 90 | + G1: ESC 2/13 4/6 ("-F") |
| 91 | + G2: ESC 2/14 4/6 (".F") |
| 92 | + G3: ESC 2/15 4/6 ("/F") |
| 93 | + Note: This Registration is a subset of ISO-IR 227. |
| 94 | +#144 is a "Cyrillic part of the Latin/Cyrillic Alphabet". |
| 95 | + Comprises 95 characters.
| 96 | + Escapes into: |
| 97 | + G1: ESC 2/13 4/12 ("-L") |
| 98 | + G2: ESC 2/14 4/12 (".L") |
| 99 | + G3: ESC 2/15 4/12 ("/L") |
| 100 | +#150 is a "Greek Primary Set of Graphic Characters". |
| 101 | + Comprises 94 characters.
| 102 | + Escapes into: |
| 103 | + G0: ESC 2/8 2/1 4/0 ("(!@") |
| 104 | + G1: ESC 2/9 2/1 4/0 (")!@") |
| 105 | + G2: ESC 2/10 2/1 4/0 ("*!@") |
| 106 | + G3: ESC 2/11 2/1 4/0 ("+!@") |
| 107 | +#153 is a "Basic Cyrillic Character Set for 8-bit codes". |
| 108 | + Comprises 68 characters.
| 109 | + Escapes into: |
| 110 | + G1: ESC 2/13 4/15 ("-O") |
| 111 | + G2: ESC 2/14 4/15 (".O") |
| 112 | + G3: ESC 2/15 4/15 ("/O") |
| 113 | +#156 is a "Supplementary Set of ISO/IEC 6937:1992" for use with #6.
| 114 | + Comprises 87 characters.
| 115 | + Escapes into: |
| 116 | + G1: ESC 2/13 5/2 ("-R") |
| 117 | + G2: ESC 2/14 5/2 (".R") |
| 118 | + G3: ESC 2/15 5/2 ("/R") |
| 119 | +#164 is a "Hebrew Supplementary Set of Graphic Characters".
| 120 | + Comprises 27 characters.
| 121 | + Escapes into: |
| 122 | + G1: ESC 2/13 5/3 ("-S") |
| 123 | + G2: ESC 2/14 5/3 (".S") |
| 124 | + G3: ESC 2/15 5/3 ("/S") |
| 125 | +#165 is a set of "Codes of the Chinese graphic character set".
| 126 | + A multiple-byte set of 8446 characters.
| 127 | + Escapes into: |
| 128 | + G0: ESC 2/4 2/8 4/5 ("$(E") |
| 129 | + G1: ESC 2/4 2/9 4/5 ("$)E") |
| 130 | + G2: ESC 2/4 2/10 4/5 ("$*E") |
| 131 | + G3: ESC 2/4 2/11 4/5 ("$+E") |
| 132 | +#168 is a "Japanese Graphic Character Set for Information Interchange".
| 133 | + A multiple-byte set of 6879 characters updated from #87. |
| 134 | + Escapes into: |
| 135 | + G0: ESC 2/6 4/0 ESC 2/4 4/2 ("&@" "$B") |
| 136 | + G1: ESC 2/6 4/0 ESC 2/4 2/9 4/2 ("&@" "$)B") |
| 137 | + G2: ESC 2/6 4/0 ESC 2/4 2/10 4/2 ("&@" "$*B") |
| 138 | + G3: ESC 2/6 4/0 ESC 2/4 2/11 4/2 ("&@" "$+B") |
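The escape sequences in the list above are written in column/row notation, where `2/8` denotes the byte `0x28`. The following Python sketch shows one way to turn that notation into bytes and to recognize a designating sequence in a byte string. The helper names `cr`, `DESIGNATIONS` and `parse_designation` are hypothetical, and the table holds only a handful of entries copied from the list, so it is an illustration under those assumptions and not a complete decoder.

```python
def cr(column: int, row: int) -> int:
    """Convert column/row notation (for example 2/8) to a byte value."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must both be in 0..15")
    return (column << 4) | row


ESC = cr(1, 11)  # 0x1b

# Bytes following ESC -> (target set, registration number).
# Only a few entries from the list above are reproduced here.
DESIGNATIONS = {
    bytes([cr(2, 8), cr(4, 2)]): ("G0", 6),             # "(B"
    bytes([cr(2, 9), cr(4, 2)]): ("G1", 6),             # ")B"
    bytes([cr(2, 8), cr(7, 5)]): ("G0", 102),           # "(u"
    bytes([cr(2, 10), cr(7, 6)]): ("G2", 103),          # "*v"
    bytes([cr(2, 14), cr(5, 2)]): ("G2", 156),          # ".R"
    bytes([cr(2, 4), cr(4, 2)]): ("G0", 87),            # "$B"
    bytes([cr(2, 4), cr(2, 9), cr(4, 2)]): ("G1", 87),  # "$)B"
}


def parse_designation(data: bytes, pos: int = 0):
    """Return ((set, registration), next_pos) for a known sequence at pos, else None."""
    if pos >= len(data) or data[pos] != ESC:
        return None
    # Try the longer form first so that "$)B" is not mistaken for a short one.
    for length in (3, 2):
        tail = bytes(data[pos + 1 : pos + 1 + length])
        if len(tail) == length and tail in DESIGNATIONS:
            return DESIGNATIONS[tail], pos + 1 + length
    return None
```

Keeping the table keyed by the raw bytes after ESC means that new registrations can be added without touching the parsing logic.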
| 139 | + |
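The description of #102 above translates almost directly into a per-byte conversion. The following Python sketch assumes that #102 occupies the same [0x21 .. 0x7e] range as #6, and it maps the "should not be used" positions to U+FFFD. Both choices are assumptions of this example, and the names `T102_UNUSED` and `t102_to_unicode` are hypothetical.

```python
# ASCII positions that #102 marks as "should not be used": ` \ ^ { } ~
T102_UNUSED = frozenset(b"`\\^{}~")


def t102_to_unicode(byte: int) -> str:
    """Map one #102 graphic byte to a Unicode character."""
    if not 0x21 <= byte <= 0x7E:
        raise ValueError("byte is outside the graphic range 0x21..0x7e")
    if byte == 0x24:
        # The '$' position holds CURRENCY SIGN in #102.
        return "\u00a4"
    if byte in T102_UNUSED:
        # Assumption of this sketch: report unusable positions as U+FFFD.
        return "\ufffd"
    # Every other position matches ASCII one-to-one.
    return chr(byte)
```

A stricter decoder could raise an error for the unused positions instead; the source text does not say which behaviour is preferable.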
| 140 | +The different registers reside at the following byte values: |
| 141 | +- C0: 0x00 - 0x1f |
| 142 | +- G0: 0x20 - 0x7f |
| 143 | +- C1: 0x80 - 0x9f |
| 144 | +- G2: 0xa0 - 0xff |
| 145 | +- G2 and G3: ??? |
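The byte ranges above can be expressed as a simple classifier. This Python sketch copies the labels from the list exactly as written, including "G2" for the upper range, and leaves the entry marked "???" unresolved. The function name `code_area` is hypothetical.

```python
def code_area(byte: int) -> str:
    """Return the label that the list above assigns to a byte value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte <= 0x1F:
        return "C0"
    if byte <= 0x7F:
        return "G0"
    if byte <= 0x9F:
        return "C1"
    # Label taken verbatim from the list; the "???" entry is not modelled.
    return "G2"
```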