Computing With Accents and Foreign Scripts

Encoding on the Internet

6: East Asian Languages and 16-Bit Encoding

Large Encodings for Non-Alphabets

The scripts discussed on the previous page, such as Greek, Hebrew, Arabic and Cyrillic, are all alphabetic and have repertoires about the same size as the Roman alphabet. For syllabaries and ideographic scripts, however, the repertoire can be larger than 256 characters, which is the most that a single 8-bit byte can distinguish. These scripts require an encoding scheme that can accommodate more characters.

As a result, 16-bit encodings with room for tens of thousands of characters were developed for these scripts. This is also called "double-byte" (2 x 8-bit) encoding. In practice, characters are organized in blocks of 192 characters.
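The difference between single-byte and double-byte encoding can be checked with a short Python sketch. It relies only on codecs that ship with Python's standard library. The sample characters and the helper name `fits_in_latin1` are illustrative choices, not part of any standard.

```python
# One sample character per legacy East Asian encoding (illustrative choices).
samples = {
    "shift_jis": "日",  # Japanese
    "euc_jp": "日",     # Japanese
    "gb2312": "中",     # Simplified Chinese
    "big5": "中",       # Traditional Chinese
    "euc_kr": "한",     # Korean
}

# Count how many bytes each encoding uses for its sample character.
byte_counts = {codec: len(ch.encode(codec)) for codec, ch in samples.items()}


def fits_in_latin1(ch):
    """Return True if the character exists in the 8-bit Latin-1 encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for codec, count in byte_counts.items():
    print(codec, count)
print("é fits in Latin-1:", fits_in_latin1("é"))
print("中 fits in Latin-1:", fits_in_latin1("中"))
```

Each sample character takes two bytes in its double-byte encoding, while an 8-bit encoding such as Latin-1 has no slot for it at all.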

Chinese, Japanese and Korean (CJK)

Because many East Asian scripts incorporate Chinese characters, they are collectively known as "CJK" scripts, short for "Chinese-Japanese-Korean". The scripts are not identical, but all of them are of the same order of magnitude in size.

Encoding Template

To accommodate both English and the other scripts, many 16-bit encodings are structured as follows:
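The exact layout differs from one encoding to another. As a sketch for a single case, EUC-KR, byte values below 0x80 keep their ASCII meaning, and each Korean character is stored as a pair of bytes that both have the high bit set. The helper name `split_euc_kr` is hypothetical, and the rule it applies should not be assumed to hold for other encodings (Shift_JIS, for instance, is laid out differently).

```python
def split_euc_kr(data):
    """Split EUC-KR bytes into one chunk per character.

    Assumes valid EUC-KR input: a byte below 0x80 is a one-byte ASCII
    character, any other byte starts a two-byte character.
    """
    chunks = []
    i = 0
    while i < len(data):
        width = 1 if data[i] < 0x80 else 2
        chunks.append(data[i:i + width])
        i += width
    return chunks


encoded = "Hi 한국".encode("euc_kr")
chunks = split_euc_kr(encoded)
decoded = [chunk.decode("euc_kr") for chunk in chunks]

for chunk, ch in zip(chunks, decoded):
    print(ch, len(chunk), chunk.hex())
```

English text therefore costs one byte per character, exactly as in ASCII, and only the East Asian characters take two.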

Interestingly, many East Asian encodings also incorporate other scripts such as the Cyrillic and Greek alphabets. Some browsers, especially on the Mac platform, use a Japanese font as the default for Cyrillic or Greek pages.
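The presence of Greek and Cyrillic letters in a Japanese encoding can be seen in a minimal Python sketch using the standard library's `shift_jis` codec. The sample letters and their labels are arbitrary illustrative choices.

```python
# Sample characters from four scripts (illustrative choices).
samples = {
    "Latin A": "A",
    "Greek alpha": "α",
    "Cyrillic Ya": "Я",
    "Kanji": "日",
}

# Encode each character with the Japanese Shift_JIS encoding.
encoded = {name: ch.encode("shift_jis") for name, ch in samples.items()}
lengths = {name: len(raw) for name, raw in encoded.items()}

for name, raw in encoded.items():
    print(name, lengths[name], raw.hex())
```

The Latin letter takes one byte, while the Greek and Cyrillic letters are stored as two-byte characters alongside the kanji.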

