Encoding on the Internet
The scripts discussed on the last page, such as Greek, Hebrew, Arabic and Cyrillic, are all alphabetic and about the same size as the Roman alphabet. But for syllabary or ideographic scripts, the repertoire of characters can be far larger than 256, the most that a single 8-bit byte can distinguish. These scripts require an encoding scheme that can accommodate more characters.
As a result, 16-bit encodings covering tens of thousands of characters were developed for these scripts. This approach is also called "double-byte" (2 x 8-bit) encoding. In practice, characters are organized in blocks of 192 characters.
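The arithmetic behind "double-byte" can be checked directly. The sketch below is only an illustration: the four codec names are legacy East Asian encodings that ship with Python's standard library, and the sample characters are our own choices, not examples taken from this page.

```python
# Sketch: how many bytes one character occupies in some legacy
# double-byte encodings. Uses only standard-library codecs.

SINGLE_BYTE_LIMIT = 2 ** 8    # 256 distinct values fit in one 8-bit byte
DOUBLE_BYTE_LIMIT = 2 ** 16   # 65,536 is the theoretical ceiling for two bytes;
                              # real encodings use only part of that range

# Illustrative samples (an assumption of this sketch): one character per codec.
SAMPLES = {
    "shift_jis": "日",  # Japanese
    "gbk": "中",        # Simplified Chinese
    "big5": "中",       # Traditional Chinese
    "euc_kr": "한",     # Korean
}


def encoded_length(char, codec):
    """Return the number of bytes `char` occupies when encoded with `codec`."""
    return len(char.encode(codec))


if __name__ == "__main__":
    print(SINGLE_BYTE_LIMIT, DOUBLE_BYTE_LIMIT)
    for codec, char in SAMPLES.items():
        print(codec, char, encoded_length(char, codec))
```

Each sample character takes two bytes in its encoding, which is what makes room for a repertoire well beyond 256 characters.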
Because many East Asian scripts incorporate Chinese characters, they are collectively known as "CJK" scripts, short for "Chinese-Japanese-Korean". The scripts are not identical, but all of them are of the same order of magnitude in size.
To accommodate both English and the other scripts, many 16-bit encodings are structured as follows:
Interestingly, many East Asian encodings also incorporate other scripts, such as the Cyrillic and Greek alphabets. Some browsers, especially on the Mac platform, use a Japanese font as the default for Cyrillic or Greek pages.
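The overlap is easy to confirm with Python's standard codecs. The sketch below is a minimal check, not a description of how any browser chooses fonts; the helper name `covers` and the sample strings are assumptions made for illustration.

```python
# Sketch: test whether a legacy encoding can represent a given string.
# Uses only codecs bundled with Python.

def covers(codec, text):
    """Return True if every character in `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for codec in ("shift_jis", "gb2312", "euc_kr", "latin-1"):
        print(codec, covers(codec, "Привет"), covers(codec, "αβγ"))
```

The three East Asian codecs accept both the Cyrillic and the Greek sample, while the Western European `latin-1` codec rejects them. In these encodings the Cyrillic and Greek letters are stored as two-byte characters alongside the CJK repertoire.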
©Penn State University, 2000-2013.
This Web page maintained by Teaching and Learning with Technology, a unit of Information Technology Services. For questions or comments on this Web page, please contact Elizabeth J. Pyatt (email@example.com).
Unicode character names and hexadecimal entity codes are taken from the public Unicode Character Charts.
Last Modified: Tuesday, 04-Jun-2013 12:41:29 EDT