Teaching and Learning with Technology

Computing With Accents and Foreign Scripts

Encoding on the Internet

5: Non-Roman Scripts

The Problem

Although the 256 characters of an 8-bit encoding can support most Western European languages, they are not enough to handle non-Roman scripts or languages that use non-standard Roman characters. Therefore, other 8-bit encodings were developed for languages outside Western Europe.
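
The limit can be seen with a minimal sketch using Python's built-in codecs. ISO-8859-1 (Latin-1) is used here as an assumed example of a Western European 8-bit encoding, and the helper name can_encode and the sample words are illustrative only.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text has a byte in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A Western European word fits in a Western European 8-bit encoding.
print(can_encode("café", "iso-8859-1"))
# Greek letters have no slot among its 256 values...
print(can_encode("καφές", "iso-8859-1"))
# ...so a separate 8-bit encoding for Greek is needed.
print(can_encode("καφές", "iso-8859-7"))
```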

Template

To accommodate both English and the other script, many 8-bit encodings are structured as follows:

Structure of Non-English Encodings
Script    Encoding                     #0-127    #128-255
Arabic    ISO-8859-6* (rarely used)    ASCII     Arabic
Greek     ISO-8859-7*                  ASCII     Greek
Hebrew    ISO-8859-8*                  ASCII     Hebrew

*External links to Wikipedia

On the Internet, if you switch your browser's encoding (View » Character Set/Encoding) while viewing an English site, in most cases you will still see English, because the encoding you switched to also includes ASCII.
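
The shared structure can be checked with a short sketch that decodes the same bytes under each of the three encodings in the table. The helper name decode_all and the sample bytes are illustrative assumptions, not part of any standard.

```python
ENCODINGS = ["iso-8859-6", "iso-8859-7", "iso-8859-8"]


def decode_all(raw: bytes) -> dict:
    """Decode the same bytes under each encoding listed in the table."""
    return {enc: raw.decode(enc, errors="replace") for enc in ENCODINGS}


# Bytes in the range 0-127 are ASCII in all three encodings,
# so English text reads the same whichever one is selected.
print(decode_all(b"Hello, world"))

# A byte in the range 128-255 depends on the encoding:
# 0xE1 is an Arabic, Greek, or Hebrew letter respectively.
print(decode_all(bytes([0xE1])))
```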

Behavior of Encoded Fonts

Because non-Roman encodings include ASCII, if you switch to a properly encoded font in a word processor and begin to type, you will see English characters. The non-Roman letters do not appear until you switch your keyboard.

Parallel Standards

"Windows" Encodings vs. ISO-8859-x

For many scripts, a Windows encoding standard competes with a non-Windows standard, typically one registered with the ISO as an ISO-8859-x set. For instance, Hebrew Web pages can be encoded as either ISO-8859-8 ("Visual Hebrew") or Windows-1255.

Variant Encodings by Script
Script             ISO/Other                       Windows Encoding
Arabic             ISO-8859-6                      Windows-1256
Greek              ISO-8859-7 ("ELOT")             Windows-1253
Hebrew             ISO-8859-8 ("Visual Hebrew")    Windows-1255
Russian/Cyrillic   KOI-8                           Windows-1251
Central Europe     ISO-8859-2 ("Latin 2")          Windows-1250
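
Why the choice between parallel standards matters can be sketched by writing text in one encoding and reading it back in the other. This example assumes KOI8-R (Python codec "koi8_r") as the KOI-8 variant and uses Python's "cp1251" codec for Windows-1251. The helper name reinterpret is hypothetical.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


word = "Привет"

# Reading with the encoding the text was written in restores it.
print(reinterpret(word, "koi8_r", "koi8_r"))

# Reading KOI8-R bytes as Windows-1251 yields Cyrillic letters, but the wrong ones.
print(reinterpret(word, "koi8_r", "cp1251"))

# ASCII text survives the mismatch because both encodings share bytes 0-127.
print(reinterpret("Hello", "koi8_r", "cp1251"))
```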

If you develop in FrontPage for Windows, your Web page (even an English one) will automatically be encoded in the Windows standard unless you specify otherwise, which is not always possible.

Links about Encoding

These links show the specifications for different encoding systems and the languages they are associated with. However, most languages can also be encoded as Unicode (UTF-8).

NOTE: "C.P." (Codepage) in an encoding name means the same as "Windows". For instance, CP1252 is Windows-1252.

  1. Encodings by Language
  2. All Encoding Charts
  3. ISO-8859 Encoding Charts
  4. Windows Encoding Charts
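
The two remarks above, that "CP" and "Windows" names refer to the same encoding and that UTF-8 covers most languages, can be checked with a small sketch. It relies on the alias table of Python's standard codecs module. The helper name same_codec and the sample string are illustrative assumptions.

```python
import codecs


def same_codec(name_a: str, name_b: str) -> bool:
    """Return True if both names resolve to the same Python codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


# "CP" and "Windows" are two names for the same encoding.
print(same_codec("cp1252", "windows-1252"))
print(same_codec("cp1255", "windows-1255"))
# An ISO-8859-x set is a different encoding from its Windows counterpart.
print(same_codec("cp1252", "iso-8859-1"))

# UTF-8 can hold Greek, Hebrew, and Arabic in one string;
# no single 8-bit encoding in the tables above covers all three.
mixed = "αβγ שלום مرحبا"
print(mixed.encode("utf-8").decode("utf-8") == mixed)
```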

Last Modified: Tuesday, 04-Jun-2013 12:41:29 EDT