Encoding on the Internet
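To increase the number of characters encoded, vendors doubled the range of ASCII from 128 to 256 (2⁸) characters. This became known as "8-bit encoding". In the usual structure, the first 128 code points (0-127) keep their ASCII assignments, and the upper 128 code points (128-255) hold the additional characters.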
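The arithmetic behind the doubling can be checked directly. The following is a minimal sketch, assuming Python's built-in `cp1252` and `mac_roman` codecs as stand-ins for two common 8-bit encodings; it also checks that the lower half of each encoding still matches ASCII.

```python
# Number of distinct values that fit in 7 bits (ASCII) and in 8 bits (one byte).
ascii_size = 2 ** 7       # 128
eight_bit_size = 2 ** 8   # 256

# Assumption for illustration: Python's "cp1252" and "mac_roman" codecs
# stand in for Windows-1252 and MacRoman.
low_half = bytes(range(ascii_size))
ascii_text = low_half.decode("ascii")
low_half_matches = (
    low_half.decode("cp1252") == ascii_text
    and low_half.decode("mac_roman") == ascii_text
)

print(ascii_size, eight_bit_size, low_half_matches)
```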
Crucially, each combination of a letter plus a different accent forms a separate character or code point. For instance, á, â, à, Á, Â, and À are assigned six different numbers in 8-bit encoding.
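As a rough illustration, the six accented forms of "a" can be looked up in one 8-bit encoding. This sketch assumes Python's `cp1252` codec (Windows-1252); the variable names are made up for the example.

```python
import unicodedata

# The six characters named in the text, each a distinct code point.
letters = ["á", "â", "à", "Á", "Â", "À"]

# Each character encodes to exactly one byte in Windows-1252;
# take that byte's numeric value.
codes = {ch: ch.encode("cp1252")[0] for ch in letters}

for ch, code in codes.items():
    print(unicodedata.name(ch), code)
```

Running it should list six different numbers, one per character.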
Unfortunately, not all vendors used the same 8-bit encoding. The biggest difference was that older Windows computers used Windows-1252 while pre-OS X Macintosh computers used MacRoman. As a result, not all characters are assigned to the same code points, and not every character can be found in both encodings.
For instance, in the chart below, character #128 is € (euro) in Windows-1252 but Ä (A-umlaut) in MacRoman. Similarly, the ¥ (yen) character is #165 in Windows-1252 but #180 in MacRoman.
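The mismatch can be reproduced with a short sketch. It assumes Python's `cp1252` and `mac_roman` codecs for the two encodings; the helper `can_encode` and the choice of ≠ as a MacRoman-only character are illustrative, not taken from the original charts.

```python
import unicodedata

# The same byte value, 128, read under each encoding.
shared_byte = bytes([128])
as_windows = shared_byte.decode("cp1252")
as_mac = shared_byte.decode("mac_roman")

# The same character, yen, written under each encoding.
yen_in_windows = "¥".encode("cp1252")[0]
yen_in_mac = "¥".encode("mac_roman")[0]


def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative example of a character found in only one of the two encodings.
not_equal_in_mac = can_encode("≠", "mac_roman")
not_equal_in_windows = can_encode("≠", "cp1252")

print(unicodedata.name(as_windows), "vs", unicodedata.name(as_mac))
print(yen_in_windows, "vs", yen_in_mac)
print(not_equal_in_mac, not_equal_in_windows)
```

A file saved in one encoding and opened in the other would therefore show the wrong character wherever a byte above 127 appears.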
NOTE: Today both Windows and OS X use Unicode, but differences persist due to issues of compatibility with older documents and software. The older the software, the more likely compatibility problems will occur.
Full reference charts are available from the sources below.
NOTE: Some charts may list the decimal number (base-10) as well as the hexadecimal (base-16) number and octal (base-8) number. In most cases, you would refer to the decimal number.
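The three notations name the same code point. A small sketch, using the yen sign's Windows-1252 number (165) as the example value:

```python
# One code point written in the three bases a chart might list.
code_point = 165  # yen sign in Windows-1252 (decimal)

as_hex = format(code_point, "X")    # hexadecimal digits, base-16
as_octal = format(code_point, "o")  # octal digits, base-8

# Converting back from either notation gives the original decimal number.
from_hex = int(as_hex, 16)
from_octal = int(as_octal, 8)

print(code_point, as_hex, as_octal)
```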
©Penn State University, 2000-2013.
This Web page maintained by Teaching and Learning with Technology, a unit of Information Technology Services. For questions or comments on this Web page, please contact Elizabeth J. Pyatt (firstname.lastname@example.org).
Unicode character names and hexadecimal entity codes are taken from the public Unicode Character Charts.
Last Modified: Tuesday, 04-Jun-2013 12:41:28 EDT