One of the most persistent problems in multilingual technology has been exchanging documents between applications and operating systems. To make documents more readable across platforms and machines, the Unicode specification was created and has been implemented in many systems.
This page provides an overview of how foreign language text is encoded electronically and of the utilities and fonts needed to support individual languages. It also points you to places on the site where specific information is outlined.
Computers store all data as numbers, including text. An encoding system, such as ASCII, assigns a number to each letter, digit or other character. Operating systems include programs and fonts which convert these numbers into the letters visible on the screen.
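As a minimal sketch of characters being stored as numbers, the Python snippet below looks up the number behind each letter of a short word. The sample word is an arbitrary choice for illustration; `ord`, `chr` and `str.encode` are standard Python built-ins.

```python
# Each character corresponds to a number. For plain English letters,
# the ASCII number and the Unicode number are the same.
word = "Cat"
numbers = [ord(ch) for ch in word]
print(numbers)  # [67, 97, 116]

# Encoding the text as ASCII produces bytes holding those same numbers.
ascii_bytes = word.encode("ascii")
print(list(ascii_bytes))  # [67, 97, 116]

# Converting the numbers back to characters recovers the text.
restored = "".join(chr(n) for n in numbers)
print(restored)  # Cat
```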
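Unicode, sometimes called the "Universal Alphabet," is a standard that assigns a unique number (a code point) to each character, with room for over a million code points covering the majority of writing systems in the world. UTF-8 is the most common encoding used to store Unicode text, which is why the two names are often used interchangeably. Unlike older systems, Unicode allows multiple writing systems to co-exist in one data file. Systems which recognize Unicode can consistently read and process data from many languages.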
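To illustrate several writing systems co-existing in one piece of data, the sketch below builds a string mixing Latin, Greek, Cyrillic and Japanese characters, lists the Unicode code point of each, and encodes the whole string as UTF-8. The sample text is an assumption chosen for illustration.

```python
# One string mixing Latin, Greek, Cyrillic and Japanese characters.
# Escapes are used so the example does not depend on the editor's encoding.
mixed = "abc \u03b1\u03b2\u03b3 \u0430\u0431\u0432 \u65e5\u672c"

# Unicode assigns each character a code point, usually written as U+XXXX.
code_points = [f"U+{ord(ch):04X}" for ch in mixed]
print(code_points)

# UTF-8 turns those code points into bytes; characters outside ASCII
# take two or more bytes each.
utf8_bytes = mixed.encode("utf-8")
print(len(mixed), "characters ->", len(utf8_bytes), "bytes")

# Decoding the bytes as UTF-8 restores the original text.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == mixed)
```

The character count and the byte count differ because UTF-8 is a variable-length encoding: ASCII characters use one byte, while the Greek and Cyrillic letters here use two and the Japanese characters use three.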
Developers may want to view the Encoding Tutorial to see how encodings are structured and how that affects multilingual computing.
Unicode was not widely implemented until the late 1990s. Before that, operating systems implemented a series of encodings, each of which supported only one or two scripts. In addition, vendor-specific differences made data exchange more complicated.
Because Unicode was developed starting in the 1990s, many older Web sites and data files are written in older encoding systems. Therefore, it may be necessary to have fonts and utilities which support both Unicode and older encodings. Each By Language page will specify which older encodings need to be supported for each language.
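The sketch below shows why older encodings complicate data exchange: the same stored byte stands for a different letter depending on which legacy encoding is assumed. The three encodings are examples picked for illustration, and `to_utf8` is a hypothetical helper name, not part of any library.

```python
# A single byte with the value 0xE9 (233).
raw = bytes([0xE9])

# The same byte decodes to a different letter in each legacy encoding.
western = raw.decode("latin-1")     # Latin small e with acute accent
cyrillic = raw.decode("cp1251")     # Cyrillic small short i
greek = raw.decode("iso8859-7")     # Greek small iota
print(western, cyrillic, greek)


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode legacy bytes as UTF-8.

    The caller must already know the source encoding; it cannot be
    reliably guessed from the bytes alone.
    """
    return data.decode(source_encoding).encode("utf-8")


converted = to_utf8(raw, "cp1251")
print(converted)  # b'\xd0\xb9'
```

Once the text is converted to Unicode, each of the three letters has its own distinct code point, so the ambiguity disappears.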
The most recent versions of all the major operating systems, including Windows, Macintosh and Linux/Unix, support Unicode. Generally speaking, the newer the version, the better the Unicode support.
On Windows, Unicode support has been present since Windows NT, but Windows XP includes support for more languages and a wider range of foreign language input (typing) options. Many major packages support Unicode, including Microsoft Office, Dreamweaver, Adobe products, EndNote 8 and others, but some older versions do not.
On the Macintosh, Unicode was not truly implemented until OS X 10.2, and more recent versions support more languages. System 9 users will experience reduced support for Unicode. Macintosh software has been migrating to Unicode, so newer versions typically have better support. Packages with Unicode support include Microsoft Office 2004, Dreamweaver MX 2004, Adobe products, FileMaker 7, TextEdit and others.
On Linux/Unix, Unicode support is available, but special fonts and utilities must be installed for each language. See the Unix Unicode Links for more information.
To view a Unicode document, a computer must have a font installed which includes characters for the scripts used in the document. Unicode fonts come in two types.
It is important to install Unicode fonts for each of the scripts you wish to view.
If you open a Unicode document and cannot read the text, you should:
These browsers have the best Unicode support across scripts. However, if you have installed a new font, you may need to adjust the font preferences for the target script.
Recent versions of Internet Explorer include Unicode support, but for optimal viewing of some phonetic characters, you may need to set the Latin font preferences to Arial Unicode.
Unicode support is poorly implemented for many scripts. Even if the proper fonts are installed, sites may not be readable. Switching to another browser is strongly recommended.
Both Microsoft (Windows) and Apple (Macintosh) offer system fonts which support Unicode to some degree. Many organizations also offer freeware fonts for additional characters.
To check which fonts are available on your computer, open a word processing program such as Microsoft Word or WordPerfect, then check the list of fonts.
Arial Unicode MS is the most complete Unicode font that is widely available. Lucida Sans Unicode and Tahoma are also available. Additional fonts for each language are also provided by Microsoft and are listed on the By Language pages.
Lucida Grande, Apple Symbols and several East Asian Unicode fonts are included with OS X. The more recent the version, the more characters are included.
Unicode support varies from script to script, but your computer needs to have installed all Language Kits for scripts you need to read. For System 9, Language Kits are available on the System 9 CD-ROM.
The following fonts are available to support additional characters not available in the default system fonts. All fonts are free for commercial use and can be installed on both Windows and Mac OS X except where noted. System 9 does not fully support Unicode fonts.
These sites list sources for different fonts by script. A Google search is also recommended for specific scripts.
1. The first step is to use software which supports Unicode. This includes the following:
2. If you type in a European language such as Spanish, French, German, Italian, Portuguese or Scandinavian languages, then you must use accent codes to input the characters.
3. If you are typing in another script or in a Central European language like Polish, Czech, Hungarian or Slovak, then you must activate a keyboard utility. This will allow you to insert the correct Unicode characters into the document.
Once the document is created, the default is typically Unicode. If in doubt, go to File » Save As, then choose the Unicode or UTF-8 encodings.
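For readers who create text files with a script rather than through a Save As dialog, the sketch below shows the equivalent step in Python: naming the UTF-8 encoding explicitly when writing and reading a file. The sample text, the temporary folder and the file name are assumptions for illustration.

```python
import os
import tempfile

# Sample text with Polish and Greek letters, written with escapes.
sample = "Cze\u015b\u0107, \u03ba\u03cc\u03c3\u03bc\u03b5"

# Write the text to a file, stating the encoding explicitly instead of
# relying on the operating system's default.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Read it back with the same encoding; the text should be unchanged.
with open(path, "r", encoding="utf-8") as f:
    loaded = f.read()

print(loaded == sample)
```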
Check each By Language page for specific details and codes, but the procedure is much the same.
There are several ways you can type or import Unicode text, but each page must include an encoding meta tag specifying the UTF-8 Unicode encoding, so that browsers render the text correctly. See the code below:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
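One way to confirm that a page carries this declaration is to scan its HTML for the meta tag. The sketch below does so with Python's standard `html.parser` module; the `CharsetFinder` class and the sample page are hypothetical names and data introduced for illustration, and the check covers only meta tags, not HTTP headers.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset declared in a meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # Short form: <meta charset="utf-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="Content-Type" content="...; charset=...">
        content = attributes.get("content") or ""
        marker = "charset="
        position = content.lower().find(marker)
        if position != -1:
            value = content[position + len(marker):]
            self.charset = value.split(";")[0].strip().lower()


# Sample page (assumed for illustration) using the long form of the tag.
page = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '</head><body></body></html>'
)

finder = CharsetFinder()
finder.feed(page)
print(finder.charset)  # utf-8
```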
See the Web Developers section for tips on:
The following Web sites show Unicode with a number of different scripts.
Results will vary. Some scripts such as Greek and Cyrillic are well supported, others such as Armenian and phonetic symbols have lesser support, and still others such as Runic and Cherokee have little to no support.
©Penn State University, 2000-2013.
This Web page maintained by Teaching and Learning with Technology, a unit of Information Technology Services. For questions or comments on this Web page, please contact Elizabeth J. Pyatt (email@example.com).
Unicode character names and hexadecimal entity codes are taken from the public Unicode Character Charts.
Last Modified: Tuesday, 04-Jun-2013 12:41:35 EDT