Teaching and Learning with Technology

Computing With Accents and Foreign Scripts

Getting Started: Unicode

One of the most persistent problems in multilingual technology has been exchanging documents between applications and operating systems. To make documents more readable across platforms and machines, the Unicode specification was created and has been implemented in many systems.

This page gives an overview of how foreign-language text is encoded electronically and of the utilities and fonts needed to support individual languages. It also points you to places on the site where specific information is outlined.

Contents

  1. What is Unicode Encoding and why is it important?
  2. Which Operating Systems and Software Packages support Unicode?
  3. How do I view Unicode Web Sites and Documents?
  4. Where can I get Unicode Fonts?
  5. How do I create Unicode Documents?
  6. How do I create Unicode Web sites?
  7. Test Unicode Sites
  8. Links
  9. Using Mozilla Composer to Create Unicode Sites (New Page)

What is Unicode Encoding?

About Unicode Encoding

Computers store all data as numbers, including text. An encoding system, such as ASCII, assigns a number to each letter, digit or other character. Operating systems include programs and fonts which convert these numbers into letters visible on the screen.
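As a minimal sketch of this idea, the Python fragment below looks up the number assigned to a few characters and converts the numbers back to text. The function names code_points and from_code_points are made up for this illustration; ord() and chr() are standard Python built-ins.

```python
def code_points(text):
    """Return the number (code point) assigned to each character in text."""
    return [ord(ch) for ch in text]


def from_code_points(numbers):
    """Rebuild text from a list of code point numbers."""
    return "".join(chr(n) for n in numbers)


if __name__ == "__main__":
    for ch in "Az9?":
        # ord() gives the number; chr() converts it back to the character.
        print(repr(ch), "->", ord(ch), "->", repr(chr(ord(ch))))
```

For the basic English letters and digits, the numbers are the same in ASCII and in Unicode.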

Unicode, sometimes called the "Universal Alphabet", is an ordered set of characters covering the majority of writing systems in the world, with room for over a million code points. UTF-8 is the most common encoding used to store Unicode text in files and Web pages. Unlike older systems, Unicode allows multiple writing systems to co-exist in one data file. Systems which recognize Unicode can consistently read and process data from many languages.
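The sketch below illustrates several writing systems sharing one piece of data. It stores a mixed-script sample as UTF-8 bytes and reads it back unchanged. The sample text and the function name utf8_round_trip are assumptions chosen for this example.

```python
import unicodedata

# One string holding Latin, Greek, Cyrillic and Japanese text together.
SAMPLE = "English, Ελληνικά, Русский, 日本語"


def utf8_round_trip(text):
    """Encode text as UTF-8 bytes, then decode the bytes back to text."""
    data = text.encode("utf-8")
    return data, data.decode("utf-8")


if __name__ == "__main__":
    data, restored = utf8_round_trip(SAMPLE)
    print(len(SAMPLE), "characters stored as", len(data), "bytes")
    print("round trip intact:", restored == SAMPLE)
    for ch in "Aλя日":
        # Show each character's Unicode number, name and UTF-8 bytes.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch),
              "UTF-8 bytes:", ch.encode("utf-8").hex(" "))
```

Characters from different scripts take different numbers of bytes in UTF-8, but they all live in the same file without any switch of encoding.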

Developers may want to view the Encoding Tutorial to see how encodings are structured and how that affects multilingual computing.

Older Legacy Encodings

Unicode was not widely implemented until the late 1990s. Before that, operating systems used a series of encodings, each of which supported only one or two scripts. In addition, vendor-specific differences made data exchange more complicated.

Because Unicode was developed starting in the 1990s, many older Web sites and data files are written in older encoding systems. Therefore, it may be necessary to have fonts and utilities which support both Unicode and older encodings. Each By Language page will specify which older encodings need to be supported for each language.
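To illustrate why older encodings complicate data exchange, the sketch below decodes the same single byte under three legacy encodings and gets three different letters. The list of encodings and the function name decode_with_each are choices made for this example, not a list of what any particular language page requires.

```python
import unicodedata

# Three legacy single-byte encodings: Western European, Cyrillic, Greek.
LEGACY_ENCODINGS = ["latin-1", "cp1251", "iso8859_7"]


def decode_with_each(data, encodings=LEGACY_ENCODINGS):
    """Decode the same bytes under each encoding and return the results."""
    return {name: data.decode(name) for name in encodings}


if __name__ == "__main__":
    raw = bytes([0xE9])  # one byte with the value 233
    for name, text in decode_with_each(raw).items():
        # The byte is identical; only the assumed encoding changes.
        print(name, f"U+{ord(text):04X}", unicodedata.name(text))
```

A reader who guesses the wrong legacy encoding sees the wrong letters, which is the problem Unicode was designed to avoid.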

Which Operating Systems Support Unicode?

The most recent versions of all the major operating systems including Windows, Macintosh and Linux/Unix support Unicode. Generally speaking, the newer the version the better the Unicode support is.

Windows

Unicode support has been present since Windows NT, but Windows XP includes support for more languages and a wider range of foreign language input (typing) options. Many major packages support Unicode, including Microsoft Office, Dreamweaver, Adobe products, EndNote 8 and others, but some older versions do not.

Macintosh

Unicode was not truly implemented until OS X 10.2, and more recent versions support more languages. System 9 users will experience reduced support for Unicode. Macintosh software has been migrating to Unicode, so newer versions typically have better Unicode support. Packages with Unicode support include Microsoft Office 2004, Dreamweaver MX 2004, Adobe products, FileMaker 7, TextEdit and others.

Linux/Unix

Unicode support is available, but special fonts and utilities must be installed for each language. See the Unix Unicode Links for more information.

How do I view Unicode Web sites and Documents?

Unicode Fonts

In order to view a Unicode document, a computer must have a font installed which includes characters for the scripts used in the documents. Unicode fonts come in two types.

It is important to install Unicode fonts for each of the scripts you wish to view.
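Before looking for fonts, it can help to know which scripts a document actually contains. The sketch below is a rough heuristic only: it takes the first word of each letter's official Unicode character name (for example LATIN, GREEK or CYRILLIC) as a stand-in for the script. The function name scripts_used is hypothetical, and this is not the formal Unicode Script property.

```python
import unicodedata


def scripts_used(text):
    """Return a rough set of script labels for the letters in text.

    Heuristic: use the first word of each character's Unicode name.
    """
    labels = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            labels.add(name.split()[0])
    return labels


if __name__ == "__main__":
    sample = "Unicode, Ελληνικά, Русский"
    # Each label suggests a script that an installed font must cover.
    print(sorted(scripts_used(sample)))
```

Each label in the result points to a script for which a suitable font would need to be installed.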

Opening Documents

If you open a Unicode document and cannot read the text, you should:

Configure Browsers

Recommended Browsers

These browsers have the best Unicode support across scripts. However, if you have installed a new font, you may need to adjust the font preferences for the target script.

Internet Explorer for Windows

Recent versions of Internet Explorer include Unicode support, but for optimal viewing of some phonetic characters, you may need to set the Latin font preferences to Arial Unicode.

Internet Explorer for Macintosh

Unicode support is poorly implemented for many scripts. Even if the proper fonts are installed, sites may not be readable. Switching to another browser is strongly recommended.

Where can I get the Unicode fonts?

Both Microsoft (Windows) and Apple (Macintosh) offer system fonts which support Unicode to some degree. Many organizations also offer freeware fonts for additional characters.

To check which fonts are available on your computer, open a word processing program such as Microsoft Word or WordPerfect, then check the list of fonts.

Windows

Arial Unicode MS is the most complete Unicode font that is widely available. Lucida Sans Unicode and Tahoma are also available. Additional fonts for each language are also provided by Microsoft and are listed on the By Language pages.

If you need to install additional fonts, see the Unicode font list below then follow the Windows Font installation instructions.

Macintosh

OS X

Lucida Grande, Apple Symbols and several East Asian Unicode fonts are included with OS X. The more recent the version, the more characters are included.

If you need to install additional fonts, see the Unicode font list below then follow the Macintosh Font installation instructions.

System 9

Unicode support varies from script to script, but your computer must have the Language Kit installed for every script you need to read. For System 9, Language Kits are available on the System 9 CD-ROM.

Recommended Freeware Fonts

The following fonts are available to support additional characters not available in the default system fonts. All fonts are free for commercial use and can be installed on both Windows and Mac OS X except where noted. System 9 does not fully support Unicode fonts.

These sites list sources for different fonts by script. A Google search is also recommended for specific scripts.

How do I create Unicode Documents?

1. The first step is to use software which supports Unicode. This includes the following:

2. If you type in a European language such as Spanish, French, German, Italian, Portuguese or Scandinavian languages, then you must use accent codes to input the characters.

3. If you are typing in another script or in a Central European language like Polish, Czech, Hungarian or Slovak, then you must activate a keyboard utility. This will allow you to insert the correct Unicode characters into the document.

When the document is saved, the default encoding is typically Unicode. If in doubt, go to File » Save As, then choose the Unicode or UTF-8 encoding.
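For those who create documents with a script rather than a word processor, the same choice can be made explicitly. The sketch below saves text as UTF-8 and reads it back. The function names save_utf8 and load_utf8 and the file name sample.txt are made up for this illustration.

```python
import os
import tempfile


def save_utf8(path, text):
    """Write text to path using the UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


def load_utf8(path):
    """Read text from path, assuming it was saved as UTF-8."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return handle.read()


if __name__ == "__main__":
    sample = "Español, Čeština, Ελληνικά"
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")  # hypothetical file name
        save_utf8(path, sample)
        print("file size in bytes:", os.path.getsize(path))
        print("read back unchanged:", load_utf8(path) == sample)
```

Naming the encoding when both saving and opening avoids relying on whatever default the operating system happens to use.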

Each By Language page lists the specific codes and settings for that language, but the procedure is much the same.

How do I create Unicode Web Pages?

There are several ways you can type or import Unicode text, but each page must include an encoding meta tag specifying the utf-8 Unicode encoding, so that browsers render the text correctly. See the code below:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>
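As a hedged illustration for pages generated by a script, the Python sketch below builds a small page containing the meta tag shown above, encodes it as UTF-8, and checks which encoding the page declares. The function names build_page and declared_charset are hypothetical, and the check is a simple pattern match rather than a full HTML parser.

```python
import html
import re

# The meta tag from the example above, unchanged.
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def build_page(title, body_text):
    """Return a minimal HTML page that declares the utf-8 encoding."""
    return "\n".join([
        "<html>",
        "<head>",
        META_TAG,
        "<title>" + html.escape(title) + "</title>",
        "</head>",
        "<body><p>" + html.escape(body_text) + "</p></body>",
        "</html>",
    ])


def declared_charset(page_bytes):
    """Return the charset named in the page, or None if none is found."""
    match = re.search(rb'charset\s*=\s*["\']?([\w-]+)', page_bytes, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None


if __name__ == "__main__":
    # The bytes written to disk must really be UTF-8 to match the tag.
    page = build_page("Test page", "Ελληνικά, Русский").encode("utf-8")
    print("declared encoding:", declared_charset(page))
```

The tag only tells the browser what to expect, so the file itself must also be saved as UTF-8.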

See the Web Developers section for tips on:

Test Unicode Sites

The following Web sites show Unicode with a number of different scripts.

Results will vary. Some scripts such as Greek and Cyrillic are well supported, others such as Armenian and phonetic symbols have lesser support, and still others such as Runic and Cherokee have little to no support.

Links

Unicode Resources

Unicode Operating Systems

Advanced Unicode

Last Modified: Tuesday, 04-Jun-2013 12:41:35 EDT