Teaching and Learning with Technology

Computing With Accents and Foreign Scripts

Declare the Encoding

If you create a Web site, it is good practice to declare the encoding. Properly encoded Web pages declare the encoding to a browser through a meta tag in the header. Without this tag, a browser may not know to switch to the proper encoding and characters may be displayed as gibberish.

Some example declarations for common encodings are given below. If you are not sure which encoding system to declare, you may want to refer to the individual By Language Page or look at which system is declared in other Web sites written in the language.

Sample Encoding Declarations

Unicode | Latin 1 | Other

Unicode (Any Language)

The encoding meta tag is placed in the head section of the page. The encoding name (e.g. utf-8 for Unicode) follows the charset= specification at the end of the tag.

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>

XHTML

There are two declarations: the encoding attribute in the initial XML declaration and the charset meta tag (with a final slash). Both should be included for cross-browser compatibility.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />
...
</head>

Note: These tags should be included even though XML is theoretically Unicode by default. Not all browsers will parse a page as Unicode unless the meta tag is present.

Latin 1 (English, Spanish, French, German, etc.)

<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...
</head>

NOTE: It is good practice to declare the encoding even for an English Web site. One function of the tag is to "reset" the user's browser back to Latin-1 and ensure proper font settings. Alternatively, declaring the Unicode "utf-8" encoding ensures that any special characters inserted, such as "smart quotes", currency symbols and em-dashes, will be properly displayed in most browsers.
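
To see why special characters matter when choosing between the two encodings, the short Python sketch below tests whether a piece of text can be stored in a given encoding at all. The helper name fits_encoding and the sample strings are illustrative assumptions, not part of any tool referenced on this page.

```python
def fits_encoding(text, encoding):
    """Return True if every character in text can be stored in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Curly quotes, an em-dash (U+2014) and a euro sign, written as escapes.
    sample = "\u201cSmart quotes\u201d \u2014 \u20ac5"
    print(fits_encoding(sample, "iso-8859-1"))  # these characters are outside Latin-1
    print(fits_encoding(sample, "utf-8"))       # UTF-8 can store any Unicode character
```

Accented letters such as the "é" in "café" do fit in Latin-1, so the difference only shows up once characters like curly quotes or the euro sign appear in the text.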

Other Scripts (e.g. Windows-1251 for Cyrillic)

<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
...
</head>

See the individual By Language Page for other encodings, or open pages written in your script and use the View Source window to see which encodings are generally used.
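
If you would rather check a page's source programmatically than read it in the View Source window, the following Python sketch pulls the charset out of an http-equiv meta tag like the ones shown above. It assumes you already have the page source as a string; the names CharsetFinder and declared_charset are hypothetical helpers written for this illustration.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remembers the first charset declared in an http-equiv meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)  # attribute names arrive already lowercased
        if (attrs.get("http-equiv") or "").lower() != "content-type":
            return
        # content looks like "text/html; charset=utf-8"
        for part in (attrs.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset" and value.strip():
                self.charset = value.strip().lower()


def declared_charset(html_source):
    """Return the declared encoding in lowercase, or None if none is declared."""
    finder = CharsetFinder()
    finder.feed(html_source)
    return finder.charset
```

For example, passing the Cyrillic sample header above to declared_charset returns "windows-1251", while a page with no encoding meta tag returns None.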

If no encoding is declared, then the browser uses the default setting, which in the U.S. is typically Latin-1. If the page is actually in some other script, but no encoding is specified, the browser will use a Roman alphabet font and display gibberish.
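
The gibberish described here can be reproduced with a few lines of Python. This is only a sketch of the effect: the function name as_seen_by_browser and the Russian sample word are assumptions chosen for illustration, and a real browser's fallback behavior may differ in detail.

```python
def as_seen_by_browser(text, actual_encoding, assumed_encoding):
    """Encode text the way the page was saved, then decode it the way
    a browser would if it assumed a different encoding."""
    raw = text.encode(actual_encoding)
    return raw.decode(assumed_encoding, errors="replace")


if __name__ == "__main__":
    word = "Привет"  # Russian for "Hello"
    # Page saved as Windows-1251, but the browser falls back to Latin-1.
    print(as_seen_by_browser(word, "windows-1251", "iso-8859-1"))  # Ïðèâåò
    # Same bytes, read with the encoding that was actually used.
    print(as_seen_by_browser(word, "windows-1251", "windows-1251"))  # Привет
```

The bytes on disk are identical in both cases; only the encoding the reader assumes has changed, which is why declaring it fixes the display.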

What if a UTF-8 Web page does not display properly?

If you upload or post a Web page with the correct encoding meta tag, but the characters do not display correctly in all browsers, then you may need to work with your Web server admin to configure the Web server.
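
One common cause, assuming your server sends its own charset in the HTTP Content-Type header, is that browsers generally give that header priority over the meta tag. The Python sketch below compares the two values so you can tell your administrator exactly what the mismatch is. The function names are hypothetical, and the header value is something you would copy from your browser's developer tools or a similar utility.

```python
from email.message import Message


def header_charset(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None when no charset is given


def server_overrides_meta(content_type, meta_charset):
    """True when the server header names a different encoding than the meta tag."""
    sent = header_charset(content_type)
    return sent is not None and sent != meta_charset.lower()


if __name__ == "__main__":
    # Header says Latin-1 while the page's meta tag says utf-8: a mismatch.
    print(server_overrides_meta("text/html; charset=ISO-8859-1", "utf-8"))
    # Header carries no charset, so the meta tag is left to do its job.
    print(server_overrides_meta("text/html", "utf-8"))
```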

Suggestions are posted at

Last Modified: Tuesday, 04-Jun-2013 12:41:32 EDT