Teaching and Learning with Technology

Computing With Accents and Foreign Scripts


South Asian Scripts

This Page

  1. Languages Covered
  2. Windows vs. Macintosh
  3. Web Development Options
  4. General South Asian Links
  5. Pan South Asian Fonts

Languages/Scripts Covered

Scripts in this region are often called Brahmic scripts because they descend from the ancient Brahmi (Brāhmī) script. These scripts can be further divided into Northern and Southern Brahmic groups.

Southern Brahmic scripts are distinguished in part by the different ways vowel signs can be placed around a consonant. They have traditionally had weaker technical support than the dominant Northern Brahmic scripts, such as Devanagari.

In addition, some languages are written in right-to-left scripts such as Arabic (particularly in Pakistan) or Thaana (for Divehi).

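Classification of South Asian Scripts

Northern Brahmic: Devanagari, Bengali, Gujarati, Gurmukhi (Punjabi), Oriya
Southern Brahmic: Kannada, Malayalam, Sinhala, Tamil, Telugu
Right-to-left: Arabic (Urdu, Sindhi), Thaana (Divehi)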

Platform Support

Because vowel signs in these scripts are placed in complex ways around consonants, Unicode fonts are not interchangeable between platforms. OpenType (OTF) fonts work in Windows but do not always render correctly in Mac OS X, where Apple's own fonts rely on its ATSUI text-rendering technology instead. Whenever possible, use the South Asian fonts that Microsoft and Apple supply with their operating systems.
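On a Web page, one way to follow this advice is to list each platform's system font in a CSS font-family stack, so every browser picks the Devanagari font its own operating system provides. This is a minimal sketch that assumes the fonts Mangal (Windows) and Devanagari MT (Mac OS X) are installed; verify the font names on the systems your readers use.

```html
<!-- Sketch: each browser uses the first listed font it has installed.
     Font names are assumptions; check them on your target systems. -->
<p lang="hi" style="font-family: Mangal, 'Devanagari MT', sans-serif;">
  नमस्ते
</p>
```

If neither font is present, the browser falls back to its generic sans-serif font, which may or may not include Devanagari glyphs.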

South Asian Script Support by Platform and Version: Windows, Macintosh, Linux/Unix

Windows XP Supports

  • Devanagari
  • Gujarati
  • Gurmukhi (Punjabi)
  • Kannada
  • Tamil
  • Telugu
  • Thaana
  • Urdu/Sindhi

Windows XP Service Pack Two Adds

  • Bengali
  • Malayalam

Windows Vista Adds

Macintosh Supports

  • Devanagari
  • Gujarati
  • Gurmukhi (Punjabi)

Mac OS X 10.4 (Tiger) Adds

  • Tamil (partial)

Mac OS X 10.7 (Lion) Adds

  • Kannada
  • Malayalam
  • Oriya
  • Sinhala
  • Tamil (full)
  • Telugu

Freeware Utilities Are Available for

X11 Unix Environment

  • Additional language tools may be available for the X11 Unix environment that comes with Mac OS X.


Web Development

South Asian Encoding and Language Tags

Encoding: utf-8 (Unicode), ISCII (older), ITRANS (older ASCII transliteration scheme)
Use Unicode to develop new pages.

Inputting and Editing Text in an HTML Editor

One option is to use Dreamweaver, Microsoft Expression Web, or another Web editor and switch the keyboard to the correct script. You can then type content directly in that script. However, be sure to verify that the correct encoding is specified in the Web page header.

Another option is to compose the basic text in a word processor or text editor that supports the script, then export it as an HTML or text file with the appropriate encoding. That file can then be opened in an HTML editor such as Dreamweaver or Microsoft Expression Web and formatted.

Other Web Tools

For Web tools such as Blogs at Penn State, Facebook, Twitter, del.icio.us, Flickr, and others, users can typically switch the keyboard to the appropriate script and type directly. In most cases, this content will be encoded as Unicode.

Unicode Chart with HTML Entity Codes

For short texts, such as the yoga om sign (&#2384; = ॐ), it may be easier to look up the character in a Unicode chart and enter its numeric HTML entity code.
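For example, the om sign is Unicode code point U+0950, which is 2384 in decimal. Either numeric reference below should display the same character in any page, while typing the literal character works only if the page is saved and declared as utf-8. The markup is a small illustration, not required page structure.

```html
<!-- Three ways to put the om sign (U+0950) in a page -->
<p>&#2384;</p>  <!-- decimal numeric character reference -->
<p>&#x950;</p>  <!-- hexadecimal numeric character reference -->
<p>ॐ</p>        <!-- literal character; the page must be utf-8 -->
```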

Available Unicode Charts

ISCII vs. Unicode

Before Unicode was developed, the government of India created a standard called ISCII (Indian Script Code for Information Interchange). In this standard, similar characters in different scripts are assigned the same character number; for instance, Devanagari क (ka) and Gujarati ક (ka) share the same code point. However, most modern development uses Unicode.
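In Unicode, by contrast, the two letters have separate code points. Because Unicode's Indic blocks were modeled on the ISCII layout, ka sits at the same relative position in each script's block: U+0915 in Devanagari and U+0A95 in Gujarati. The numeric references below illustrate this; the lang values hi (Hindi) and gu (Gujarati) are chosen only for the example.

```html
<!-- Unicode: one code point per script, at parallel positions in each block -->
<p lang="hi">&#x915;</p>  <!-- Devanagari letter ka, U+0915 -->
<p lang="gu">&#xA95;</p>  <!-- Gujarati letter ka, U+0A95 -->
```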

Using Encoding and Language Codes

Computers interpret text according to an encoding, a system that maps stored electronic data (bytes) to visible text characters. Whenever you develop a Web site, make sure the proper encoding is specified in the header tags; otherwise the browser may fall back to its U.S. default settings and display the text incorrectly.

To declare an encoding, insert (or check for) the following meta tag inside the head element at the top of your HTML file, then replace "???" with one of the encodings listed above. If you are not sure, use utf-8.

Generic Encoding Template

<head>
<meta http-equiv="Content-Type" content="text/html; charset=???">
<!-- other head elements -->
</head>

Declare Unicode

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- other head elements -->
</head>

XHTML

If you are using XHTML, the encoding tag must be closed with a slash after the final quotation mark (/>).

Declare Unicode in XHTML

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<!-- other head elements -->
</head>
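Putting the pieces together, a minimal complete page might look like the sketch below. It assumes HTML5, where the shorter meta charset form is equivalent to the http-equiv declaration; the Hindi title and paragraph are placeholders. The file itself must also be saved as UTF-8 in your editor, or the declaration will not match the bytes on disk.

```html
<!DOCTYPE html>
<html lang="hi">
<head>
<!-- HTML5 short form of the encoding declaration; keep it at the top of head -->
<meta charset="utf-8">
<title>नमूना पृष्ठ</title>
</head>
<body>
<!-- Placeholder Hindi text -->
<p>नमस्ते</p>
</body>
</html>
```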

No Encoding Declared

If no encoding is declared, the browser uses its default setting, which in the U.S. is typically Latin-1 (ISO-8859-1). South Asian text may then display as garbled characters.

Language Tags

Language tags are also recommended so that search engines and screen readers can identify the language of a page. These are metadata tags that indicate the language of a page; they do not trigger translation. Visit the Language Tag page for information on where to insert them.
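Language tags can also mark individual passages when a page mixes languages. In this sketch, which assumes the page's html element already carries lang="en", the lang attribute identifies Tamil (ta) and Urdu (ur) passages, and dir="rtl" sets right-to-left direction for the Urdu text written in Arabic script.

```html
<!-- Body content of a page whose html element carries lang="en" -->
<p>Greetings in two South Asian languages:</p>
<!-- lang on an element overrides the page language for that passage -->
<p lang="ta">வணக்கம்</p>
<!-- Urdu uses a right-to-left script, so dir="rtl" sets the text direction -->
<p lang="ur" dir="rtl">السلام علیکم</p>
```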

PDF and Image Files

In some cases, your best option may be to use PDF or image files. See the Web Development Tips section for more details.


Links

These pages cover internationalization of South Asian scripts in general.

Computing

Fonts

See also

Last Modified: Tuesday, 04-Jun-2013 12:40:05 EDT