Teaching and Learning with Technology

Computing With Accents and Foreign Scripts


Turkic and Central Asian Languages

This page discusses utilities needed to support various languages of Central Asia. See the Turkish page for specific information about that language.

Thanks to Gernot Katzer and Ozgur Sahoglu for their technical assistance.

This Page

  1. What are the Central Asian Languages
  2. Browser Setup
  3. Set up Roman/Latin Support for Turkic
  4. Turkic Accent Codes
  5. Set up Cyrillic Script Support for Turkic
  6. Set up Arabic Support for Turkic
  7. Web Development
  8. Unicode Entity Codes (Latin Script Only)
  9. Links

What are the Central Asian Languages

Definition

Central Asian languages are spoken in the region approximately between Russia and Western China, to the north of India and Iran and south of Siberia. Most of these are Turkic languages, a set of closely related languages of the Asian steppes. The most prominent language is probably Turkish, but the family includes many languages from the former Soviet Union and Afghanistan including Azerbaijani, Tatar, Turkmen, Uzbek and others.

Note: Tajik is actually related to Persian, but because of its proximity to the other languages of the region, it has faced the same political and script issues.

Three Alternate Scripts

Because of political changes in the region, the Central Asian languages can be written in the Roman, Cyrillic or Arabic alphabet, depending on the region or era. For instance, Old Ottoman Turkish was written in the Arabic alphabet, but modern Turkish is written in the Roman alphabet. Whichever script is used, the Turkic sound system required the addition of special characters, so each script needs special technical support.

In addition, many governments from the former Soviet Union are planning or have implemented a switch from Cyrillic to the Roman alphabet. Thus, Uzbek can be written in all three scripts depending on whether one refers to Soviet Uzbek, current Uzbek or the Uzbek of Afghanistan.

Links on Turkic Languages


Browser Setup

Test Pages

Fonts

If one of these fonts is installed, you should be able to read a Turkic-language page written in any of the three scripts. Fonts with links go to freeware font downloads.

Recommended Browsers

Browsers that fully support Unicode are strongly recommended. Click a link in the list to view configuration instructions. You will be asked to match each script with a font.


Set Up Roman/Latin Support for Turkic

Both Microsoft and Apple include utilities that let users switch to keyboards with the extra Turkic letters or with Cyrillic letters plus Turkic extensions.

Windows

Windows XP includes keyboards for Turkish and many of the other Turkic languages. Follow the instructions for Activating Keyboard Locales to activate and switch Microsoft keyboards to the appropriate layout.

Macintosh OS X

The OS X operating system includes support for Turkish and Northern Sami (10.3; Northern Sami shares some letters with Turkic). As of 10.5 (Leopard), support for Kazakh and Uighur was reportedly added (it is not clear which script has been implemented).

Follow the instructions for Activating Macintosh Keyboards to activate and switch Macintosh keyboards. For characters missing from the Turkish keyboards, you can then use the Unicode Hex Input keyboard to input special Central Asian characters. See the Turkic Accent Codes below for details.

There is a freeware Azeri PC keyboard utility available, but it only works in Unicode-aware applications such as TextEdit. See the documentation included with the keyboard for instructions.

Note: All keyboards work only in Unicode-aware editors such as TextEdit, Microsoft Office 2004, Dreamweaver MX 2004, Netscape Composer and others.


Turkic Accent Codes

These tables combine the Word 2003/2007 ALT codes and the Macintosh accent codes for special characters used in many Turkic languages.

Note on Windows Codes: Codes with numbers over 255 require Windows XP or later. Users with older versions of Windows may need to use the Character Map utility. More detailed instructions about typing accents with ALT keys are available.

Note on Numeric Macintosh Codes: To use the numeric codes (e.g. Option+018F), activate the Unicode Hex Input keyboard and use the numeric option codes. All other codes (e.g. Option+u,u) can be used with the U.S. keyboard.
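In the tables below, a four-digit Macintosh code and the matching Windows ALT number refer to the same Unicode character: the Macintosh code is the character's code point in hexadecimal, and the ALT number is the same value in decimal. As a worked example, converting the capital schwa's Macintosh code 018F to decimal gives its Windows code, ALT+0399:

$$
\mathrm{018F}_{16} = 1 \times 16^{2} + 8 \times 16 + 15 = 256 + 128 + 15 = 399_{10}
$$

The same conversion can be used to check other rows. For example, S cedilla's Macintosh code 015E works out to 1 × 256 + 5 × 16 + 14 = 350, matching ALT+0350.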

Accent Codes for Turkic Vowels
Character Description Windows Alt Code Macintosh OS X Codes
Ə Capital schwa ALT+0399 Option+018F
Ǝ Capital upside down E ALT+0398 Option+018E
ə Lower schwa ALT+0601 Option+0259
İ Capital dotted I ALT+0304 Option+0130
ı Lower dotless I ALT+0305 Option+0131
Ì Capital I grave ALT+0204 Option+`,I
Ī Capital I macron ALT+0298 Option+012A
ī Lower I macron ALT+0299 Option+012B
Ö Capital O umlaut ALT+0214 Option+u,O
ö Lower O umlaut ALT+0246 Option+u,o
Ü Capital U umlaut ALT+0220 Option+u,U
ü Lower U umlaut ALT+0252 Option+u,u
Ū Capital U macron ALT+0362 Option+016A
ū Lower U macron ALT+0363 Option+016B

 

Accent Codes for Turkic Consonants
Character Description Windows Alt Code Macintosh OS X Codes
Ç Capital C cedilla ALT+0199 Shift+Option+C
ç Lower C cedilla ALT+0231 Option+C
Đ Capital D stroke ALT+0272 Option+0110
đ Lower D stroke ALT+0273 Option+0111
Ğ Capital G breve ALT+0286 Option+011E
ğ Lower G breve ALT+0287 Option+011F
Ŋ Capital engma (N hook) ALT+0330 Option+014A
ŋ Lower engma (N hook) ALT+0331 Option+014B
Ñ Capital N tilde ALT+0209 Option+n,N
ñ Lower N tilde ALT+0241 Option+n,n
Ş Capital S cedilla ALT+0350 Option+015E
ş Lower S cedilla ALT+0351 Option+015F
Ƶ Capital Z bar ALT+0437 Option+01B5
ƶ Lower Z bar ALT+0438 Option+01B6

Fonts with all Latin Characters

These fonts include a large set of "exotic" Latin alphabet characters.


Set Up Cyrillic Support

Fonts

If one of these fonts is installed, you should be able to read any Turkic-language page written in the Cyrillic alphabet. Fonts with links go to freeware font downloads.

Keyboard Support

Windows

Windows XP includes keyboards for Turkish and many of the other Turkic languages. Follow the instructions for Activating Keyboard Locales to activate and switch Microsoft keyboards to the appropriate layout.

Macintosh OS X

The correct fonts are installed with Mac OS X, but no keyboards are available from Apple at this time. One suggestion is to type most content with a Russian keyboard, then use the Unicode Hex Input keyboard to enter the special Central Asian characters. See the Cyrillic Unicode chart for a list of hexadecimal codes that can be entered in OS X.
Note: This works only in Unicode-aware editors such as TextEdit, Dreamweaver MX, Netscape Composer 7, Microsoft Office 2004 and Nisus Writer Express.

As of 10.5 (Leopard), support for Kazakh and Uighur was reportedly added (it is not clear which script has been implemented).


Set Up Arabic Support

Fonts

If one of these fonts is installed, you should be able to read any Turkic-language page written in the Arabic alphabet. Fonts with links go to freeware font downloads.

Keyboard Support

Windows

Microsoft supplies keyboard utilities for Farsi, Arabic and Urdu in Windows XP. See the Windows Keyboard instructions for installing the keyboards from the Windows System CD-ROM. Extra letters can be inserted via the Character Map.

Right to Left Typing in Word for Windows

See instructions for configuring right to left typing in Word for Windows for tips on how to type RTL languages.

Macintosh OS X

The OS X operating system includes support for Afghan Uzbek, Afghan Dari, Pashto and Persian. Follow the instructions for Activating Macintosh Keyboards to activate and switch Macintosh keyboards. In OS X, extra letters can be inserted from the Unicode Character Palette.
Note: These work only in Unicode-aware editors such as TextEdit, Microsoft Office 2004 and Nisus Writer Express.

As of 10.5 (Leopard), support for Kazakh and Uighur was reportedly added (it is not clear which script has been implemented).

Additional Macintosh RTL Tips

See tips for creating Mac Right-to-Left documents (including alternatives to Microsoft Office) for more information.

Macintosh System 9

The Persian Language Kit is available for earlier Mac systems.


Web Development

Encoding and Language Tags

These are the codes that allow browsers and screen readers to process data as the appropriate language. All letters in the codes are lower case. Unicode is the only encoding that lets you post text in multiple scripts on the same page.

Encoding: utf-8 (Unicode)

Selected Lang Codes

RFC 3066 Script Variants

The following script variants have been registered as RFC 3066 language code variants. Although they may not be widely supported at this time, they are more descriptive than the basic language codes.
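As a sketch of how script variants are used, assuming a browser that recognizes script subtags, the three written forms of Uzbek can be marked with the script codes Latn, Cyrl and Arab. (RFC 3066 has since been replaced by BCP 47, under which these tags remain valid.) The Arabic-script line is a placeholder rather than real text.

```html
<!-- Uzbek in the Roman (Latin) alphabet -->
<p lang="uz-Latn">Oʻzbek tili</p>

<!-- Uzbek in the Cyrillic alphabet -->
<p lang="uz-Cyrl">Ўзбек тили</p>

<!-- Uzbek in the Arabic alphabet (placeholder text); runs right to left -->
<p lang="uz-Arab" dir="rtl">...</p>
```

Both sample phrases mean "Uzbek language."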

Using Encoding and Language Codes

Computers process text by assuming a certain encoding, that is, a system for matching stored electronic data to visible text characters. Whenever you develop a Web site, you need to make sure the proper encoding is specified in the header tags; otherwise the browser may fall back to U.S. settings and not display the text properly.

To declare an encoding, insert or inspect the following meta-tag at the top of your HTML file, then replace "???" with one of the encoding codes listed above. If you are not sure, use utf-8 as the encoding.

Generic Encoding Template

<head>
<meta http-equiv="Content-Type" content="text/html; charset=???">
...
</head>

Declare Unicode

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>

XHTML

If you are using XHTML, the encoding meta tag must end with a closing slash placed after the final quote mark.

Declare Unicode in XHTML

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>

No Encoding Declared

If no encoding is declared, then the browser uses the default setting, which in the U.S. is typically Latin-1. Some display errors may occur.
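For pages that use the HTML5 doctype, a shorter encoding declaration is also valid. This is a minimal sketch; the title and body are placeholders.

```html
<!DOCTYPE html>
<html>
<head>
<!-- HTML5 short form of the utf-8 declaration; keep it near the top of head -->
<meta charset="utf-8">
<title>...</title>
</head>
<body>
...
</body>
</html>
```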

Language Tags

Language tags are also recommended so that search engines and screen readers can identify the language of a page. They are metadata that indicate the language of a page, not devices to trigger translation. Visit the Language Tag page for information on where to insert them.
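As an illustration using a hypothetical page, the lang attribute on the html element gives the main language, and a lang attribute on an inner element marks a passage in a different language, here a Kazakh word quoted on a Turkish page.

```html
<!-- Main language of the page: Turkish (tr) -->
<html lang="tr">
...
<body>
<!-- A Kazakh (kk) word quoted inside the Turkish text -->
<p>... <span lang="kk">қазақ</span> ...</p>
</body>
</html>
```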

 

Inputting and Editing Text in an HTML Editor

One option is to use Dreamweaver, Microsoft Expression or another Web editor and switch the keyboard to the correct script. This lets you type content directly in the appropriate script. However, it is important to verify that the correct encoding is specified in the Web page header.

Another option is to compose the basic text in an international or foreign language text editor or word processor and export the content as an HTML or text file with the appropriate encoding. This file could be opened in another HTML editor such as Dreamweaver or Microsoft Expression, and edited for formatting.

Other Web Tools

For Web tools such as Blogs at Penn State, Facebook, Twitter, del.icio.us, Flickr and others, users can typically change the keyboard and input text. In most cases, this content will be encoded as Unicode.

Specifying Text Direction

Some HTML editors set the direction of the text automatically, but it can also be set manually with the dir attribute and the <bdo> element. See the Right-to-Left Alignment Tips page for more details.
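A minimal sketch of both options; the Uyghur text is a placeholder. The dir attribute sets the base direction of a block of text, while bdo overrides the normal bidirectional ordering and forces the characters inside it into the given direction.

```html
<!-- Base direction for a paragraph of Arabic-script Uyghur (placeholder text) -->
<p lang="ug" dir="rtl">...</p>

<!-- Override: the Latin letters ABC are displayed as CBA -->
<bdo dir="rtl">ABC</bdo>
```

In most cases dir is the better choice; bdo is meant for the rarer situations where the automatic ordering has to be forced.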


HTML Unicode Entity Codes

Declare Unicode Encoding

The codes listed below are valid for Unicode HTML pages only, and may not work on very old browsers. To make your page a Unicode page, add the following meta tag to the <head> portion of your document. Use these codes if you need to insert a Turkish word or short phrase within a multilingual text.

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
...
</head>

If you forget to include this tag, then some browsers, such as Netscape 4.7, may not display the characters properly.

The Latin Alphabet Codes

Use these codes to input accented letters in HTML. For instance, to type çarşi you would type &ccedil;ar&#351;i. The decimal numbers in these codes are the same ones used in the Windows Alt codes listed above.
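Numeric codes can also be written in hexadecimal by adding an x after the #, which matches the four-digit Macintosh codes in the accent tables. A small sketch, assuming a page declared as utf-8; both paragraphs should display the same word, çarşi.

```html
<!-- Named entity for ç plus the decimal reference for ş (351) -->
<p>&ccedil;ar&#351;i</p>

<!-- The same word with hexadecimal references: ç = U+00E7, ş = U+015F -->
<p>&#xE7;ar&#x15F;i</p>
```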

Turkic Unicode HTML codes

Turkic Vowels
Character Entity Code Description
Ə &#399; Capital schwa
Ǝ &#398; Capital upside-down E
ə &#601; Lower schwa
İ &#304; Capital dotted I
ı &#305; Lower dotless I
Ì &Igrave; (204) Capital I grave
Ö &Ouml; (214) Capital O umlaut
ö &ouml; (246) Lower O umlaut
Ü &Uuml; (220) Capital U umlaut
ü &uuml; (252) Lower U umlaut

Turkic Consonants
Character Entity Code Description
Ç &Ccedil; (199) Capital C cedilla
ç &ccedil; (231) Lower C cedilla
Đ &#272; Capital D stroke
đ &#273; Lower D stroke
Ğ &#286; Capital G breve
ğ &#287; Lower G breve
Ŋ &#330; Capital N hook
ŋ &#331; Lower N hook
Ñ &Ntilde; (209) Capital N tilde
ñ &ntilde; (241) Lower N tilde
Ş &#350; Capital S cedilla
ş &#351; Lower S cedilla
Ƶ &#437; Capital Z bar
ƶ &#438; Lower Z bar

Cyrillic Entity Codes

For some texts, it may be necessary to use Unicode entity codes for Cyrillic to enter letters which may not be available on any keyboards.
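A sketch following the Russian-keyboard approach described in the Cyrillic setup section: type the ordinary Cyrillic letters directly and enter only the missing Kazakh letters as numeric codes. The paragraph below should display қазақ; the comment lists decimal codes for a few other Kazakh letters, taken from their Unicode code points.

```html
<!-- қ = &#1179; (U+049B) is entered as a code; а and з come from a Russian keyboard -->
<p lang="kk">&#1179;аза&#1179;</p>

<!-- Other Kazakh letters missing from a Russian keyboard:
     ә = &#1241;  ғ = &#1171;  ң = &#1187;  ө = &#1257;
     ұ = &#1201;  ү = &#1199;  һ = &#1211; -->
```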

PDF and Image Files

In some cases, your best option may be to use PDF or image files. See the Web Development Tips section for more details.


Links

Turkic Languages and Internationalization

Links of Specific Turkic Languages

Tajik


Last Modified: Tuesday, 04-Jun-2013 12:40:08 EDT