Teaching and Learning with Technology

Computing With Accents and Foreign Scripts

Export HTML Text from an International Word Processor

One way to develop a non-Roman site is to type the text into a foreign language word processor or text editor, then export the content as an encoded text file or HTML file. Formatting can then be added manually or in an HTML editor.

Tools Needed

These tools allow you to type encoded text easily, then export it as a properly encoded HTML or text file.

  1. Microsoft Word - The simplest option is to copy and paste the text from Word into another text editor such as Notepad (Windows) or TextEdit (Apple). See the entries below for details on saving the files.
    Note: Exporting HTML directly from Microsoft Word is not recommended because of the formatting issues in the HTML that Word generates.
  2. Notepad (free with Windows)
    1. When you save a .txt file, switch the encoding from ANSI to UTF-8.
    2. You can then cut and paste the text into Dreamweaver, Expression Web, or another HTML editor (the HTML file must be set to UTF-8 Unicode encoding).
  3. Apple TextEdit (free with OS X)
    1. Set up TextEdit to save an encoded text (.txt) file by going to Preferences, selecting the Plain Text option, then selecting an appropriate Encoding.
    2. You can then cut and paste the text into Dreamweaver, Expression Web, or another HTML editor (the HTML file must be set to UTF-8 Unicode encoding).
  4. UniType GlobalWriter (Windows) - To export an encoded HTML file, go to File then Save As. Select the HTML file type. The next window will ask you to choose an encoding before saving. If in doubt, choose Unicode.
  5. StarOffice (Windows/Linux)
    1. To save StarOffice documents as encoded documents, go to File, then Save As. Select the Text Encoded format. In the next window, select UTF-8 encoding.
    2. You can then cut and paste the text into Dreamweaver, Expression Web, or another HTML editor (the HTML file must be set to UTF-8 Unicode encoding).
  6. Other text editors designed for foreign language text editing may be able to export encoded text or HTML files.
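
The "save as UTF-8" step that these editors perform can also be checked or repeated with a short script. The following is a minimal Python sketch, not part of any of the tools above; the function names `is_valid_utf8` and `convert_to_utf8` are hypothetical, and the sketch assumes you already know which encoding the original file was saved in.

```python
from pathlib import Path


def is_valid_utf8(path: str) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-save a text file written in source_encoding as UTF-8.

    source_encoding must be supplied by the caller (for example "cp1251"
    for a Windows Cyrillic file); this sketch does not try to guess it.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

For example, `convert_to_utf8("notes.txt", "notes-utf8.txt", "cp1251")` would re-save a Windows Cyrillic file as UTF-8 (both file names are placeholders). Note that a file containing only plain ASCII characters also passes the `is_valid_utf8` check, so the check is most useful on files that actually contain non-Roman text.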

Potential Pitfalls

  1. Make sure any exported HTML file declares its encoding within the HTML HEAD element. The Unicode (UTF-8) declaration is given below; see the Declare Encoding page for more examples.

    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>

  2. Exported .txt files must be set to the correct encoding; this can usually be done in the general Preferences area or in the Save preferences. See example instructions in the previous section.
  3. Try to include as little formatting as possible; the formatting should come from the HTML editors or be manually inserted.
  4. Avoid naming specific fonts for a script, as some alternative browsers and platforms may not be able to read the page (the encoding should be enough to trigger the font changes). If fonts must be named, make sure both Windows and Macintosh equivalents are specified.
  5. Inspect exported HTML files for vendor-specific codes. Here's an example from Swarthmore of how to tweak exported Chinese text in HTML with Claris Homepage.
  6. Test completed files on multiple browsers and platforms. Files may display correctly in one platform, but not in another.
  7. For U.S. audiences, you may want to test your page on a standard browser setup to confirm that the required fonts are in place.
  8. Unfortunately, some scripts are so poorly supported that no viable encoding system or text editor may be available. In these cases, other options should be used.
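
Two of the pitfalls above (a missing encoding declaration and leftover vendor-specific codes) can be partly screened for with a script before testing in browsers. The sketch below uses only the Python standard library. The names `declared_charset` and `vendor_markup` are hypothetical, and the patterns listed are just a few illustrative markers of the kind Microsoft Word writes into exported HTML, not a complete list; other tools leave different codes.

```python
import re
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Record the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        a = dict(attrs)
        if a.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = a["charset"].strip().lower()
        elif (a.get("http-equiv") or "").lower() == "content-type":
            # Long form: content="text/html; charset=utf-8"
            content = (a.get("content") or "").lower()
            if "charset=" in content:
                value = content.split("charset=", 1)[1]
                self.charset = value.split(";")[0].strip()


def declared_charset(html: str):
    """Return the declared charset in lowercase, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


# Illustrative patterns only (assumed examples of Word export residue).
VENDOR_PATTERNS = {
    "Word style property (mso-*)": re.compile(r"\bmso-[a-z-]+", re.I),
    "Office namespace tag (o:*)": re.compile(r"</?o:[a-z]+", re.I),
}


def vendor_markup(html: str):
    """Return labels of the vendor-specific patterns found in the HTML."""
    return [label for label, pat in VENDOR_PATTERNS.items() if pat.search(html)]
```

A page whose `declared_charset` result is `None`, or whose `vendor_markup` result is not empty, is worth opening in a text editor for manual cleanup. A clean result does not replace testing in multiple browsers and platforms.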

Last Modified: Tuesday, 04-Jun-2013 12:41:32 EDT