links - i18n
Internationalisation: the techniques used to handle and display different
languages and scripts in computer formats.
The abbreviation for "internationalisation" is
"i18n". Why? 18 letters between the "i" and the "n".
(The American spelling is "internationalization".)
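The counting rule behind the abbreviation can be checked with a few lines of Python. This is only an illustrative sketch; the function name numeronym is made up for this example.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of letters in between + last letter."""
    if len(word) < 3:
        return word  # nothing in between to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"


# Both spellings have 20 letters, so 18 fall between the first and the last.
print(numeronym("internationalisation"))  # i18n
print(numeronym("internationalization"))  # i18n
```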
Localisation is the actual translation into various languages:
the process by which all of the labels and messages used by a piece of
software are translated into different languages.
That software is then available to more people, in their native language.
The abbreviation for "localisation" is
"l10n". Why? 10 letters between the "l" and the "n".
(The American spelling is "localization".)
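To make "translating the labels and messages" concrete, here is a minimal sketch of a message catalogue in Python. The catalogue contents, the key names and the translate function are all hypothetical, chosen for this example. Real projects usually rely on a dedicated tool such as gettext rather than a hand-written dictionary.

```python
# Hypothetical catalogue: language code -> {message key -> translated text}.
CATALOGUE = {
    "en": {"greeting": "Hello", "save": "Save"},
    "ru": {"greeting": "Здравствуйте", "save": "Сохранить"},
    "zh": {"greeting": "你好"},  # "save" is deliberately left untranslated
}


def translate(key: str, lang: str, default_lang: str = "en") -> str:
    """Return the message for `key` in `lang`, falling back to the default language."""
    messages = CATALOGUE.get(lang, {})
    if key in messages:
        return messages[key]
    # Fall back to the default language, and finally to the key itself.
    return CATALOGUE[default_lang].get(key, key)


print(translate("greeting", "ru"))  # translated label
print(translate("save", "zh"))      # missing in "zh", falls back to English
```

The fallback matters in practice: a partly translated program should still show something readable rather than fail.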
When making a web page in a non-Roman script,
you need to think about:
character set (Unicode, Latin, or other)
(Microsoft and other operating systems have keyboards built in
for many scripts, like Russian and Chinese.
Unfortunately there is not yet a built-in keyboard
for Tibetan and Mongolian, so we have to get it
Do you have more to add? Please let us know at the contact form
CJKV Information Processing
A book by Ken Lunde (O'Reilly)
The definitive guide for tackling the difficult issues faced when dealing with complex Asian languages - Chinese, Japanese, Korean, and Vietnamese - in the context of computing or internet services.
- Topics covered include: writing systems, character set standards, encoding methods, input methods, output methods, font formats, typography, programming and code conversion techniques, Internet and Web implications, dictionaries and dictionary software.
- Also includes valuable appendices: code conversion tables, character set tables, character set indexes, mapping tables, Perl code examples, glossary, detailed bibliography.
Character sets and encoding
Character Code Issues - Tutorial
Jukka Korpela has assembled a huge amount of information related to character sets. The tutorial is well-written and very interesting reading.
Do you know your character encodings?
Good (relatively) simple introduction. "Have you ever noticed certain characters on your site not displaying the way they should? Perhaps the curly quotation marks look like little boxes, or the long dashes have been replaced with question marks. Problems like these usually arise from an incomplete understanding of character encodings ... "
Date: 1 March 2006
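The symptoms described in that quote are easy to reproduce. The Python sketch below uses a sample string made up for this example. It writes text as UTF-8, reads it back with the wrong encoding (Windows-1252, "cp1252" in Python), and also forces it into plain ASCII.

```python
# Sample text: a curly opening quote (U+201C) and a long dash (U+2014).
text = "\u201cquote \u2014 dash"

utf8_bytes = text.encode("utf-8")

# Wrong decoder: each 3-byte UTF-8 sequence turns into three unrelated characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# ASCII cannot represent these characters at all, so each becomes "?".
replaced = text.encode("ascii", errors="replace")
print(replaced)  # b'?quote ? dash'

# If no bytes were lost, reversing the wrong step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == text)  # True
```

The repair only works because nothing was thrown away along the way. Once characters have been replaced by question marks, the original text cannot be recovered.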
ISO 639 is the set of international standards that lists short codes for language names. "The language codes defined in the several sections of ISO 639 are used for bibliographic purposes and, in computing and internet environments, as a key element of locale data. The codes also find use in various applications, for example, in Wikipedia URLs for its different language editions."
ISO 639-1 codes - List
The standard codes for languages. There are other codes listed, but 639-1 is the one we use.
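As a small illustration of how such codes are used in software, the Python sketch below holds a handful of ISO 639-1 codes (far from the full list) and builds a Wikipedia address from one, following the use mentioned in the quote above. The function names and the exact URL pattern are assumptions made for this example.

```python
import re
from typing import Optional

# A few ISO 639-1 two-letter codes; the real standard lists many more.
ISO_639_1 = {
    "en": "English",
    "ru": "Russian",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
    "vi": "Vietnamese",
    "bo": "Tibetan",
    "mn": "Mongolian",
}


def language_name(code: str) -> Optional[str]:
    """Look up a two-letter code; return None if it is not in our small table."""
    code = code.strip().lower()
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"not a two-letter code: {code!r}")
    return ISO_639_1.get(code)


def wikipedia_url(code: str) -> str:
    """Build the address of a Wikipedia language edition (assumed URL pattern)."""
    if language_name(code) is None:
        raise KeyError(code)
    return f"https://{code.strip().lower()}.wikipedia.org/"


print(language_name("bo"))  # Tibetan
print(wikipedia_url("mn"))  # https://mn.wikipedia.org/
```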
Interesting thoughts, serious and not-so.
A guide to help application developers and others understand what it takes to globalize an application. Each article in this guide has two main sections: Overview, and Code
OASIS - Organization for the Advancement of Structured Information Standards
An international consortium that creates interoperable industry specifications based on public standards.
By facilitating language-based development, SIL International serves the peoples of the world through research, translation, and literacy.