Fun with Unicode

Most of us don't give much thought to character encodings. As our Web browsers move effortlessly between Arabic and Cyrillic pages, we may not remember the bad old days, when conflicts among dozens of standards made it very likely that a document or Web page would appear as utter gibberish. Some of those old encodings are still in use, but most of today's browsers, Web sites, and applications comply with the Unicode Standard and its encoding forms (UTF-8 being the most popular).
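Here is a minimal sketch of how that gibberish happened. We chose Python for illustration; nothing in the post depends on it. The sketch uses only the built-in `str.encode` and `bytes.decode`, and Latin-1 stands in for any of the old single-byte encodings. Text saved as UTF-8 but read back under the wrong encoding comes out scrambled.

```python
# The same bytes mean different things under different encodings.
text = "café"

# Encode with UTF-8: "é" (U+00E9) becomes the two bytes 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9'

# Decoding with a legacy encoding (Latin-1 here) still "succeeds",
# but each of the two bytes is read as a separate character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# Decoding with the encoding that was actually used restores the text.
print(utf8_bytes.decode("utf-8"))  # café
```

The garbled case raises no error, which is why such mix-ups often went unnoticed until someone looked at the screen.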

Image: Some Unicode symbols for Egyptian hieroglyphs

The name Unicode embodies the three original goals of the standard: universality (encompassing all human languages), uniformity (using fixed-width codes), and the uniqueness of each character representation. In Unicode a unique number, or code point, is assigned to each of more than a hundred thousand characters in dozens of scripts. Development of the standard began in 1987; today it's maintained and promoted by the Unicode Consortium, a nonprofit whose members include Apple, Google, Microsoft, and many other major tech companies (Long Now's PanLex project is an associate member).
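A code point is just a number, and Python exposes it through the built-in `ord`. The short sketch below prints each sample character's code point in the conventional U+XXXX notation, along with the number of bytes the character occupies in UTF-8. The `describe` helper is a hypothetical name of our own. The samples are arbitrary picks: Latin, Cyrillic, Arabic, a chess symbol, and the first Egyptian hieroglyph, U+13000.

```python
def describe(ch: str) -> tuple:
    """Return the U+XXXX label and the UTF-8 byte length of a single character."""
    code_point = ord(ch)               # the number Unicode assigns to this character
    label = f"U+{code_point:04X}"      # hex, padded to at least four digits
    n_bytes = len(ch.encode("utf-8"))  # UTF-8 uses 1 to 4 bytes per code point
    return label, n_bytes


# Latin A, Cyrillic Zhe, Arabic Ain, white chess king, EGYPTIAN HIEROGLYPH A001.
samples = ["A", "\u0416", "\u0639", "\u2654", "\U00013000"]

for ch in samples:
    label, n_bytes = describe(ch)
    print(label, n_bytes, ch)
```

The byte counts come out as 1, 2, 2, 3, and 4. UTF-8 is a variable-width encoding rather than the fixed-width scheme the name originally promised.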

You can see all the Unicode characters and symbols by browsing the code charts, or you can use one of several nifty tools developed for exploring Unicode. Here are just a couple:

  • Unicode Utilities. This interface to the Unicode database will tell you more than you ever wanted to know about the character at each code point. Click on "character" to type or paste any character and get a full list of its properties. Click on "confusables" to see which characters can accidentally (or maliciously) be confused with others that look similar.

  • UniView. Developed by Richard Ishida (Internationalization Activity Lead at the World Wide Web Consortium), this app allows you to search or browse for any character and discover all its properties. You can search by code point, character, or the name or description of a character. Search for the word "tilde" and you'll get 110 characters; search for "chess" and you'll get symbols for all the chess pieces, in white and black (♔, ♞, etc.). A "lite" version of the app is meant for mobile devices but may also be less intimidating to a new user than the full-featured version.
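Both tools draw on the Unicode Character Database, and Python's standard `unicodedata` module exposes part of it. The sketch below uses that module to do small versions of the same jobs. The helper names `properties` and `search_names` are hypothetical. `properties` looks up a character's name and general category. `search_names` scans every code point for names containing a word, roughly like a UniView name search. Python's standard library doesn't include confusables data, so the example prints one classic look-alike pair instead: Latin A and Cyrillic А. The two render almost identically but are different characters.

```python
import sys
import unicodedata


def properties(ch: str) -> dict:
    """Return a few properties of one character from Python's Unicode database."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<no name>"),
        "category": unicodedata.category(ch),  # e.g. "Lu" = uppercase letter
    }


def search_names(word: str) -> list:
    """Return every character whose Unicode name contains `word` (case-insensitive)."""
    word = word.upper()  # Unicode character names are written in uppercase
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unassigned code points and surrogates have no name; default to "".
        if word in unicodedata.name(chr(cp), ""):
            hits.append(chr(cp))
    return hits


# Look-alikes: Latin capital A (U+0041) vs. Cyrillic capital A (U+0410).
for ch in ["A", "\u0410"]:
    print(properties(ch))

# A UniView-style search of character names.
chess = search_names("chess")
print(len(chess), "".join(chess))
```

Results depend on the Unicode version bundled with your Python (`unicodedata.unidata_version`). UniView can also search descriptions, so counts such as the 110 "tilde" matches won't necessarily agree. Scanning all of the roughly 1.1 million code points takes a moment but needs no external data.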

Next time you translate some text online and then paste it into a document, take a moment to thank the hard-working developers and maintainers of the Unicode Standard. They've spent 25 years thinking about character encodings so that you don't have to.

