The Rosetta Blog > Most Recent Posts

Posted 2 years, 2 months ago by Karin Wiecha

Android App for Language Documentation

Steven Bird, Associate Professor in Computer Science at the University of Melbourne, and his team have developed an Android application for the documentation of language. The easy-to-use app was first tested in the field last year in Papua New Guinea, where Dr. Bird and his colleagues provided the Usarufa people with Android cell phones, equipped with the app, to record themselves speaking their language.

The Usarufa language is still spoken by only about a thousand people. The language was the first to be recorded with the new technology in the pilot project and also the source of the app’s name. The developers named their application Aikuma after the Usarufa word for “meeting”.

Speaker using the Aikuma app

The pilot project turned out to be very successful. The Usarufa speakers had no difficulties using the app after brief instruction and enjoyed recording their stories, personal narratives, songs and dialogues.

A more recent field trip led Dr. Bird and his colleagues to the Tembé people, who live in an Amazonian reservation in Brazil. The people are aware of the endangered status of their language, which has only about 150 remaining fluent speakers, so they invited the researchers into their remote village to help them preserve their linguistic heritage.

75 miles away from the nearest town, the villagers do not have internet access and are not familiar with the latest technological devices. The minimal design of the Aikuma app and the use of touch-screen phones allow for an intuitive method of recording. The speakers record themselves by just pushing the record button, holding the phone to their ear and talking as if they were making a phone call.

A particularly useful feature of the app is the ability to add a time-aligned audio translation. Most of the minority language speakers in Brazil speak Portuguese as well. A translation of the recordings in a language that is more widely spoken can help ensure that the content of the recordings will be understood even if the language loses its last speakers in the future.

Steven Bird talking about his current field work in Amazonia

In the past, field linguists tended to focus on producing written documentation, but the transcription of speech with the International Phonetic Alphabet is a very time-consuming task that can take up to an hour for each minute of spoken language. The documentation of severely endangered languages is a race against time, and recordings of the actual language in use can only be made while there are speakers left. Dr. Bird points out the importance of recording endangered language speakers:

"We collect and archive language recordings now while the speakers are still alive. That’s all. We have the whole of the future to transcribe and process the recordings...The living speakers of today’s disappearing languages are equipped to preserve their voices, their unique perspective on the world, and how they have managed to live sustainably in their homeland for centuries."

The research team found only six fluent speakers in the village, but they were all keen on recording their Tembé stories and legends and translated them into Portuguese as shown in the video below.

The most recent field trip led the researchers to another Amazonian tribe, the Baré. However, they couldn't find any fluent speakers of this endangered language, and found that everyone had shifted to the more widely used Portuguese or Nhengatu - a language that is undergoing shift as well, but is still spoken by about 20,000 people.

The Nhengatu speakers recorded some of their stories using the Aikuma app and it struck Dr. Bird how one of the speakers gestured with one hand while speaking. From an anthropological as well as linguistic point of view gestures are a rich source of information. The speakers also enjoyed making video recordings of each other, and this was particularly helpful when an elderly speaker wasn't able to manipulate the touch-screen in order to make a recording. A solution to this and a further development of the Aikuma app prototype could be to change the format from audio to video in order to capture this additional information and let the speakers create video recordings of each other.

The pilot projects turned out to be very successful and insightful, both in preserving some of the stories and languages of the peoples involved and in providing the developers with ideas on how to further improve the Aikuma app so it can be used successfully for the documentation of endangered languages in even the most remote places.

Read more about the Aikuma pilot projects in Papua New Guinea and Amazonia.

You can also follow Steven Bird’s ongoing fieldwork via Twitter.

Posted 2 years, 2 months ago by Karin Wiecha

Review of "Bringing Our Languages Home"

Throughout the world, indigenous and minority languages are losing ground and are being displaced by more dominant languages of wider communication. This trend has often resulted from historical injustices committed against indigenous peoples. But recent times have seen a growing number of indigenous communities reclaiming their endangered or even extinct heritage languages.

The Master-Apprentice Language Learning Program is one such initiative, pairing a proficient speaker of a given endangered language, the master, with an adult non-speaker eager to learn their heritage language, the apprentice. Developed by Leanne Hinton, professor emerita at the University of California, Berkeley, and the Advocates for Indigenous California Language Survival, this successful program has spread throughout the United States and other countries like Canada or Australia.

In her new book, Bringing Our Languages Home: Language Revitalization for Families, Hinton takes it a step further and brings language revitalization to the place where languages are really learned: the family.

The 13 case studies at the heart of the book are representative of the growing number of parents who want to enable their children to grow up with their heritage language even though they themselves might not have had this opportunity. Hinton turns the floor over to these 13 families and minority language advocates to tell their stories in their own words. The result is a book that is engaging and useful for linguists as well as anyone with an interest in the preservation and revitalization of endangered languages.

The backgrounds of the families portrayed cover a wide range - geographically as well as in the languages that are being revitalized and the context in which this occurs. For some of the languages there’s still a sufficient number of native speakers to learn from. Some are being taught in immersion school programs. Other languages, as in the case of the Native American languages Myaamia and Wampanoag, had not been spoken for generations. Nevertheless, Daryl Baldwin and jessie little doe baird, who tell their stories in the first two chapters, did not shy away from this challenge, invested years of their lives to get degrees in linguistics and became experts in their respective languages. They are reaping the rewards of their efforts when they hear their children speak the language of their ancestors that had been silent for so long.

But despite the differences in initial linguistic circumstances, there’s one striking parallel in all the narratives: the constant resistance against the dominance of the ubiquitous English language. Of course, English is just a placeholder for any dominant majority language and just happens to be the national language in all the narratives (the families living in the USA, New Zealand, Ireland and Scotland respectively). But it is symbolic of the influential status of the majority languages that endangered languages all over the globe must be able to compete with if they are to survive and thrive.

Bringing Our Languages Home is not just a collection of case studies but also a practical guide for families who want to undertake the reclamation of their languages. The variety of starting conditions and external factors, as well as the assets and obstacles portrayed in the personal narratives, give interested families a realistic impression of the revitalization of a language in the family setting. Moreover, in the final chapter Hinton sums up what we can learn from these pioneering families and complements this with hands-on tips and approaches to language revitalization for nearly any linguistic point of departure.

Leanne Hinton’s Bringing Our Languages Home: Language Revitalization for Families is a ray of hope for even the most endangered language communities without glossing over the challenges that inevitably go with the revitalization of fragile languages in the omnipresence of established languages like English.

Posted 2 years, 3 months ago by Karin Wiecha

New Estimates on the Rate of Global Language Loss

The Endangered Languages Catalogue (ELCat) is a project by the University of Hawai’i at Manoa and Eastern Michigan University, supported by a National Science Foundation grant. The project aims to compile a comprehensive up-to-date catalogue on all languages considered to be in danger, providing information on:

  • the number of speakers, age of the youngest speakers and location of each language
  • the genetic affiliation to a linguistic family for every language and
  • an account of the documentation and data that already exist on any given language of the database.

The three-year project was initiated in 02011 and is planned in two phases. In Phase I, data crucial in determining whether a given language is in danger was gathered by linguistic research teams at both universities. This phase has just been completed and the findings are available on the website of the Endangered Languages Project, the public portal of the ELCat, which helps raise awareness of and gather data on endangered languages.

Endangered languages in the USA (click on the image to browse this interactive world map)

The Endangered Languages Project (ELP) is an initiative of the newly formed Alliance for Linguistic Diversity, a coalition of international linguistic and cultural organizations, and Google. The Rosetta Project and PanLex Project at The Long Now Foundation are also members of the Alliance. ELP is different from similar projects in that it is a community-driven resource. Anyone involved with endangered languages is invited to contribute to the database. This way endangered language communities as well as researchers working with them can upload, update and correct the available information and help expand the database in a collaborative effort.

The first results of this collaboration have been presented by Lyle Campbell, ELCat Project Director and linguistics professor at the University of Hawai'i at Manoa, during the 3rd International Conference on Language Documentation & Conservation (ICLDC 3). The updated and newly compiled data allowed the researchers to determine which of the world’s living languages are at risk of dying out and to what extent each individual language is endangered. In order to determine whether a language is at risk, ELCat has developed the Language Endangerment Scale. The Ethnologue, a well-established comprehensive language catalogue for basic information on all living - not only endangered - languages, presented their own newly-developed scale for language endangerment, called EGIDS, at the same conference. ELCat's scale is different in that it has a smaller set of criteria, focusing exclusively on endangered languages, which serves the purpose of the Endangered Languages Catalogue. Still, there are some parallels to EGIDS. On the basis of four criteria, ELCat's Language Endangerment Scale assigns six different levels of endangerment to each language, ranging from 0 - Safe to 5 - Critically Endangered. The criteria are:

  • Intergenerational Transmission (How old are the youngest speakers and is the language passed on to younger generations?)
  • Absolute number of speakers
  • Speaker number trends (Is the number of speakers declining, stable or increasing?)
  • Domains of use of the language (Is the language only used in certain (e.g. informal) contexts or for every domain in life from home to media, education and government?)

The findings yielded by this scaling and the updated database provide us with new knowledge on language loss. Earlier estimates predicted the death of 50-90% of the world’s languages by the end of the century. Another claim that has been made very frequently when talking about language endangerment is that one language goes extinct every two weeks. Both estimates are, however, not in accordance with ELCat’s new data as presented at the ICLDC 3 earlier this month.

The source of the prediction of the death of up to 90% of all languages by the end of the 21st century is a 01992 paper titled The World's Languages in Crisis [1] by Michael Krauss, professor emeritus of the University of Alaska Fairbanks and expert on the indigenous Alaskan language Eyak, whose last native speaker passed away in 02008. Krauss arrived at this estimate based on the best available sources at that time. This paper and the presentation Krauss gave on that topic at the Linguistic Society of America's annual meeting in 01991 can be seen as a pivotal moment for the awareness of language loss.

Over two decades later, on the basis of ELCat’s much more comprehensive database (and recent results of the Ethnologue support this), we know that Krauss’ estimates were too high.* The application of the Language Endangerment Scale to all known languages has revealed that a total of 3,176 can be considered to be endangered. This is about 46% of all living languages, far from Krauss' 90% worst case scenario. Nonetheless, Krauss’ lower threshold of 50% might after all become a sad reality if endangered languages keep losing ground.

Another number that had to be corrected is the estimated extinction of one language every two weeks. This figure has been repeated so often in the discourse on language death that it is hard to trace where it originated. Even though Krauss did not make this claim, it seems most likely that it was calculated based on the estimates presented in his paper, as for instance linguist David Crystal did in his 02000 book Language Death (p. 19). [2]

ELCat's new findings, however, suggest that language death progresses at the rate of about one language every three months rather than one every two weeks. [3] This estimate is based on the number of languages that we know have become extinct in the recent past rather than estimates of how many languages might go extinct in the future.
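To put the two rates side by side, here is a back-of-the-envelope sketch in Python. The 52-week year and the function names are assumptions made for illustration only; this is not ELCat's own method, just simple arithmetic on the two figures quoted above.

```python
# Rough comparison of the two extinction rates discussed in the post.
# Assumption: a year is treated as 52 weeks or 12 months.
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def losses_per_year_from_weeks(weeks_per_loss):
    """Languages lost per year if one language dies every `weeks_per_loss` weeks."""
    return WEEKS_PER_YEAR / weeks_per_loss


def losses_per_year_from_months(months_per_loss):
    """Languages lost per year if one language dies every `months_per_loss` months."""
    return MONTHS_PER_YEAR / months_per_loss


old_rate = losses_per_year_from_weeks(2)    # the often-repeated claim
new_rate = losses_per_year_from_months(3)   # the rate suggested by ELCat's data
ratio = old_rate / new_rate

print(f"One every two weeks:    {old_rate:.0f} languages per year")
print(f"One every three months: {new_rate:.0f} languages per year")
print(f"The older claim is {ratio:.1f} times the newer estimate")
```

Under these assumptions the older claim works out to 26 languages a year and the newer estimate to 4, a difference of a factor of 6.5.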

Though it is good news that language loss is not proceeding quite as quickly as we previously thought, this does not mean that linguistic diversity is on the safe side. The looming loss of almost half of the world’s languages is sufficient proof for the “ongoing crisis of language loss," as Campbell phrased it. The new findings also show that the rate at which languages die out has highly accelerated in the last half century. Campbell concluded:

"These losses are still horrendous…There is no need to repeat the inaccurate claim [that one language goes extinct each two weeks]...What we see is shocking enough."

Today 457 or 9.2% of the living languages have fewer than 10 speakers and are very likely to die out soon, if no revitalization efforts are made. 639 of the languages known to have existed are already extinct – 10% of all languages.

Moreover, we now know that since 1960 we have lost as many as 28 entire language families. This is even more devastating from the viewpoint of linguistic diversity. A language family is a group of languages that have emerged from a common proto-language. Linguists can reconstruct such relations if a set of languages share certain grammatical and phonetic features. The number of languages in a language family can vary from over a thousand (as in the Niger-Congo and Austronesian language families) to just a few. Languages that cannot be related to any other language are called isolates. The language family with the most speakers is Indo-European, encompassing languages like English, Spanish, Russian or Hindi - just to name a few of the over 200 languages belonging to this family. But a language family does not have to have 3 billion speakers, as in the case of Indo-European, for its extinction to have a considerable impact on linguistic diversity.

ELCat uses the metaphor of biodiversity to illustrate the gravity of the loss of an entire language family: If we compare the extinction of a language to the extinction of an animal species, the death of a language family would equal the loss of a whole branch of the animal kingdom, for example all felines.[4] We know of a hundred language families that have gone extinct over the course of history - 24% of the world's linguistic diversity. But the fact that 28 of them have gone extinct over the relatively short time span of the last 50 years is symptomatic of the accelerated rate of language loss we are experiencing in recent times.

Now that all available information on the entirety of endangered languages has been gathered and updated, the next step in the ELCat project is to fill the gaps, expand the available data and introduce a measure of how much documentation exists for each of the 3,176 endangered languages. The ELP website already provides some bibliographical references on existing documentation for a number of languages, alongside all sorts of texts, video and audio material uploaded by researchers or native speakers. The aim for Phase II of the ELCat project is to complete this information, especially for languages where there has been very little information to date.

The purpose of the information provided in the database is manifold. It allows researchers to work collaboratively on the expansion of the information, it aims to point to and interest linguists and future researchers in the least documented languages, it invites endangered language speech communities to contribute information on their language and provides material for preservation and revitalization programs. ELCat and the Endangered Languages Project hope that in this way their community-driven database helps raise public awareness of language endangerment and can contribute to stopping or reversing language loss.

Listen to Lyle Campbell's talk at the 3rd International Conference on Language Documentation & Conservation.

[1] Krauss, Michael E. 1992. The World's Languages in Crisis. Language 68(1): 4-10.

[2] Crystal, David. 2000. Language Death. Cambridge: Cambridge University Press.

[3] Campbell, Lyle; Lee, Nala Huiying; Okura, Eve; Simpson, Sean; Ueki, Kaori. 2013. New Knowledge: Findings from the Catalogue of Endangered Languages (“ELCat”). 3rd International Conference on Language Documentation & Conservation.

[4] Aristar, Anthony et al. About the Catalogue of Endangered Languages. University of Hawai’i at Manoa.

*New findings of the Ethnologue suggest that the state of languages in Australia, New Zealand and Northern America is very close to this estimate, with only 9% of the languages of Australia and New Zealand and 7% of the languages of the USA and Canada still being vital, the rest being in danger (or extinct). On a global scale, however, considering e.g. the vitality of 80% of Sub-Saharan languages, this estimate is too high.

Posted 2 years, 3 months ago by Karin Wiecha

17th Edition of the Ethnologue

The Ethnologue is a comprehensive language catalogue which is used as a reference work by linguists all over the world. It was published for the first time in 1951 by The Summer Institute of Linguistics (SIL) and provides information for all known living languages and languages that have become extinct after 1951. The Ethnologue provides statistical data on the world's languages including native speaker populations, literacy rates, regions where the languages are spoken, an assessment of their vitality and other basic information. This data is very useful as a reference point for language projects of all kinds. The set of data as a whole is important infrastructure that is also used by the Rosetta Project. Some of the language metadata in the Rosetta Collection at the Internet Archive, like the three-letter language identifier codes, are taken from the Ethnologue. The 17th edition of the Ethnologue has just been released online where it is browsable not only for linguists and researchers but for anyone interested in the languages of the world.


The Ethnologue is updated with a new edition approximately every four years to represent our best knowledge about the languages of the world. Altogether the new edition features nearly 60,000 updates and corrections and with each new edition the database is not only updated but also expanded. The 17th edition provides statistics for 7,105 known languages, adding 196 languages to the previous edition. Still this huge database makes no claims of completeness.

Where do all these new languages come from? Determining what constitutes a distinct language is not a straightforward task. Sometimes what we thought were dialects of a single language might get reclassified as separate languages, if it turns out that they are not mutually intelligible. Cultural identities and politics can also occasionally play a role in deciding where to draw the line. Determining whether a language is extinct can be an equally difficult task. In the new edition of the Ethnologue 188 languages have been reclassified from extinct to “dormant”, because they still have a symbolic value for their former speech community and offer the potential for revitalization or may be actively being revitalized. From time to time previously unknown languages are also discovered, as in the very recent announcement of Hawai’i Sign Language. Researchers report these findings to the constantly growing database of the Ethnologue.

With the new Ethnologue edition the website was also given a new, more interactive design which allows you to browse languages not only via the search function but also by clicking on a world map. For many countries there are language maps available that show in which regions certain languages are spoken. Two other new features that might be interesting for language enthusiasts are the Ethnoblog and the Language of the Day feature. Every day a language is highlighted on the website with a link to its individual language page. The language pages provide the most important information on each individual language, including the language status and its position in the language cloud - two new metrics in this version of the Ethnologue.

The language status is measured with the Expanded Graded Intergenerational Disruption Scale (EGIDS), which assigns each language a level ranging from 0 - International (e.g. English) to 10 - Extinct. This scale is an expansion of the eight-level GIDS scale developed by linguist Joshua Fishman in 1991. GIDS was developed to determine the vitality of endangered languages, while EGIDS is applicable to all languages, including world languages and extinct languages, which makes it possible to assign a status to each language of the Ethnologue’s comprehensive database.

The language cloud is a visualization of the vitality of the world’s languages. It combines the EGIDS scale with the number of first language speakers of a given language to position its status of endangerment with respect to all other languages in the world. Each of the 7,105 languages listed in the Ethnologue is represented by a dot. Languages that have a lot of native speakers and are widely used are positioned in the upper left corner while the languages in the lower right corner are extinct or severely endangered languages with a very small number of speakers if any. Every language page features a version of the language cloud with the language’s individual position highlighted (see image).


Mindiri (a language of Papua New Guinea and Language of the Day for March 20, 02013) in the language cloud

The Ethnologue also provides the ISO-codes for all the listed languages. ISO 639 is an internationally recognized coding system of languages. SIL has been the official Registration Authority of the third and most extensive part of the code set, known as ISO 639-3, since 2007. Language names alone do not suffice as unique identifiers for any given language, since some languages have multiple names and some names are used for more than one language. The ISO-codes ensure that every language is identifiable by its individual three-letter code.

An interesting side note: there is an ISO-code not only for each of the known living languages, but also for extinct and constructed languages. Esperanto is an artificial language, but has 2 million speakers worldwide according to the Ethnologue. The ISO-code for Esperanto is epo. Klingon, another constructed language, might not have as many speakers, but there is an ISO-code for it: tlh. Old English is not included in the Ethnologue because it died out centuries ago, but it still has an ISO-code (ang).
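As a small illustration of why three-letter codes make handy identifiers, here is a minimal Python sketch of a lookup table. It contains only the three codes mentioned above; the table and the helper name `code_for` are hypothetical, written for this example, and are not an actual ISO 639-3 registry or Ethnologue API.

```python
# Hand-built table holding only the three codes named in this post.
# A real application would load the full ISO 639-3 code set instead.
ISO_639_3_EXAMPLES = {
    "epo": "Esperanto",
    "tlh": "Klingon",
    "ang": "Old English",
}


def code_for(name):
    """Return the three-letter code for a language name, or None if unknown.

    The comparison ignores case and surrounding whitespace.
    """
    wanted = name.strip().lower()
    for code, language in ISO_639_3_EXAMPLES.items():
        if language.lower() == wanted:
            return code
    return None


for language in ["Esperanto", "Klingon", "Old English"]:
    print(language, "->", code_for(language))
```

Looking things up in the other direction, from code to name, is a plain dictionary access such as `ISO_639_3_EXAMPLES["epo"]`, which is exactly the property that names alone cannot guarantee.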

Do you know the ISO-code for the language or languages you speak? Why don’t you look it up in the new edition of the Ethnologue!

Posted 2 years, 3 months ago by Karin Wiecha

Linguists Discover Existence of Distinct Hawaiian Sign Language

On Sunday, linguists announced at the University of Hawai’i the discovery of a previously undocumented indigenous sign language. This is the first time a new language - spoken or signed - has been discovered in the USA since the 1930s! The language, they found, has been in use since at least the 1820s, but only a few knew of its existence. Thanks to Linda Lambrecht, a committed native user of the language, Hawai’i Sign Language (HSL), as the language is now officially called, has been brought to the attention of the wider public for the first time in its history. Lambrecht, who is an American Sign Language (ASL) instructor at Kapi‘olani Community College, grew up with HSL as a first language and has been advocating its use and preservation since the 1980s. With the launch of an HSL language documentation project, funded by the Hawai‘i Council for the Humanities and a number of other academic institutions, her work finally came to fruition.

In order to determine whether HSL is an independent language rather than a dialect of ASL, the researchers interviewed 21 native HSL signers on four of the Hawaiian islands. They found that eighty percent of the basic vocabulary differs from ASL, which makes the two languages mutually unintelligible and proves that HSL is a distinct language entirely unrelated to ASL. An analysis of the grammar has also confirmed that HSL is a full-fledged language rather than an unstable pidgin.

The “discovery” of HSL came just in time. Even though it used to be the native sign language of Hawai’i’s Deaf community in the 19th and early 20th centuries, it was gradually displaced by ASL from the 1940s on. By the 1950s ASL was the dominant sign language in Hawai’i. Today there are only about a hundred Hawaiians left who know the language, most of them over sixty years old. One aim of the research project is to use the documented data for a dictionary, textbooks and HSL classes. This way, the researchers hope, Hawai’i Sign Language can be preserved and saved from dying out with the last generation of its native signers.

Posted 2 years, 4 months ago by Karin Wiecha

Happy International Mother Language Day 02013!

Today mother tongues will be celebrated worldwide. This date was chosen by UNESCO in recognition of the Bengali language movement, where on February 21, 01952, students protested for their language to become an official national language. Several protesters taking part in the demonstration were killed by police. The celebration of International Mother Language Day reminds us of the importance of linguistic diversity and of the human right to use one’s mother tongue and to see it preserved and passed on to future generations, no matter how few speakers it might have.

The theme of this year’s International Mother Language Day is Books for Mother Tongue Education. This theme highlights the importance of mother tongue education for the survival of linguistic diversity. For a large number of languages there are no books or teaching materials. But with a majority language being the language of instruction at school, children of minority language speech communities have little chance to become literate in their mother tongue. Also many young speakers are prone to switch to a globally more dominant language when they realize that the use of their mother tongue does not allow them to take part in all walks of modern life. Mother tongue education is an important step towards preserving the world’s language diversity for the future.

Today and in the coming days people all over the globe are celebrating this diversity in a variety of events. Do you want to help raise awareness of the importance of linguistic diversity? You could help The Long Now Foundation's PanLex Project translate “mother tongue” into as many languages as possible. You could also print the official International Mother Language Day 02013 poster and hang it at school or work. For more ideas on how to get involved, visit UNESCO's website.

Posted 2 years, 5 months ago by Kelsey Westphal

Dream of the Universal Translator: Closer to Reality?

Speech recognition software is everywhere—businesses use it to streamline customer phone calls, digital dictation software allows you to speak emails and essays, and, most recently, the iPhone’s surprisingly cheeky Siri can call, text, or look up information online with just a few verbal commands. With the aid of Deep Neural Networks, a mathematical technique patterned after human brain behavior, researchers at the University of Toronto and Microsoft Research have found a way to increase the accuracy of speech recognition to around 85%. This complex and relatively new technology is promising on its own, but when integrated with advanced translation software, has been used to produce a prototype of what could one day become a simultaneous personal translator, not unlike the iconic Universal Translator of Star Trek.

Though not mounted on a communicator pin or ready to communicate with aliens, this technology is still highly advanced, with multiple steps. First, the original speech is translated word-for-word into the second language. Next, the translated words are rearranged into grammatically appropriate phrases in the target language. The resulting translation is then spoken, not in the stilted, metallic voice of a computer, but in your own voice! To do this, an hour or so of recordings of your voice and of the voice of a native speaker of the target language are necessary in order to preserve the speaker’s vocal identity while also creating comprehensible expressions in another language.

There are still some kinks to work out, of course, but the possibilities this suggests for overcoming language boundaries are worth thinking about. Conversations between cultures could become more balanced: neither party would feel as though they were “imposing” their language on the other, and both could speak in the tongue they find most amenable. In diplomacy, business, travel and the arts, this new translation tool could produce profound breakthroughs in communication and more importantly, understanding between cultures and people. As anyone who has used a translation site knows, computer generated translations can often go comically awry, and this program certainly runs the same risk of miscommunication as any other. All the same, the thought of hearing your own voice in another language is a bizarre and fascinating prospect, one that will hopefully attract researchers and language lovers alike to search for solutions.

One only hopes that this technology will be adapted to serve not only speakers of Chinese or French but also speakers of lesser-known languages. One positive development on this front is Microsoft’s addition of Haitian Creole and the Hmong language to its Bing translation service. This slow but thorough aggregation of diverse languages will ideally mean that eventually no language community, however small, is left without a voice in global discourse, even if it is a computer-generated one.

If you would like to see this process in action, check out the video above of Microsoft's Chief Research Officer Rick Rashid speaking in English to a Chinese audience.

Posted 2 years, 7 months ago by Susan Colowick

Fun with Unicode

Most of us don't give much thought to character encodings. As our Web browsers move effortlessly between Arabic and Cyrillic pages, we may not remember the bad old days, when conflicts among dozens of standards made it very likely that a document or Web page would appear as utter gibberish. Some of those old encodings are still in use, but most of today's browsers, Web sites, and applications comply with the Unicode Standard and its encoding forms (UTF-8 being the most popular).
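For a small taste of the bad old days, here is a minimal Python sketch that decodes UTF-8 bytes with the wrong encoding. The sample word and the choice of Latin-1 as the mistaken encoding are illustrative assumptions, not a reconstruction of any particular browser or site.

```python
# Illustrative only: the sample text and the "wrong" encoding are assumptions.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет", a Cyrillic greeting

raw = text.encode("utf-8")        # 12 bytes: each Cyrillic letter takes 2 bytes
garbled = raw.decode("latin-1")   # wrong encoding: every byte becomes its own character
correct = raw.decode("utf-8")     # right encoding: the original text comes back

print(garbled)
print(correct)
```

Six letters go in, and twelve unrelated characters come out when the reader guesses the encoding wrong. Agreeing on one standard removes the guessing.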

Some Unicode Symbols for Egyptian Hieroglyphs

The name Unicode embodies the three original goals of the standard: universality (encompassing all human languages), uniformity (using fixed-width codes), and the uniqueness of each character representation. In Unicode a unique number, or code point, is assigned to each of more than 100,000 characters in roughly a hundred scripts. Development of the standard began in 1987; today it's maintained and promoted by the Unicode Consortium, a nonprofit whose members include Apple, Google, Microsoft, and every other major tech company (Long Now's PanLex project is an associate member).
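To make the idea of a code point concrete, the sketch below uses Python's standard unicodedata module to print the number, official name, and UTF-8 byte count for a few characters. The sample characters are my own picks for illustration.

```python
import unicodedata

# Sample characters chosen for illustration (an assumption, not from the post).
samples = [
    "A",           # Latin
    "\u0416",      # Cyrillic
    "\u0628",      # Arabic
    "\u265e",      # chess symbol
    "\U00013000",  # Egyptian hieroglyph
]

rows = []
for ch in samples:
    code_point = ord(ch)                 # the character's unique number
    name = unicodedata.name(ch)          # its official Unicode name
    utf8_len = len(ch.encode("utf-8"))   # bytes needed to store it in UTF-8
    rows.append((f"U+{code_point:04X}", name, utf8_len))

for row in rows:
    print(row)
```

Note that the byte counts differ from one character to the next. Each character has exactly one code point, but UTF-8 is a variable-width encoding form that spends between one and four bytes on it.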

You can see all the Unicode characters and symbols by browsing the code charts, or you can use one of several nifty tools developed for exploring Unicode. Here are just a couple:

  • Unicode Utilities. This interface to the Unicode database will tell you more than you ever wanted to know about the character at each code point. Click on "character" to type or paste any character and get a full list of its properties. Click on "confusables" to see which characters can accidentally (or maliciously) be confused with others that look similar.

  • UniView. Developed by Richard Ishida (Internationalization Activity Lead at the World Wide Web Consortium), this app allows you to search or browse for any character and discover all its properties. You can search by code point, character, or the name or description of a character. Search for the word "tilde" and you'll get 110 characters; search for "chess" and you'll get symbols for all the chess pieces, in white and black (♔, ♞, etc.). A "lite" version of the app is meant for mobile devices but may also be less intimidating to a new user than the full-featured version.
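If you'd rather poke at the character database from a script, here is a rough Python sketch of the same two ideas: searching by name and spotting look-alike characters. It is not how either tool above is implemented. The search_by_name helper is hypothetical, and limiting it to the Miscellaneous Symbols block is an assumption made to keep the example short.

```python
import unicodedata

def search_by_name(keyword, start=0x2600, end=0x26FF):
    """Return (code point, character, name) for names containing keyword.

    Hypothetical helper. The default range is the Miscellaneous Symbols
    block, picked only for this example.
    """
    keyword = keyword.upper()
    matches = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if keyword in name:
            matches.append((f"U+{cp:04X}", chr(cp), name))
    return matches

chess = search_by_name("chess")
for entry in chess:
    print(entry)

# Two letters that look alike on screen but are different characters.
latin_a = "\u0061"
cyrillic_a = "\u0430"
print(latin_a == cyrillic_a)
print(unicodedata.name(latin_a), "/", unicodedata.name(cyrillic_a))
```

The second half is the "confusables" problem in miniature: the Latin and Cyrillic letters render almost identically, yet they compare as unequal because they sit at different code points.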

Next time you translate some text online and then paste it into a document, take a moment to thank the hard-working developers and maintainers of the Unicode Standard. They've spent 25 years thinking about character encodings so that you don't have to.

Posted 2 years, 7 months ago by Richard Makin

Cherokee Becomes Google's 57th Language

Cherokee has become the first Native American language to be fully supported by Gmail. As the 57th Google interface language, Cherokee can now be used to compose emails as well as perform web searches. The news reveals another exciting collaboration between a large technology company and members of the Cherokee Language Technology Department, who also worked with Apple in 2010 to develop full Cherokee language support for the iPhone, iPod and iPad.

Cherokee uses a unique writing system developed in 1821 by Sequoyah, a member of the Cherokee Nation. Although some of its characters resemble Latin, Greek and Cyrillic letters, each of the 85 Cherokee characters represents a syllable rather than an individual sound. For instance, the word “Cherokee” is written with the three characters “ᏣᎳᎩ”, which represent the syllables “tsa”, “la” and “gi” respectively. After its official adoption by the Cherokee Nation in 1825, the writing system spread rapidly across disparate Cherokee territories.
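You can see the one-character-per-syllable idea for yourself with a few lines of Python. This is only a sketch: it leans on the fact that the Unicode names for Cherokee characters end in a romanized syllable, and pulling the syllable out of the name is my own shortcut rather than any official transliteration method.

```python
import unicodedata

word = "\u13e3\u13b3\u13a9"  # ᏣᎳᎩ, the three characters of "Cherokee"

syllables = []
for ch in word:
    name = unicodedata.name(ch)                         # e.g. "CHEROKEE LETTER TSA"
    syllables.append(name.rsplit(" ", 1)[-1].lower())   # keep the last word of the name

print(word, syllables)
```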

Although Cherokee is currently reported to be spoken by around 16,000 people, a 2002 survey by the Cherokee Nation revealed that fluency in the language is limited to those over 40 years old. Bringing Cherokee to Google's products directly addresses the tribe’s younger generation by making the language both relevant and usable for everyday tasks such as sending an email. The language is not only being preserved, but also promoted, modernized and made accessible.

Cherokee Gmail

Posted 2 years, 7 months ago by Laura Welcher

Presents for Polyglots!

Levenger has just announced a set of gifts in their holiday catalog sure to please the multilingually minded. Rosetta helped with the concept of one of these gifts - a set of multilingual learning blocks. The set of blocks is based on the "Swadesh List" - a set of basic vocabulary words that are found in most languages, because they have to do with basic human experience - our families, our bodies, and our natural environment. Each of the 28 blocks has a different word in six commonly taught languages: Spanish, simple Mandarin, French, German, Latin and English.


Levenger is also rolling out several other multilingual gifts for the holidays, including this beautiful set of Cherokee Syllabary blocks:


You can read about some of the other gifts, and about Levenger CEO Steve Leveen's interest in promoting multilingualism, in this post on his blog.

We are delighted that Levenger is a supporter of the Rosetta Project.
