Posted on Thursday, March 8, 02012 by Laura Welcher
While globalization is usually considered a primary factor in language endangerment, global economies also provide access to inexpensive communication technologies like the internet and mobile devices - and these technologies are increasingly enlisted as tools to increase the use of endangered languages, as reported recently in the BBC News.
Many endangered language speech communities are gravitating towards Twitter, as well as social media services like Facebook, to promote language use and language learning. For children especially, the ability to use their heritage language with these ubiquitous social media sites provides an essential "coolness" factor, giving their languages relevance and an important new domain of use in the modern world.
Those who use smaller languages on public sites like blogs, or Twitter, are creating an additional resource that they are probably unaware of: the language that they craft and post helps build a text corpus for their language that can pave the way for better tools to enhance that language's use online.
Dr. Kevin Scannell, a computer scientist, mathematician and endangered language speaker, has created a multilingual web crawler called An Crúbadán (which literally means "crawler" in Irish). The crawler identifies and computes the probability of 3-character sequences, which provide a unique "fingerprint" for any given written language. Here is an image showing the catch he netted with a recent trawl: over 1,000 different languages being used online (click on the image below for more information):
According to Scannell, the identified ever-growing corpora provide a means "to develop basic resources that help people use their language online: keyboard input methods, spell checkers, online dictionaries, and so on." In his other research, he has developed crawlers that explicitly capture endangered language Tweets (Indigenous Tweets) as well as blogs (Indigenous Blogs) which he says "aim to strengthen languages through social media."