Posted 3 months ago by Austin Brown

The Berkeley Language Center will be hosting a talk by Long Now’s Dr. Laura Welcher on November 9th. The talk is open to the public and starts at 3:00pm in Dwinelle Hall B-4.
The Rosetta Project at The Long Now Foundation is working to build an open public digital collection of all human language as well as an analog backup that can last for thousands of years–The Rosetta Disk. In the “long now,” the goal is long-term storage and access to information–on the scale that both supports and transcends individual human societies and civilizations. In the “here and now,” the project serves to support and amplify the importance of the world’s nearly 7,000 human languages, the vast majority of which are endangered and, if current trends continue, likely to go extinct in the next 100 years. I’ll present our current work on the Rosetta Project Collection and Disk as well as some new initiatives including the “Language Commons” where we are working to help build the multilingual Web.
There will be a reception afterwards; come say hello.
Posted 3 months, 3 weeks ago by Austin Brown

With thousands of languages and writing systems used all over the world, making computers and the web widely accessible has taken a herculean effort, with much yet to be done.
One of the main tools used in the expansion of the web’s global reach is Unicode - a database of over 109,000 characters from 93 different writing systems and the standards for using and representing them.
Unicode is maintained by The Unicode Consortium, which sponsors a conference each year to share knowledge and discuss the future of Unicode.
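As a small illustration of what "representing" characters means in practice, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the sample characters are assumptions chosen for illustration; they are not taken from the conference or from the Rosetta Project.

```python
import unicodedata


def describe(ch):
    """Return (code point, official Unicode name, UTF-8 bytes) for one character."""
    code_point = f"U+{ord(ch):04X}"      # the character's number in the Unicode database
    name = unicodedata.name(ch)          # the character's official name
    encoded = ch.encode("utf-8")         # one common byte representation
    return code_point, name, encoded


# Sample characters from several writing systems (an arbitrary selection).
for ch in ["A", "я", "中", "ह"]:
    code_point, name, encoded = describe(ch)
    print(code_point, name, len(encoded), "byte(s) in UTF-8")
```

Each character has a single code point and name regardless of language or platform, while the number of bytes it takes on disk depends on the encoding chosen.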
This year the Internationalization and Unicode Conference will be held October 17th - 19th in Santa Clara, CA.
Long Now’s Dr. Laura Welcher will be delivering a keynote presentation on Tuesday, October 18th on her work on The Rosetta Project, a publicly accessible digital library of human languages, and The Language Commons:
The Rosetta Project shares the Unicode vision of a world where people can use communication technology on their own terms - in their own language.
According to World Internet Statistics, over 80% of all web communication is in about ten languages, with over half in either English or Chinese. The remaining 20% represent "everyone else" including about 400 languages with speaker populations above 1 million, which collectively comprise about 95% of everyone on earth.
Because of essential technologies like Unicode, we are poised to see this breadth of human languages flourish online and on mobile devices, providing for these languages a critical new domain of language use in the modern world. I will present several efforts underway at The Rosetta Project including the "Language Commons" that rely on Unicode as an essential technology in building the multilingual Web.
Posted 5 months ago by Laura Welcher
On July 30, 02011 The Rosetta Project partnered with Mightyverse.com to hold the first human language Record-a-thon at the Internet Archive. This is an event we developed to test the idea that with a few basic guidelines, anyone can use common video devices to help document human language.
The idea is that by creating a 5-10 minute unedited video, and providing basic information about it - essentially just saying what language you think it is in - and then uploading it to the Rosetta Project collection in the Internet Archive, you are helping build a corpus of valuable data for that language. You don't need to be a specialist, and by archiving it you create a resource that others can build on, for many different useful purposes - from language learning and teaching, to linguistic analysis, to building the tools that enable a language to be used with modern technology.
This introductory talk by Dr. Laura Welcher, made the morning of the event, describes the ideas behind the creation of the Record-a-thon:
Participants were then given a set of basic guidelines for creating and uploading language videos. If you are interested in reviewing them, or trying it out for yourself, the guidelines are all available here on our website.
In the course of a single day, in-person and remote participants combined created about 85 videos in 34 different languages. There were speakers of all ages, native and non-native, some quite fluent while others were learners practicing their skills. All the videos they created are interesting to watch and are available here in the Rosetta Project video collection. They recorded conversations, told stories, histories, and jokes, recited poems, and sang lullabies. Here is a sampling (click on the images to see the videos):
- Chihota speaking his native Shona. Shona is a language of Zimbabwe with about 11 million speakers. Chihota took home one of the Record-a-thon prizes, having made recordings of himself speaking Shona, Swahili, Sheng (an emergent Swahili-English mixed language) and Chilapalapa (a pidgin that emerged in the mines of South Africa). He also speaks fluent English and Russian. Chihota was unsure that we would consider all of these languages but we assured him we were interested in them all:
- Arturo Avila speaking his native Mixteco Bajo from Oaxaca, Mexico. Mixtec languages comprise a cluster of about 50 related languages in Mexico, having anywhere from a few hundred to a few thousand speakers each. Mr. Avila was the lucky Record-a-thon raffle winner of an iPad 2 (participants were given raffle tickets for each recording they uploaded, and Mr. Avila uploaded a bunch!):
- Anita Suter speaking her native language, Swiss German, in the Ostschweizer dialect. Standard German is one of the official languages of Switzerland, along with French, Italian and Romansch. Swiss German, with approximately 6.5 million speakers, is the spoken variety of German used daily in Switzerland, and it has many dialects, many of which are mutually unintelligible. These dialects are used alongside Standard German, a spoken and written variety which is reserved for more official purposes, in a peaceful linguistic co-habitation known as 'diglossia':
- Jordan Brown speaking Yiddish, a language he is studying. Several of the Record-a-thon participants made recordings in languages they are learning or studying. Mr. Brown, a linguistics student and Rosetta Project summer intern, made recordings in both Yiddish and the unrelated Sri Lankan language Sinhala. Here he reads from the Yiddish translation of "Winnie the Pooh" by A.A. Milne. Yiddish is a Germanic language with about 2 million first language speakers and 11 million second language speakers in Israel, Germany, and worldwide:
During the Record-a-thon there were also several Mightyverse Phrase Farm recording stations set up and running all day, where participants could record vocabulary lists, as well as the Universal Declaration of Human Rights. These video files are more complex, but as soon as the files are processed, we hope to make them available at the Internet Archive as well:
Other highlights of the day included a keynote talk by Dr. Elizabeth Lindsey. Dr. Lindsey is an Explorer at National Geographic, and she inspired us with stories of her experiences on her current expedition to visit and document traditional knowledge-keepers around the world.
Thanks to all of our participants, and to our sponsors The Internet Archive, The Levenger Foundation and Levenger.com, The Long Now Foundation, and to our team of dedicated Rosetta Project Interns and volunteers, without all of whom this event would not have been possible.
We heart human languages - all of them!
Posted 5 months, 1 week ago by Summer Dougherty
The Rosetta Project's newest addition to its online database is a set of language recordings assembled by the famous ethnomusicologist Alan Lomax. This collection encompasses approximately 600 recordings of dozens of languages from around the world. The recordings were made primarily in the 1960s and 1970s by Alan Lomax and by linguists around the world to serve as raw material in Lomax's Parlametrics project, a "comparative study of conversational style." [1] Recordings include children singing in Puluwatese, family conversations in Telugu, and stories and songs in Woleaian.
Though Lomax made some of the recordings himself, notably many of the ones made in the USSR, Italy and England, the rest were made by linguists around the world who helped Lomax by sending him tapes of their own field recordings. As Lomax had requested, the recordings consist mostly of five-minute snippets of conversation in various languages, along with some storytelling, myths, and songs. Through a collaboration with the Association for Cultural Equity, the recordings were loaned to the Rosetta Project with the stipulation that the recordings be digitized. In 2005, Rosetta intern JD Ross Leahy digitized the vast majority of the recordings, approximately 270 reel-to-reel and cassette tapes, and the originals were sent to the Library of Congress for long-term archiving. In 2011, Rosetta intern Summer Dougherty transcribed notes, inventoried, organized, and prepared the digital material for upload, and in July 2011 the recordings were uploaded to the Internet Archive.
Ethnomusicologist and activist Alan Lomax is famous for his recordings of blues legends including Lead Belly, jazz musicians including Jelly Roll Morton and folk singers including Woody Guthrie. [2]
As a teenager, Lomax started helping his father, folklorist and musicologist John Lomax, collect folk songs. Lomax and his father partnered with the Library of Congress and by 1930, when Alan was 15, they had already contributed over 3,000 recordings to the library's collection. [3] Lomax's role as a microphone for underappreciated and marginalized folk singers brought folk music back to the attention of the public and spurred the folk revival in America, inspiring a new generation of artists, including Bob Dylan. Even British music was affected by Lomax: the Rolling Stones take their name from one of Muddy Waters' songs. [4] More recently, Lomax's recording of James Carter and other prisoners singing "Po' Lazarus" was used in the film "O Brother, Where Art Thou?". Other recordings have been featured in the film “Gangs of New York” and on Moby’s album “Play”. [5]
Lomax felt that folk music is a vital expression of culture, and culture was very important to him. He believed in what he called "cultural equity", "the idea that the expressive traditions of all local and ethnic cultures should be equally valued as representative of the multiple forms of human adaptation on earth." [6] In fact, "his desire to document, preserve, recognize, and foster the distinctive voices of oral tradition led him to establish the Association for Cultural Equity (ACE), based in New York City and now directed by his daughter, Anna Lomax Wood." [7] "After 1960 he devoted himself to comparative research on world music and dance with collaborators from musicology, anthropology, dance, and linguistics." [8] These projects included his study of song, Cantometrics; of dance, Choreometrics; and of speech, Parlametrics.
References:
[1] Parlametrics (Association for Cultural Equity)
[2], [6], [8] Alan Lomax (Association for Cultural Equity)
[3] Alan Lomax, Who Raised Voice Of Folk Music in U.S., Dies at 87 (New York Times)
[4], [5] The Man who Recorded the World (Folkradio)
[7] The American Folklife Center: Alan Lomax Collection (The Library of Congress)
Posted 6 months, 1 week ago by
Join us for the Record-a-thon this Saturday July 30 at the Internet Archive and help document and promote the languages used in your own community! We need your help to meet our goal of recording 50 languages in a single day! How many languages can you help us document? Bring yourself and your multilingual friends and be the stars of your own grassroots language documentation project!
Keynote Speaker: Dr. Elizabeth Lindsey, National Geographic

Plan to attend in-person or remotely?
(Tickets are free. Your RSVP helps us plan for the number of attendees and the equipment that will be on hand, whether you intend to come in person or participate remotely.)

Posted 7 months, 2 weeks ago by Laura Welcher
RECORD-A-THON
Help us record 50 languages in a single day!
Save the date! Saturday July 30, 02011 from 9 am to 6 pm
The Internet Archive
at 300 Funston Avenue, San Francisco
Did you know...
There is something you can do to help document and promote the languages used in your own community! We need your help to meet our goal of recording 50 languages in a single day! How many languages can you help us document? Bring yourself and your multilingual friends and be the stars of your own grassroots language documentation project!
Professional linguists and videographers will be on site to document you and your friends speaking word lists, reading texts, and telling stories. You can also document your language using tools you probably have in your purse or back pocket — a mobile phone, digital camera, or laptop — just bring your device and our team will guide you through the documentation process.
How do your words and stories make a difference? An important part of language documentation is building a corpus — creating collections of vocabulary words, as well as conversations and stories that demonstrate language in use. From a corpus, linguists and speech technologists can build grammars, dictionaries, and tools that enable a language to be used online. The bigger the corpus, the better the tools!
The recordings you make during the event will be added to The Rosetta Project's open collection of all human language in The Internet Archive. And, you can compete for cool prizes, including an iPad 2 for the participant who records and uploads the most languages during the event!
Please RSVP below and let us know if you plan to attend, and what language or languages you are thinking of recording. Can't make it to the Record-a-thon? Join us online the day of the event for the virtual Record-a-thon, where you'll be able to interact with event staff, monitor event progress, listen live to lectures and talks, and submit your own recordings remotely.
We will be in touch soon with more information about the day's events, and how you can participate! For questions or more information please contact rosetta@longnow.org.
Posted 8 months ago by Colin Farlow
Ellen Bialystok, a research professor of psychology at York University in Toronto, claims a polyglot child develops cognitive efficiency from constantly speaking more than one language: "[t]he constant necessity to resist attending to a second language in favor of the one in use, and the need to switch between languages demands more effortful attention than does monolingual speech production, and this greater cognitive demand fosters the development of a higher level of attentional control." [1]
This effect appears to help stave off the symptoms of Alzheimer's. Bialystok’s study compared individuals with Alzheimer's who had equal levels of outward symptoms. The study essentially shows that people who regularly speak more than one language can perform certain cognitive tasks with significantly less functioning brain matter than someone who speaks only one language. It seems that bilingualism delays the onset of outward symptoms associated with Alzheimer's; provided everything else is equal, those who have the disease and are bilingual will still suffer from brain deterioration, but their symptoms will be less severe. In this sense, bilingualism offers some protection against the effects of Alzheimer's.
In a recent New York Times article about her research Bialystok explains, "[t]here’s a system in your brain, the executive control system. It’s a general manager. Its job is to keep you focused on what is relevant, while ignoring distractions. It’s what makes it possible for you to hold two different things in your mind at one time and switch between them. If you have two languages and you use them regularly, the way the brain’s networks work is that every time you speak, both languages pop up and the executive control system has to sort through everything and attend to what’s relevant in the moment. Therefore the bilinguals use that system more, and it’s that regular use that makes that system more efficient."
The claim that bilingualism can actually be advantageous is significant, because in the past bilingualism was generally regarded as a liability. Bialystok notes, "until about the 1960s, the conventional wisdom was that bilingualism was a disadvantage. Some of this was xenophobia. Thanks to science, we now know that the opposite is true." [2] Bialystok describes the questions people ask her about which language to teach children whose parents speak more than one language: “People e-mail me and say, ‘I’m getting married to someone from another culture, what should we do with the children?’ I always say, ‘You’re sitting on a potential gift.’”
[1] Schweizer TA, et al., Bilingualism as a contributor to cognitive reserve: Evidence From brain atrophy in Alzheimer's disease, Cortex (2011), doi:10.1016/j.cortex.2011.04.009 (available at [www.sciencedirect.com](http://www.sciencedirect.com "Science Direct"))
The author of this post, Colin Farlow, is a 02011 summer intern with the Rosetta Project. He recently graduated from Indiana University, where he studied East Asian Languages and Cultures and Philosophy.
Posted 8 months ago by Harry Willoughby

Busuu, a language of Cameroon, is reported to have only eight speakers left in the world.
Nearly all speakers of Busuu have shifted to using another local language, Jukun, which has about 2,500 speakers. Jukun and Busuu are related, but are only partially mutually intelligible. Jukun is used by Busuu speakers for almost all purposes, Busuu generally being reserved for use only at Busuu reunions, and only by adults - no children are learning the language. For all intents and purposes, Busuu appears to be a lost cause, destined to disappear from use with the passing of its current generation of speakers.
Yet despite this (or perhaps because of this) it has been adopted as a cause by the eponymous language learning website busuu.com, and awareness of the language’s plight is being spread through the medium of a professionally-produced video with a catchy song featuring the few remaining (but all apparently charming and good-humored) Busuu speakers. The Busuu.com website encourages people to spread awareness about Busuu through Facebook, Twitter and e-cards by sending recorded greetings from each of the remaining speakers. A clever and unusual tactic in raising the profile of an endangered language - but is increased awareness among online social networks likely to translate into increased use within the Busuu heritage speech community?
There are in fact a few notable cases where the rapid decline of a language has been reversed, and threatened languages have significantly expanded in use and numbers of speakers - among these are Catalan, Welsh, and Hawaiian. For languages that have only a few remaining speakers like Busuu, Leanne Hinton, a linguist who works with critically endangered languages of Native California in the United States, has devised a technique whereby speakers and learners can create their own immersion environments for language learning. [1] These speakers then teach others the language, including their children. In this way a language can be passed along on a very localized level to a new generation.
To bring a language back into more widespread – even national – use, the key factor is support from every direction possible – top down from the government; bottom up from local communities. [2] Everyone needs to be invested, from governor to grandma, and real-world benefits to potential speakers are key. [3] So if this effort by busuu.com to raise social awareness is effective, and generates broad recognition for the Busuu speech community, the resulting increase of local prestige for the Busuu language could be significant indeed.
We're certainly willing to give it a try - Busuu Busuu!
[1] Hinton, Leanne. 2002. How to Keep Your Language Alive. Heyday Books.
[2] Crystal, David. 2000. Language Death. Cambridge University Press.
[3] Fishman, J.A. (ed.) 2001. Can Threatened Languages Be Saved? Reversing Language Shift, Revisited: A 21st Century Perspective. Clevedon: Multilingual Matters.
The author of this post, Harry Willoughby, is a 02011 summer intern with the Rosetta Project. He recently graduated from the University of Wales with a degree in Linguistics.
Posted 8 months, 1 week ago by Colin Farlow
In a new study, “When Time is Not Space,” published in the journal Language and Cognition, a team of researchers from the University of Portsmouth and the Federal University of Rondonia claims that the Amondawa, a small Amazonian tribe, speak a language with a very uncommon conceptualization of time. The story was recently picked up by the BBC, revealing that the debate about whether language influences thought is very much alive and newsworthy.
According to researchers Sinha et al., the Amondawa have no words for talking abstractly about time (as in the English word 'time'), or time periods (like 'year'):
“What we don't find is a notion of time as being independent of the events which are occurring; they don't have a notion of time which is something the events occur in.”
The mapping of time to physical space is commonly found in human language, and its absence in Amondawa is perhaps the most surprising result of the study. Rather than having a time-space metaphor, the Amondawa conceptualization of time is based on “social activity, kinship and ecological regularity.”
Pierre Pica, a theoretical linguist at France’s National Centre for Scientific Research, questions the conclusions derived from this new research. Pica explains that the fact that Amondawa does not use cardinal chronology does not mean its speakers view themselves as advancing through time any differently than the rest of us who use a cardinal chronological system.
Sinha et al. state that the tribe’s language in no way affects their cognitive ability to grasp temporal concepts -- they talk about events, and sequences of events, and learn Portuguese which does have abstract time expressions. Rather, the Amondawa language provides a different way of construing and talking about temporal concepts in daily life.
This contention about whether the Amondawa language affects its speakers’ thought processes hearkens back to a famous study by Benjamin Lee Whorf on the Hopi Language in the first half of the 20th century. This study was a foundational example for Whorf’s “linguistic relativity hypothesis” – the idea that the language you speak influences the way you think. From his study of Hopi, Whorf concluded:
“The Hopi language is seen to contain no words, grammatical forms, constructions or expressions that refer directly to what we call TIME, or to past, present or future, or to enduring or lasting…the Hopi language contains no reference to TIME, either explicit or implicit.” [1]
Whorf’s ideas about Hopi have received a great deal of criticism over the years, and his data have been criticized as erroneous, the result of deficient research practices. [2] Nevertheless, the idea that language influences thought has certainly stuck around, and is now being raised by a new generation of researchers like Sinha et al., who are gathering new data from small and threatened languages around the world.
For more on the relationship of language and thought, listen to our podcasts of previous Long Now seminars by Lera Boroditsky as well as Daniel Everett who talks about Pirahã, a language also from the Amazon.
[1] Whorf, Benjamin Lee. 1950. An American Indian Model of the Universe. The International Journal of American Linguistics 16(2).
[2] In an interview by BBC, Guy Deutscher explains his ideas about language and thought in addition to describing Benjamin Whorf’s research on Hopi Language.
The author of this post, Colin Farlow, is a 02011 summer intern with the Rosetta Project. He recently graduated from Indiana University, where he studied East Asian Languages and Cultures and Philosophy.
Posted 8 months, 4 weeks ago by Alex Mensing

How does human language work? What are its possibilities and limitations? Where did it come from? Many linguists have asked these questions and made contributions to our understanding of language, but how do they get their answers?
One approach is to go out and document a language, which can then be compared to other languages, writings from the past, etc. Through various methods, linguists have succeeded in discovering patterns within and between languages that allow us to define some of their parameters and to organize them into families. But, as two recent publications demonstrate, our ability to recognize patterns—and their underlying causes—may be dramatically increasing with the development of technology that can centralize, organize and manipulate enormous amounts of information.
The two studies were highlighted in The Economist, and both of them offer conclusions that are likely to spark lively debate. Dr. Michael Dunn, from the Netherlands’ Max Planck Institute for Psycholinguistics, published a paper in Nature magazine addressing word-order dependencies—the idea that, for example, if a given language places verbs before objects (eat lunch) it will also place prepositions before nouns (at home). By comparing different languages, linguists have found that there are some strong consistencies in these dependencies, indicating that they are the result of “underlying cognitive or systems biases.” Dr. Dunn, however, has used large databases of basic vocabularies and statistical methods borrowed from evolutionary biology to approach the problem of dependencies in a different way:
To substitute for fossils, and thus reconstruct the ancient branches of the tree as well as the modern-day leaves, Dr Dunn used mathematically informed guesswork. The maths in question is called the Markov chain Monte Carlo (MCMC) method. As its name suggests, this spins the software equivalent of a roulette wheel to generate a random tree, then examines how snugly the branches of that tree fit the modern foliage. It then spins the wheel again, to tweak the first tree ever so slightly, at random. If the new tree is a better fit for the leaves, it is taken as the starting point for the next spin. If not, the process takes a step back to the previous best fit. The wheel whirrs millions of times until such random tweaking has no discernible effect on the outcome.
When Dr Dunn fed the languages he had chosen into the MCMC casino, the result was several hundred equally probable family trees. Next, he threw eight grammatical features, all related to word order, into the mix, and ran the game again.
He found that particular word-order traits were not necessarily linked to others in the way that current theories propose. Rather, such dependencies seemed to be ‘lineage-specific,’ suggesting that they have been passed down through language families. “Nurture, in other words, rather than nature,” as The Economist put it.
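The propose, compare, and keep-or-revert loop in the Economist excerpt can be sketched in a few lines. This is a toy Metropolis sampler under stated assumptions, not Dr. Dunn's method or software: the "state" here is a single number (the bias of a coin) rather than a language family tree, and the function names, data, step size, and step count are all hypothetical. One difference from the simplified description above: the standard Metropolis rule always keeps a better-fitting proposal but also occasionally keeps a worse one, which is what lets the process explore many plausible answers instead of settling on one.

```python
import math
import random


def log_likelihood(p, heads, tails):
    """How well a candidate state p fits the observed data (higher is better)."""
    if not 0.0 < p < 1.0:
        return float("-inf")  # impossible states never fit
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(heads, tails, steps=20000, step_size=0.05, seed=0):
    """Toy Metropolis sampler: randomly tweak the state, keep or revert."""
    rng = random.Random(seed)
    current = 0.5
    current_ll = log_likelihood(current, heads, tails)
    samples = []
    for _ in range(steps):
        # "Spin the wheel": tweak the current state slightly, at random.
        proposal = current + rng.uniform(-step_size, step_size)
        proposal_ll = log_likelihood(proposal, heads, tails)
        log_ratio = proposal_ll - current_ll
        # Better fits are always kept; worse fits are kept only sometimes.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_ll = proposal, proposal_ll
        samples.append(current)  # on rejection, the previous state is recorded again
    return samples


# Hypothetical data: 70 heads and 30 tails.
samples = metropolis(heads=70, tails=30)
kept = samples[len(samples) // 5:]  # discard the early "burn-in" portion
estimate = sum(kept) / len(kept)
print(f"estimated bias: {estimate:.3f}")
```

The output of such a run is a collection of plausible states rather than a single answer, which mirrors the "several hundred equally probable family trees" mentioned in the excerpt.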
The other article, published in Science by Dr. Quentin Atkinson of the University of Auckland, also uses statistics and databases in an innovative way. He looked at information from the World Atlas of Language Structures on sounds in different languages and found that phonemic diversity (the number of sounds used in a language) decreases as you follow the pathways of human migration outwards from central/southern Africa. The Science article argues that modern language originated in that part of Africa and that phonemic diversity decreased with every stage of human expansion as small groups of people set off in search of new territory.
Both of these studies utilize phylogenetic language groupings, based on evolutionary theory, and they run statistical analyses with large amounts of data made available by central repositories of linguistic information, such as the World Atlas of Language Structures. The Long Now Foundation’s Rosetta Project is an effort to improve and facilitate that very sort of creative methodology—to organize and make available large amounts of data so that researchers can develop fundamentally new methods of inquiry.