Post archive for September 2010 - The Rosetta Project

By Laura Welcher

Rosetta Disk at the Hammer Museum for an "Enormous Microscopic Evening"
Join Long Now's Rosetta Project on November 4 from 4 to 7 pm at UCLA's Hammer Museum, where we team up with San Francisco-based CRITTER for an Enormous Microscopic Evening. We'll put a Rosetta Disk under the microscope, check out the fine (and finer) print, and maybe hunt for Easter eggs. More information on the evening's lineup, from the Hammer Museum:

Enormous Microscopic Evening examines the museum from a microscopic perspective with CRITTER, a San Francisco-based salon dedicated to expanding the relationships between culture and the environment. The evening will focus on demonstrations and workshops about building and manipulating microscopes. Materials and samples taken from around the museum will be examined. Continuing the theme of microscopy, there will be micro performances (short concerts with tiny instruments) and other related events throughout the museum.

By Laura Welcher

Linguist awarded prestigious MacArthur Fellowship
Jessie Little Doe Baird, a linguist who has worked for years to revive the Wampanoag (Wôpanâak) language, has just been awarded a 02010 MacArthur "Genius" Fellowship in honor of her work and research.
Baird, who is of Wampanoag heritage, studied at MIT under Kenneth Hale, a scholar of indigenous languages. By immersing herself in the language, she has achieved fluency, effectively reviving in herself the spoken use of a language that had long been silent. Her research focuses on developing a Wampanoag dictionary, which now includes nearly 10,000 words, as well as language-teaching resources through which she hopes to help usher the language into modern use in the Wampanoag community.

By Laine Stranahan

Be a Pilot Tester for The 300 Languages Project

The 300 Languages Project is a special effort by The Rosetta Project to create a parallel text and audio corpus for the world's 300 most widely spoken languages. We are seeking a limited number of volunteers to test its submission process and give feedback to its coordinators before the project launches globally in November. Native speakers of any language (including English) are encouraged to participate.

To participate, sign up here or email laine@longnow.org.

By Laine Stranahan

Swadesh List data now re-enabled in Rosetta Internet Archive Collection

Swadesh list for the Puoc language in the International Phonetic Alphabet

In the 01950s, American linguist Morris Swadesh, as part of his overarching vision of a quantitative method for determining language relationships on a global and multimillennial scale, developed a set of one hundred words found to be unusually stable across time and language boundaries. Swadesh hypothesized that words like "fire," "moon," "mother," and "bone," which are common to human experience, were far less likely than other vocabulary to change or to be replaced by words borrowed from other dialects or languages. The 100-word "Swadesh list" (sometimes as many as 207 words, depending on the version of the list used) is now widely collected in linguistic field research and functions as a kind of universal linguistic fossil. With careful study, these lists can reveal ancient language relationships and processes of linguistic change that are typically obscured by centuries of evolution and borrowing. Familiar examples of such processes are those that transformed Chaucer's English into modern English and Latin into the modern Romance languages.

In 02004, The Rosetta Project undertook a National Science Foundation-funded project to increase both the size and the utility of its long-term multilingual archive, and as part of that effort added a large number of Swadesh lists to its collection. Lexical database archivists Tim Usher and Paul Whitehouse contributed original research (Tim Usher's 02002 Indo-Pacific database and Paul Whitehouse's 02002 Australian and New Guinea database were central among the additions) and also brought in outside resources, including Darrell Tryon's Comparative Austronesian Dictionary (01995), George Starostin's Dravidian database, and Ilya Peiros' Mon-Khmer database. In many of these cases, as with the Usher and Whitehouse collection, the 100- to 200-term Swadesh lists were a subset of a larger lexical data collection project. Although a Swadesh list is small compared with a resource like a dictionary, a large collection of the same list in many different languages is useful as a parallel dataset for cross-linguistic comparison.

This collection of Swadesh lists was included as a parallel dataset among the documents micro-etched on the Rosetta Disk, a physical copy of The Rosetta Project's long-term linguistic archive created in 02008. For a period of time, the lists were also available on The Rosetta Project's website through an interactive tool that allowed visitors to view and compare lexical items in over a thousand languages and to contribute their own lexical data. But as the Rosetta Project site evolved and its server environment changed, this tool became technologically obsolete. While there was (and remains) no lack of storage space for the lists, there was a critical lack of what Long Now board member Kevin Kelly calls "movage."

"Movage," says Kelly, "means transferring the material to current platforms on a regular basis — that is, before the old platform completely dies, and it becomes hard to do. This movic rhythm of refreshing content should be as smooth as a respiratory cycle — in, out, in, out. Copy, move, copy, move." And it is movage, not storage, says Kelly, that is critical to keeping information alive: "The only way to archive digital information is to keep it moving." In other words, simply storing data isn't enough to ensure its longevity; it must be copied, moved, and made redundant. And not just once or twice, but indefinitely. Kurt Bollacker, Long Now Foundation Digital Research Director, adds: "[b]ecause any single piece of digital media tends to have a relatively short lifetime, we will have to make copies far more often than has been historically required of analog media. Like species in nature, a copy of data that is more easily 'reproduced' before it dies makes the data more likely to survive." [1]

Since the 02004 iteration of the Swadesh list program, The Rosetta Project has launched a comprehensive migration of all of its data to The Internet Archive, a free online digital library founded in 01996 with over 4 petabytes of storage. The Internet Archive exemplifies the paradigm shift in information preservation from storage to movage: users can upload, for free, any document they have permission to distribute, and anyone with internet access can then download it to their own machine. Thousands of downloads are made every day from Internet Archive servers by users all over the world: early "movage" on a massive scale.

After a long process of unraveling and decoding the Swadesh list data, which had fallen victim to rapid changes in character encoding and database standards, The Rosetta Project has now moved the collection of 1,235 Swadesh lists into The Internet Archive. Recognizing the substantial merit and long-term advantages of the movage model, and its successful early implementation by The Internet Archive, we aim for the lists to have a long, useful, and redundant residence there.

The relocation of the Swadesh lists is also the first step in The Rosetta Project's latest undertaking, The 300 Languages Project. Source materials collected for The 300 Languages Project, which aims to meet the need for highly structured linguistic resources in the world's 300 most widely spoken languages, will be stored at The Internet Archive with the rest of The Rosetta Project collection.

Was the 5-to-6-year period the Swadesh list data spent in darkness unusual? Not according to Kelly: "We don’t know what the natural movage respiration cycle is for digital media yet since it is still very new," he says, "but I suspect the cycle is much shorter than we think. I would guess it is 5 years. No matter what digital format you have your precious [data] stored on, you should expect to move it onto new media in five years — and five years after that forever!"
