300 Languages: A Parallel Speech Corpus Project

The 300 Languages Project is a special effort by The Rosetta Project, part of The Long Now Foundation, to begin building a universal corpus of human language by collecting parallel text and audio in the world's 300 most widely spoken languages. The resulting collection will contain thousands of volunteer-contributed, public domain text documents and audio recordings, which will be made available to researchers and the public alike via the Internet Archive, a free online digital library.

The 300 Languages Project will accept submissions of any document in any language. Documents in languages or of types other than those listed will be added directly to The Rosetta Project's collection at the Internet Archive.

Questions? Email laine@longnow.org.

Fifty to ninety percent of the world's languages are predicted to disappear in the next century, many with little or no significant documentation.