300 Languages: A Parallel Speech Corpus Project

The 300 Languages Project is an effort to begin building a universal corpus of human language by collecting parallel text and audio in the world's 300 most widely spoken languages. The resulting collection will contain thousands of volunteer-contributed, public-domain text documents and audio recordings, which will be made available to researchers and the public through the Internet Archive, a free online digital library.

The 300 Languages Project will accept submissions of any document in any language. Languages and document types other than those listed will be added directly to The Rosetta Project's collection at the Internet Archive.

Follow our progress on the world map, which is updated with each new submission, or see the list of contributors on the People page.

Questions? Email 300@rosettaproject.org.

The Rosetta Disk

Fifty to ninety percent of the world's languages are predicted to disappear in the next century, many with little or no significant documentation.