The goal of this project is to explore the possibility of extending an open source suite of office productivity software by adding interlinear glossing capabilities to its basic functionality. This extension will allow linguists to annotate texts for certain kinds of grammatical information and to link the words and morphemes in those texts to a lexical database. Linguists will thus be able to build a lexicon and a collection of annotated texts in a working environment that is likely to be familiar already, namely that of a modern word processor. Furthermore, since the suite's native document format, the OpenDocument Format (ODF), is XML-based, resources produced using this system will already be in a form that facilitates archiving and resource interoperation (in keeping with the recommendations of E-MELD’s School of Best Practices).

The outputs of this project can be summarized as follows:

  • Tools: This project will develop a preliminary version of a “plug-in” for the suite to facilitate the creation and maintenance of interlinear glossed texts. It will also develop code libraries allowing interlinear glossed text to be “linked” to entries in a lexical database.
  • Standards: This project will develop XML standards for interlinear glossed text suitable for use within the suite's environment. It will further develop a standard mechanism within the suite for linking lexical items in texts to lexical database entries.
  • Recommendations: This project will propose recommendations concerning (i) the general problem of adapting pre-existing open-source applications to narrower academic functions and (ii) the more specific problem of doing so with this suite.
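The project's XML format for interlinear glossed text is not specified here. Purely as an illustration, the Python sketch below shows one way the words and morphemes of a glossed phrase could be linked to lexical-database entries by identifier and then serialized to XML. Everything in it is hypothetical: the element and attribute names (`phrase`, `word`, `morph`, `lexref`), the classes, and the rule of linking a morpheme only when both its form and gloss match a lexicon entry. None of these are the project's standard. The data is a toy English phrase.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LexEntry:
    """A hypothetical lexical-database entry."""
    entry_id: str
    form: str
    gloss: str


@dataclass
class Morpheme:
    form: str
    gloss: str
    entry_id: Optional[str] = None  # link into the lexicon, if one is found


@dataclass
class Word:
    morphemes: List[Morpheme]


def link_morphemes(words: List[Word], lexicon: List[LexEntry]) -> List[Word]:
    """Attach a lexicon ID to each morpheme whose (form, gloss) matches an entry.

    Matching on the gloss as well as the form is an assumed rule for
    telling apart homographs; unmatched morphemes keep entry_id = None.
    """
    index = {(e.form, e.gloss): e.entry_id for e in lexicon}
    for word in words:
        for m in word.morphemes:
            m.entry_id = index.get((m.form, m.gloss))
    return words


def to_xml(words: List[Word], free_translation: str) -> str:
    """Serialize one glossed phrase using hypothetical element names."""
    phrase = ET.Element("phrase")
    for w in words:
        word_el = ET.SubElement(phrase, "word")
        for m in w.morphemes:
            attrs = {"form": m.form, "gloss": m.gloss}
            if m.entry_id is not None:
                attrs["lexref"] = m.entry_id  # pointer to the lexical database
            ET.SubElement(word_el, "morph", attrs)
    translation = ET.SubElement(phrase, "translation")
    translation.text = free_translation
    return ET.tostring(phrase, encoding="unicode")


# Toy data: "dog-s bark-ed" glossed as dog-PL bark-PST; "-ed" is left
# out of the lexicon to show an unlinked morpheme.
lexicon = [
    LexEntry("L1", "dog", "dog"),
    LexEntry("L2", "-s", "PL"),
    LexEntry("L3", "bark", "bark"),
]
words = [
    Word([Morpheme("dog", "dog"), Morpheme("-s", "PL")]),
    Word([Morpheme("bark", "bark"), Morpheme("-ed", "PST")]),
]
xml_text = to_xml(link_morphemes(words, lexicon), "The dogs barked.")
print(xml_text)
```

Storing only an identifier on each morpheme keeps the glossed text and the lexicon separate. An entry can then be edited in one place without rewriting every text that cites it.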

No project we are aware of has attempted to build a linguistics-specific toolkit within an existing office suite, even though this would clearly be desirable for the ordinary working linguist, who typically already relies heavily on such a suite. At the conclusion of the project, we will be in a position to assess the feasibility of using the suite as a general platform for linguistic data manipulation tools and, if the results of this preliminary research are promising, to expand the project’s scope.


Completion of primary research and programming by June 2008.


  • Jeff Good, Principal Investigator
  • Laura Welcher, Director of The Rosetta Project
  • Juan Pablo Puerta, Lead Programmer
  • Jeremy Fahringer, Programmer
  • Stuart Robinson, Consultant

The Rosetta Disk

Fifty to ninety percent of the world's languages are predicted to disappear in the next century, many with little or no significant documentation.