KOMeT (Korpuslinguistische Methoden für ePhilologie mit TEI, "Corpus Linguistic Methods for e-Philology with TEI") is a research group project at the Department of Corpus Linguistics in the Institute for German Language and Linguistics, Humboldt-Universität zu Berlin. The group is funded by the German Federal Ministry of Education and Research (BMBF).
The project aims to apply computational corpus linguistics methods to ancient texts encoded in TEI XML, focusing initially on richly annotated corpora of Sahidic Coptic, the language of Hellenistic Ancient Egypt and the early Coptic church.
We aim to bring together researchers in any area dealing with textual resources from the ancient world, including linguistics, Coptology/Egyptology, history of religion, classical studies and more. The pilot phase in 2014 is dedicated exclusively to Coptic, with the possibility of expanding to other ancient languages later on.
KOMeT is developing an annotation format, based on PAULA XML, for extending existing TEI XML projects with standoff annotation. TEI XML documents often need to be enriched with further linguistic annotations that the TEI standard does not cover adequately. At the same time, we wish to avoid editing the TEI document or converting it to another format, which would prevent the creators of the corpus or other researchers from extending the document using TEI-based tools. We therefore offer a standard for annotating TEI documents from the outside, using separate external XML files (standoff annotation).
- For documentation on KOMeT standoff see: KOMeT_Standoff_XML_Documentation.pdf
- For more information on PAULA XML, see the PAULA website.
- A demo corpus for Coptic is also available in KOMeT standoff (see below)
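The general idea of standoff annotation can be illustrated with a minimal Python sketch. It does not reproduce the KOMeT or PAULA formats: the `annotations` and `feat` elements, the `target` and `layer` attributes, the placeholder tokens and the tags are all hypothetical and chosen only for illustration. The sketch assumes that the TEI document marks its words with `xml:id` attributes and that a separate annotation file points at those IDs, so the TEI source is read but never modified.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Base TEI document (placeholder tokens, not real corpus data).
# The annotation step only reads it and never edits it.
TEI_DOC = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p><w xml:id="w1">alpha</w> <w xml:id="w2">beta</w> <w xml:id="w3">gamma</w></p>
  </body></text>
</TEI>"""

# Separate standoff file. Element and attribute names are hypothetical,
# not the actual KOMeT/PAULA schema; each entry points at an xml:id.
STANDOFF_DOC = """<annotations layer="pos">
  <feat target="#w1" value="N"/>
  <feat target="#w2" value="V"/>
  <feat target="#w3" value="N"/>
</annotations>"""


def resolve_standoff(tei_xml, standoff_xml):
    """Pair each standoff feature with the text of the TEI element it targets."""
    tei_root = ET.fromstring(tei_xml)
    # Index every TEI element that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in tei_root.iter() if el.get(XML_ID)}
    layer = ET.fromstring(standoff_xml)
    resolved = []
    for feat in layer.iter("feat"):
        target_id = feat.get("target").lstrip("#")
        target = by_id[target_id]
        resolved.append((target_id, target.text, feat.get("value")))
    return resolved


if __name__ == "__main__":
    for token_id, text, tag in resolve_standoff(TEI_DOC, STANDOFF_DOC):
        print(token_id, text, tag)
```

Because the annotation layer lives in its own file, further layers can be added as additional files without touching the TEI source or each other.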
The demo corpus for the KOMeT annotation standards is called Besa.letters and consists of two letters by Besa, the 5th-century abbot of Atripe. It contains the following letters:
- Letter to Aphthonia
- Letter to Thieving Nuns
Both documents are available in KOMeT standoff, using EpiDoc TEI for the structural markup and adding further linguistic annotations in standoff XML. We are grateful to the Coptic SCRIPTORIUM project for collaborating on the tools to digitize and annotate these documents. Long-term archiving of the corpus is also planned together with the LAUDATIO project at Humboldt-Universität zu Berlin.
- Project lead: Prof. Amir Zeldes (link)
- Research Assistant: André Röhrig
Mentors
The following mentors supervise the project:
- Prof. Frank Kammerzell (link)
- Prof. Anke Lüdeling (link)
- Dr. Laurent Romary (link)
- Prof. Caroline T. Schroeder (link)
- ANNIS: An open source search and visualization platform for multilayer corpora
- SaltNPepper: An open source converter platform for language data and multiple annotation formats
- Coptic SCRIPTORIUM: A project producing freely available, richly annotated corpora of Sahidic Coptic
- LAUDATIO: Long-term Access and Usage of Deeply Annotated Information
Events and Talks
- 8.4.2014 - Presentation and demo at Topoi Methods Colloquium on Digital Text Annotation in Berlin (Zeldes/Röhrig)
- 15.5.2014 - "Digital Coptic. Building an Online Environment for the Study of Coptic Literature". Presentation at UC Berkeley, Center for Tebtunis Papyri. (Schroeder/Zeldes)
- 8-12.7.2014 - "Digitizing the Dead and Dismembered: DH Technologies for the Study of Coptic Texts". Paper at the Digital Humanities 2014 Conference in Lausanne (Schroeder/Zeldes).