INRIA HU Korpuslinguistik

<tiger2/> - An XML serialization of the SynAF syntactic model

What is <tiger2/>?

<tiger2/> is a standards-conformant XML format designed to express syntactic structures for a wide variety of theoretical formalisms and corpus architectures. It is closely related to TigerXML and develops the ideas found there: the declared goal of the project is to extend TigerXML only as far as is required to represent current advanced syntactic resources, avoiding any changes that are not strictly necessary and that might steepen the learning curve or require substantial alterations to existing tools. Like TigerXML, the format is conceived as theory-neutral: it is suited to both shallow and deep parsing in any number of theories and supports pure constituency trees, pure dependency trees, and combinations of the two.

<tiger2/> is extensible to new forms of annotation for both syntactic nodes and edges through a typing system, which lets users define custom annotations that apply only to certain kinds of nodes and edges.
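To make the idea of typed nodes and edges more concrete, here is a minimal Python sketch that reads a small sentence graph and filters its edges by type. The fragment is only modelled on the general layout of the original TigerXML (terminals, nonterminals, edges); the `type` attribute, its values `prim` and `dep`, the edge placed inside a terminal, and the helper names `read_graph` and `edges_of_type` are assumptions made for illustration and are not taken from the <tiger2/> schemas. Consult the released schemas for the actual element and attribute names.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: loosely modelled on the original TigerXML layout.
# The "type" attribute and the <edge> inside <t> are hypothetical stand-ins
# for typed edges and are NOT copied from the tiger2 schemas.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Mary" pos="NNP"/>
      <t id="s1_2" word="sleeps" pos="VBZ">
        <edge type="dep" label="nsubj" idref="s1_1"/>
      </t>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge type="prim" label="SBJ" idref="s1_1"/>
        <edge type="prim" label="HD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def read_graph(xml_text):
    """Return (terminals, edges) for one sentence.

    terminals maps terminal id -> word form.
    edges is a list of (source id, edge type, label, target id) tuples.
    """
    root = ET.fromstring(xml_text.strip())
    terminals = {t.get("id"): t.get("word") for t in root.iter("t")}
    edges = []
    # Both terminals and nonterminals may act as edge sources in this sketch.
    for node in root.iter():
        if node.tag in ("t", "nt"):
            for edge in node.findall("edge"):
                edges.append(
                    (node.get("id"), edge.get("type"),
                     edge.get("label"), edge.get("idref"))
                )
    return terminals, edges


def edges_of_type(edges, edge_type):
    """Keep only the edges whose type matches edge_type."""
    return [e for e in edges if e[1] == edge_type]


if __name__ == "__main__":
    terminals, edges = read_graph(SAMPLE)
    for source, etype, label, target in edges_of_type(edges, "dep"):
        print(f"{terminals[source]} -{label}-> {terminals[target]}")
```

In this sketch, constituency and dependency information live in the same graph and are told apart only by the edge type, which is one way a single document could combine the two kinds of analysis.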

What can <tiger2/> do?

Main features of <tiger2/>, especially as compared with the original TigerXML (a.k.a. tiger1), include:

For a complete description, see the Releases on this site. For a more detailed overview of the differences between <tiger2/> and the original TigerXML, see tiger1 > tiger2.

SynAF - the data model behind <tiger2/>

As a standards-conformant format, <tiger2/> is a serialization of the data model defined by SynAF, the Syntactic Annotation Framework developed by the International Organization for Standardization (ISO). SynAF was developed in ISO/TC37/SC4 (Language Resources Management) and covers the basic semantics and structure of syntactic annotation. As a concrete XML format, <tiger2/> instantiates the SynAF model elements as XML schemas (including XSD, RNG and ODD descriptions), which can be downloaded from the Resources page.

<tiger2/> is also meant to interface with other serializations of ISO linguistic annotation formats, in particular with MAF, the Morphological Annotation Framework. This interface makes it possible to generate combined SynAF/MAF annotated documents, which address modeling problems both above and below the word form (e.g. subtokenization and compounds).

Who is developing <tiger2/> and how can I contribute?

<tiger2/> is being developed at Humboldt University in Berlin, Georgetown University and INRIA by Laurent Romary, Amir Zeldes and Florian Zipser, in cooperation with the Institute for Natural Language Processing at the University of Stuttgart, the Department of Linguistics at the University of Tübingen and other members of D-Spin, the German chapter of the European CLARIN project.

At the moment, the format specification is still being actively changed based on feedback from contributors. Please let us know if you have any ideas for further features or comments on current elements at:

tiger2@lists.hu-berlin.de.

Current Timeline:

Milestone                                                             Date
Presentation of <tiger2/> to D-Spin forum members                     29.6.2010
First round of feedback due                                           20.7.2010
Release candidate draft due                                           24.8.2010
Public announcement of <tiger2/> at KONVENS 2010                      7.9.2010
Release of version 2.0.3                                              25.5.2011
Release of version 2.0.5                                              12.12.2011
DIN (NA 105-00-06 AA) Proposal for ISO SynAF2                         4.6.2012
Successful ISO NP ballot for SynAF2/tiger2 ISO NWIP 24615-2!          10.10.2012
Presentation at Treebanks and Linguistic Theories (TLT) 11 in Lisbon  1.12.2012

Once the format is finalized, the API will be released, and any later comments may be held over for the development of future versions.