<tiger2/> - An XML serialization of the SynAF syntactic model
What is <tiger2/>?
<tiger2/> is a standard-conformant XML format designed to express syntactic structures for a wide variety of theoretical formalisms and corpus architectures. It is closely related to TigerXML and develops its ideas: the declared goal of the project is to expand TigerXML only as far as is required to represent current advanced syntactic resources, avoiding changes that are not strictly necessary and that might steepen the learning curve or require substantial alterations to existing tools. Like TigerXML, the format is conceived as theory-neutral: it is suited to both shallow and deep parsing in any number of theories and supports pure constituency trees, pure dependency trees, and combinations of the two.
<tiger2/> is extensible to any number of new annotation forms for both syntactic nodes and edges through a typing system, which lets users define custom annotations that apply only to certain kinds of nodes and edges.
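The idea of annotations that are valid only for certain kinds of nodes or edges can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration: the `Declaration` class, the `is_allowed` method, the type labels `dep` and `prim` and the tagsets are assumptions made for this example and are not taken from the <tiger2/> schema or from any API.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    """Hypothetical annotation declaration (not the tiger2 schema).

    Maps (domain, type, feature) to the set of values allowed there,
    where domain is "node" or "edge" and type is a user-defined label.
    """
    tagsets: dict = field(default_factory=dict)

    def declare(self, domain, type_, feature, values):
        # Register a tagset that applies only to this domain/type pair.
        self.tagsets[(domain, type_, feature)] = set(values)

    def is_allowed(self, domain, type_, feature, value):
        # A value is valid only if a tagset was declared for exactly
        # this kind of element and the value is a member of it.
        allowed = self.tagsets.get((domain, type_, feature))
        return allowed is not None and value in allowed


decl = Declaration()
decl.declare("edge", "dep", "label", {"subj", "obj"})  # dependency edges
decl.declare("edge", "prim", "label", {"HD", "SB"})    # constituency edges

print(decl.is_allowed("edge", "dep", "label", "subj"))   # True
print(decl.is_allowed("edge", "prim", "label", "subj"))  # False
```

In this sketch the label `subj` is accepted on a `dep` edge but rejected on a `prim` edge, which is the kind of type-specific restriction described above.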
What can <tiger2/> do?
The main features of <tiger2/>, especially as compared with the original TigerXML (a.k.a. tiger1), include:
- Constituency and dependency trees: terminal nodes can be connected by annotated edges or joined to a non-terminal node. Both constituency and dependency analyses can be represented simultaneously in the same graph.
- Freely typed nodes and edges: users can define different types of edges (primary/secondary edges, dependency edges, coreference etc.) and nodes (compounds, phrasal verbs etc.) and assign different annotation tagsets to each
- Binding to the ISOCat DCR: the annotation declaration can be linked to categories in the ISOCat Data Category Registry
- Standoff and inline architectures: users can represent syntactic annotation either in separate files, apart from the raw text and/or further annotation layers (following LAF, ISO/DIS 24612), or all in one file (inline).
- Interface for subtokens: by interacting with MAF, the Morphological Annotation Framework, documents can build syntactic structures on top of subtokenized word forms as defined by MAF, e.g. compounds that are analyzed below the word level (see the Data Model and Examples sections).
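To make the first two features more concrete, the following Python sketch reads a small XML fragment in which constituency and dependency edges share one graph and are told apart by an edge type. The fragment is illustrative only: its element and attribute names (`graph`, `t`, `nt`, `edge`, `type`, `label`, `target`) are assumptions loosely modelled on TigerXML-style markup and should not be read as the actual <tiger2/> schema, for which the released specification is authoritative. The helper `edges_by_type` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: names are assumptions, not the tiger2 schema.
FRAGMENT = """
<graph>
  <terminals>
    <t id="t1" word="She"/>
    <t id="t2" word="sleeps">
      <edge type="dep" label="subj" target="t1"/>
    </t>
  </terminals>
  <nonterminals>
    <nt id="nt1" cat="S">
      <edge type="prim" label="SB" target="t1"/>
      <edge type="prim" label="HD" target="t2"/>
    </nt>
  </nonterminals>
</graph>
"""


def edges_by_type(xml_text):
    """Group edges by their type as (source id, label, target id) tuples."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for node in root.iter():
        # Only terminal and non-terminal nodes carry outgoing edges here.
        if node.tag not in ("t", "nt"):
            continue
        for edge in node.findall("edge"):
            grouped.setdefault(edge.get("type"), []).append(
                (node.get("id"), edge.get("label"), edge.get("target"))
            )
    return grouped


edges = edges_by_type(FRAGMENT)
print(edges["dep"])   # [('t2', 'subj', 't1')]
print(edges["prim"])  # [('nt1', 'SB', 't1'), ('nt1', 'HD', 't2')]
```

Here the dependency edge runs directly between two terminals, while the constituency edges join both terminals to a non-terminal, so the two analyses coexist over the same tokens.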
For a complete description, see the Releases on this site. For a more detailed overview of the differences between <tiger2/> and the original TigerXML, see tiger1 > tiger2.
SynAF - the data model behind <tiger2/>
As a standard-conformant format, <tiger2/> is a serialization of the data model defined by SynAF, the Syntactic Annotation Framework developed by the International Organization for Standardization (ISO). SynAF is a standard developed in ISO/TC37/SC4 (Language Resources Management) that covers the basic semantics and structure of syntactic annotation. As a concrete XML format, <tiger2/> instantiates the SynAF model elements as XML schemas (including XSD, RNG and ODD descriptions), which can be downloaded from the Resources page.
<tiger2/> is also meant to interface with other serializations of ISO linguistic annotation formats, in particular with MAF, the Morphological Annotation Framework. Through the interface to MAF, combined SynAF/MAF annotated documents can be generated, which address modeling problems both above and below the word form (e.g. subtokenization, compounds etc.).
Who is developing <tiger2/> and how can I contribute?
<tiger2/> is being developed at Humboldt University in Berlin, Georgetown University and INRIA by Laurent Romary, Amir Zeldes and Florian Zipser, in cooperation with the Institute for Natural Language Processing at the University of Stuttgart, the Department of Linguistics at the University of Tübingen and other members of D-Spin, the German chapter of the European CLARIN project.
At the moment, the format specification is still being actively changed based on feedback from contributors. Please let us know if you have any ideas for further features or comments on current elements at:
tiger2@lists.hu-berlin.de.
Current Timeline:
Milestone | Date |
---|---|
Presentation of <tiger2/> to D-Spin forum members | 29.6.2010 |
First round of feedback due | 20.7.2010 |
Release candidate draft due | 24.8.2010 |
Public announcement of <tiger2/> at KONVENS2010 | 7.9.2010 |
Release of version 2.0.3 | 25.5.2011 |
Release of version 2.0.5 | 12.12.2011 |
DIN (NA 105-00-06 AA) Proposal for ISO SynAF2 | 4.6.2012 |
Successful ISO NP ballot for SynAF2/tiger2 ISO NWIP 24615-2! | 10.10.2012 |
Presentation at Treebanks and Linguistic Theories (TLT) 11 in Lisbon | 1.12.2012 |
Once the format is finalized, the API will be released, and any further comments may be held over for the development of future versions.