SaltNPepper

Overview

With SaltNPepper we provide two powerful frameworks for dealing with linguistically annotated data. SaltNPepper is an open source project developed at the Humboldt University of Berlin (see: http://www.hu-berlin.de/) and at INRIA (Institut national de recherche en informatique et en automatique, see: http://www.inria.fr/). In linguistic research a variety of formats exists, but there is no unified way of processing them. To fill that gap, we developed a meta model called Salt, which abstracts over linguistic data. Based on this model, we also developed the pluggable universal converter framework Pepper, which converts linguistic data between various formats.

In the last few years, many tools for annotating and interpreting linguistic data have been developed, most of them for very specific purposes: for a specific kind of linguistic annotation, for a specific theoretical school or just for a specific corpus. Such tools often come with a specific, more or less proprietary format optimized for the purpose of the tool. In our own research projects we also employ such expert tools, and we often need them to interoperate so that the same basic data can be processed in several of them. For several years now it has become increasingly interesting to combine several annotation levels in order to search for linguistic phenomena across these levels at once. Creating such a corpus requires annotating the same corpus data with several tools. Because of the missing interoperability and the more or less proprietary formats most tools are based on, converting between formats becomes a real challenge.

To solve this problem, we developed the meta model Salt as an intermediate data model that represents linguistic data, and the relations between them, coming from a wide range of different formats. Salt is based on a general graph structure and treats linguistic data as sets of nodes and edges. This makes it usable in very different contexts of linguistic analysis: Salt is independent of specific linguistic theories, schools and corpora.
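
To make the nodes-and-edges idea concrete, the following minimal Java sketch models tokens and syntactic constituents uniformly as nodes, connects them with typed edges, and attaches key-value annotations to nodes and edges. The classes Node, Edge and AnnotationGraph are hypothetical illustrations; they do not reproduce Salt's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a graph-based annotation model; NOT the Salt API.
class Node {
    final String id;
    // Annotations are plain key-value pairs, e.g. "pos" -> "NN" or "cat" -> "NP".
    final Map<String, String> annotations = new LinkedHashMap<>();
    Node(String id) { this.id = id; }
}

class Edge {
    final Node source;
    final Node target;
    final String type; // free-form edge type, e.g. "dominance" or "coreference"
    final Map<String, String> annotations = new LinkedHashMap<>();
    Edge(Node source, Node target, String type) {
        this.source = source;
        this.target = target;
        this.type = type;
    }
}

class AnnotationGraph {
    private final Map<String, Node> nodes = new LinkedHashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    Node addNode(String id) {
        Node n = new Node(id);
        nodes.put(id, n);
        return n;
    }

    Edge addEdge(String sourceId, String targetId, String type) {
        Node s = nodes.get(sourceId);
        Node t = nodes.get(targetId);
        if (s == null || t == null) {
            throw new IllegalArgumentException("unknown node id");
        }
        Edge e = new Edge(s, t, type);
        edges.add(e);
        return e;
    }

    // Direct successors of node `id` via outgoing edges of the given type.
    List<Node> children(String id, String type) {
        List<Node> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.source.id.equals(id) && e.type.equals(type)) {
                result.add(e.target);
            }
        }
        return result;
    }

    int nodeCount() { return nodes.size(); }
    int edgeCount() { return edges.size(); }
}
```

A phrase such as "the dog" can then be stored as two token nodes dominated by an NP node; a further annotation layer, for example coreference, would simply add more edges with a different type instead of requiring a new data structure.
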
To convert data between different formats, we developed the universal converter framework Pepper, which is based on Salt. Pepper is a container controlling the workflow of a conversion process; the conversion itself is done by a set of modules called PepperModules, which map linguistic data from a given format to Salt and vice versa. Pepper is a highly pluggable framework: new modules can be plugged in to support further formats, and each new module benefits from all already existing ones. When a new or previously unknown format Z is added to Pepper, data can automatically be mapped between Z and all already supported formats (A, B, C, …). A Pepper workflow consists of three phases:

  1. the import phase (mapping data from a given format to Salt),
  2. the optional manipulation phase (manipulating or enhancing data in Salt) and
  3. the export phase (mapping data from Salt to a given format).

The three-phase process makes it possible to influence and manipulate data during conversion, for example by adding information or linguistic annotations, or by merging data from different sources.
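
The three-phase workflow can be sketched as follows. This is a hypothetical, heavily simplified model and not Pepper's real module API: an importer turns format-specific input into an intermediate model, any number of manipulators (possibly none) transform that model, and an exporter serializes it again. To keep the example self-contained, the intermediate model is just a list of token strings; in Pepper this role is played by Salt.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the three-phase conversion workflow; NOT Pepper's API.
// The intermediate model is simplified to a list of tokens.
class Workflow {
    private final Function<String, List<String>> importer;        // format X -> model
    private final List<UnaryOperator<List<String>>> manipulators; // 0..n steps, model -> model
    private final Function<List<String>, String> exporter;        // model -> format Y

    Workflow(Function<String, List<String>> importer,
             List<UnaryOperator<List<String>>> manipulators,
             Function<List<String>, String> exporter) {
        this.importer = importer;
        this.manipulators = manipulators;
        this.exporter = exporter;
    }

    String convert(String input) {
        List<String> model = importer.apply(input);      // 1. import phase
        for (UnaryOperator<List<String>> m : manipulators) {
            model = m.apply(model);                      // 2. optional manipulation phase
        }
        return exporter.apply(model);                    // 3. export phase
    }
}
```

Because importers and exporters only ever talk to the intermediate model, any importer can be combined with any exporter. With n formats, 2n modules (one importer and one exporter per format) cover all conversions, whereas direct converters would need one per ordered pair of formats, that is n(n-1).
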

The following figure illustrates the general architecture of SaltNPepper in more detail, with its three components: 1) Salt, 2) Pepper and 3) the PepperModules.

[Figure: architecture of SaltNPepper]

The project pages of the SaltNPepper subprojects give a closer description of their aims:

Salt: a linguistic meta model
Pepper: a universal linguistic converter framework based on Salt

Download and User Guides

SaltNPepper is not a monolithic converter program: it consists of the frameworks Salt and Pepper plus a large number of plugins called PepperModules. We therefore provide a bundled version of SaltNPepper here, which contains all modules necessary to run the converter. Just download the zip file and extract it. A user's guide on how to run Pepper is provided for each version.

Latest Stable Release

Download version 2013.07.27
This version is based on Pepper 1.1.6.

download latest stable release

User's Guide

Current Snapshot Version

download a snapshot release

User's Guide

Older Versions

Download version 2013.04.17
This version is based on Pepper 1.1.5.

download

User's Guide

Download version 2012.05.14
This version is based on Pepper 1.1.3.

download

User's Guide

Download version 2010.08.04
This version is based on Pepper 1.0.0.

download

User's Guide

Download version 2009.07.27
This version is based on Pepper 0.0.9.

download

User's Guide

Individual Download

To download the frameworks Salt and Pepper or PepperModules individually, please use the link to the download page of the specific project.

Salt: download the linguistic meta model Salt
Pepper: download the linguistic converter framework Pepper

Official Pepper modules

Since Pepper is just a converter framework that controls the conversion workflow, the actual conversion work is done by a set of Pepper modules. Such a module is an individual unit executing a specific task, such as mapping data from or to a linguistic data format. A Pepper module can be plugged into the Pepper framework simply by copying its bundled version into Pepper's plug-in directory.

The workflow is separated into three different phases:

  • the import phase,
  • the optional manipulation phase and
  • the export phase.

The import phase handles the mapping from a format X to the Salt model, the export phase handles the mapping from the Salt model to a format Y. During the optional manipulation phase, the data in a Salt model can be enhanced, reduced or manipulated.

Each phase is divided into steps: the import and export phases each contain 1 to n steps, whereas the manipulation phase contains 0 to n steps (i.e. it is optional). Each Pepper module realizes one such step and is therefore associated with exactly one phase. Which Pepper modules are used is determined by the Pepper workflow description file. A module can be identified by specifying its coordinates: either its name, or its supported format and format version.
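
The rules above (one or more import steps, zero or more manipulation steps, one or more export steps, each module bound to exactly one phase, modules looked up by name or by format and version) can be sketched as a small validator. The types Phase, ModuleInfo, Step and WorkflowValidator and the module names in the usage are hypothetical; this is neither Pepper's implementation nor the syntax of its workflow description file.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical types for illustration only; NOT Pepper's API.
enum Phase { IMPORT, MANIPULATION, EXPORT }

// A registered module: its name, its (single) phase and the format it handles.
record ModuleInfo(String name, Phase phase, String format, String formatVersion) {}

// A workflow step references a module either by name or by format (+ optional version).
record Step(Phase phase, String moduleName, String format, String formatVersion) {}

class WorkflowValidator {
    private final List<ModuleInfo> registry;

    WorkflowValidator(List<ModuleInfo> registry) { this.registry = registry; }

    // Resolve a step's coordinates to a registered module of the same phase.
    Optional<ModuleInfo> resolve(Step step) {
        return registry.stream()
            .filter(m -> m.phase() == step.phase())
            .filter(m -> step.moduleName() != null
                    ? m.name().equals(step.moduleName())
                    : step.format() != null
                      && step.format().equalsIgnoreCase(m.format())
                      && (step.formatVersion() == null
                          || step.formatVersion().equals(m.formatVersion())))
            .findFirst();
    }

    // Returns an error message, or an empty Optional if the workflow is valid.
    Optional<String> validate(List<Step> steps) {
        long imports = steps.stream().filter(s -> s.phase() == Phase.IMPORT).count();
        long exports = steps.stream().filter(s -> s.phase() == Phase.EXPORT).count();
        if (imports < 1) return Optional.of("at least one import step is required");
        if (exports < 1) return Optional.of("at least one export step is required");
        // Manipulation steps are optional (0..n), so there is no lower bound to check.
        for (Step s : steps) {
            if (resolve(s).isEmpty()) {
                return Optional.of("no " + s.phase() + " module matches " + s);
            }
        }
        return Optional.empty();
    }
}
```

For example, a workflow with a single import step addressed by format and an export step addressed by module name validates, while a workflow without any export step is rejected.
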

GroupId | ArtifactId | module types | short description | project site
de.hu-berlin.german.korpling.saltnpepper | pepperModules-CoNLLModules | importer | reads data coming from the CoNLL format | CoNLLModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-DoNothingModules | importer, exporter | importer and exporter do nothing; the exporter can be useful to check the data consistency of other importers | DoNothingModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-DOTModules | exporter, manipulator | exports data into the DOT format | DOTModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-ELANModules | importer | maps data coming from ELAN to Salt | ELANModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-EXMARaLDAModules | importer, exporter | reads and writes data coming from or to EXMARaLDA's basic transcription format (.exb) | EXMARaLDAModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-FALKOModules | manipulator | manipulates a Salt model of the FALKO corpus | FALKOModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-GenericXMLModules | importer | imports data coming from any XML file into a Salt model in a customizable but generic way | GenericXMLModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-GrAFModules | importer, exporter | imports and exports a Salt model from and to the GrAF format | GrAFModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-NLPModules | manipulator | tokenizes the sText of all STextualDS objects of all SDocument objects | NLPModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-PAULAModules | importer, exporter | imports and exports a Salt model from and to the PAULA format | PAULAModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-RelANNISModules | importer, exporter | imports and exports a Salt model from and to the relANNIS format | RelANNISModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-RSTModules | importer | imports RST data into a Salt model | RSTModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-SaltXMLModules | importer, exporter | imports and exports a Salt model from and to the SaltXML format | SaltXMLModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-TigerModules | importer | imports data coming from the TigerXML format into a Salt model | TigerModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-TreeTaggerModules | importer, exporter | reads and writes data coming from or to the TreeTagger/CWB format (.tab) | TreeTaggerModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-TueBaDZModules | manipulator | manipulates a Salt model containing hybrid syntax/topological trees in the TueBa-D/Z scheme | TueBaDZModules
de.hu-berlin.german.korpling.saltnpepper | pepperModules-UAMModules | importer | imports data coming from the UAM format into a Salt model | UAMModules
edu.tufts.perseus | pepperModules-PerseusModules | importer | reads data coming from the format used by the Perseus project | PerseusModules

Plugging in Pepper Modules

To install a new Pepper module that is not contained in the official release, just unzip the archive into Pepper's plug-in directory (PEPPER_HOME/plugins by default). To update an already existing module, first remove the older version from Pepper's plug-in directory by deleting the corresponding .jar file and the folder with the same name.
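
The manual install and update steps can also be scripted. The sketch below assumes, as described above, that a module ships as a zip archive whose contents go directly into the plug-in directory, and that an installed module consists of a file "<name>.jar" plus a folder "<name>". PluginInstaller and its methods are hypothetical helpers, not part of Pepper, and the caller has to supply the module name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Pepper; mirrors the manual steps described above.
class PluginInstaller {

    // Update only: delete "<name>.jar" and the folder "<name>" if they exist.
    static void removeOldVersion(Path pluginDir, String moduleName) throws IOException {
        Files.deleteIfExists(pluginDir.resolve(moduleName + ".jar"));
        Path folder = pluginDir.resolve(moduleName);
        if (Files.isDirectory(folder)) {
            try (Stream<Path> paths = Files.walk(folder)) {
                // Reverse order puts children before their parent directories.
                List<Path> toDelete = paths.sorted(Comparator.reverseOrder())
                                           .collect(Collectors.toList());
                for (Path p : toDelete) {
                    Files.delete(p);
                }
            }
        }
    }

    // Unzip the module archive into the plug-in directory.
    static void unzip(InputStream archive, Path pluginDir) throws IOException {
        Path root = pluginDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would escape the plug-in directory.
                if (!target.startsWith(root)) {
                    throw new IOException("entry outside plug-in dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Reads only the current entry; the zip stream stays open.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

For an update, call removeOldVersion first and then unzip; for a fresh installation, unzip alone is enough.
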

Creating your own Pepper Modules

If the official Pepper modules do not fulfill your needs, for instance because none of them handles the format you are working with, you are free to create your own module and plug it into the Pepper framework. A more detailed description of how to create your own module can be found in the module developer's guide for the corresponding Pepper version, listed below. In many cases, you won't be the only one using the format you created a module for. If you want to let other people benefit from your code, we can help you make your project public and add it to the official list provided here. If you want to join our infrastructure, you are welcome to start an official Pepper modules project.
We can support you and your project by:

  • providing a repository (Subversion)
  • providing a continuous integration server (Jenkins)
  • providing a project site (Redmine)
  • providing a ticket system (Redmine)

Please write an email to:

Pepper module developer's guide

Pepper 1.1.7
ModuleDevelopersGuide

Pepper 1.1.6
ModuleDevelopersGuide
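
As a rough illustration of what writing an importer involves, the sketch below implements a module for a made-up format with one token per line and an optional tab-separated part-of-speech column. The ImporterModule interface, the format name "tab-tokens" and the map-based token representation are all hypothetical and much simpler than the real Pepper module API and Salt model; the developer's guides above describe the actual interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical module contract for illustration only; NOT Pepper's API.
interface ImporterModule {
    String name();
    String supportedFormat();        // coordinates used to select the module
    String supportedFormatVersion();
    List<Map<String, String>> importDocument(String content);
}

// Importer for a made-up format: one token per line, "word<TAB>pos".
class TabTokenImporter implements ImporterModule {
    public String name() { return "TabTokenImporter"; }
    public String supportedFormat() { return "tab-tokens"; }
    public String supportedFormatVersion() { return "1.0"; }

    public List<Map<String, String>> importDocument(String content) {
        List<Map<String, String>> tokens = new ArrayList<>();
        for (String line : content.split("\\R")) {
            if (line.isBlank()) continue;              // skip empty lines
            String[] cols = line.split("\t", -1);
            Map<String, String> token = new LinkedHashMap<>();
            token.put("text", cols[0]);
            if (cols.length > 1 && !cols[1].isEmpty()) {
                token.put("pos", cols[1]);             // optional annotation column
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

The module declares the format and version it supports, so a workflow can select it by these coordinates instead of by name.
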

Bugs and Feature Requests

We invite you to give us feedback. Bug reports are welcome, as are requests for further features. You can send us an email to:


Or you can use our ticket system directly. To do so, please sign in on the project homepage (the one you are currently visiting). If you do not have an account yet, please use the register button in the upper right corner. After signing in, open the 'New issue' tab and describe your feature request or bug report. If you want, you can watch your ticket to be notified about further progress.

Join the Project

Since SaltNPepper is an open source project, you are welcome to join and contribute. Please write an email to:


Partner Projects

Projects using the Salt meta model or the converter framework Pepper within their software solutions:

ANNIS http://www.sfb632.uni-potsdam.de/annis/
ATOMIC http://www.personal.uni-jena.de/~mu65qev/LinkType/
Perseus http://www.perseus.tufts.edu/hopper/
<tiger2/> http://korpling.german.hu-berlin.de/tiger2/homepage/index.html

Publications

  • Zipser F., Zeldes A., Ritz J., Romary L. & Leser U. (2011)
    Pepper: Handling a multiverse of formats.
    33. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Göttingen, 23-25 February 2011.
  • Zipser F. & Romary L. (2010)
    A model oriented approach to the mapping of annotation formats using standards.
    In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010.
  • Zipser F. (2009)
    Entwicklung eines Konverterframeworks für linguistisch annotierte Daten auf Basis eines gemeinsamen (Meta-)modells [Development of a converter framework for linguistically annotated data based on a common (meta-)model].
    Diploma thesis, Humboldt-Universität zu Berlin, Institut für Informatik.

Funders

The SaltNPepper project was funded by the Humboldt University of Berlin (see: http://www.hu-berlin.de/), the SFB 632 (see: www.sfb632.uni-potsdam.de/) and INRIA (see: www.inria.fr/).

