- Download and User Guides
- Official Pepper modules
- Plugin Pepper Modules
- Creating your own Pepper Modules
- Bugs and Feature Requests
- Join the Project
- Partner Projects
With SaltNPepper we provide two powerful frameworks for dealing with linguistically annotated data. SaltNPepper is an open source project developed at the Humboldt University of Berlin (see: http://www.hu-berlin.de/) and at INRIA (Institut national de recherche en informatique et en automatique, see: http://www.inria.fr/). In linguistic research a variety of formats exists, but there is no unified way of processing them. To fill that gap, we developed a meta model called Salt, which abstracts over linguistic data. Based on this model, we also developed the pluggable universal converter framework Pepper to convert linguistic data between various formats.
In the last few years, many tools for annotating and interpreting linguistic data have been developed. Most of them serve very specific purposes: a specific kind of linguistic annotation, a specific theoretical school, or just a specific corpus. Such tools often come with a more or less proprietary format optimized for the purpose of the tool. In our own research projects we also employ expert tools, and we often need interoperability to work with the same basic data in several of them. For several years now, there has been growing interest in combining several annotation levels in order to search for linguistic phenomena across levels at once. Creating such a corpus requires annotating the same corpus data with several tools. But because interoperability is missing and most tools are based on more or less proprietary formats, converting between formats becomes a challenge.
In order to solve this problem, we have developed the meta model Salt as an intermediate data model to represent linguistic data and relations between them coming from a wide range of differing formats. Salt is based on a general graph structure and treats linguistic data as sets of nodes and edges. Therefore it is highly usable in very different contexts of linguistic analysis. Salt is independent of specific linguistic theories or schools and corpora.
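As a rough illustration of the graph idea (this is not Salt's actual API; all class, field, and identifier names below are hypothetical), a token-level part-of-speech annotation and a syntactic dominance relation can both be stored in one plain node-and-edge structure:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node carrying arbitrary key/value annotations."""
    node_id: str
    annotations: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A typed, directed edge between two nodes."""
    source: str
    target: str
    edge_type: str


@dataclass
class Graph:
    """Hypothetical stand-in for a graph-based linguistic model."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, **annotations):
        self.nodes[node_id] = Node(node_id, dict(annotations))

    def add_edge(self, source, target, edge_type):
        # Refuse dangling edges so the structure stays a well-formed graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, edge_type))

    def children(self, node_id, edge_type):
        """Return the targets of all edges of one type leaving node_id."""
        return [e.target for e in self.edges
                if e.source == node_id and e.edge_type == edge_type]


graph = Graph()
graph.add_node("tok1", text="Salt", pos="NN")    # token with a POS annotation
graph.add_node("tok2", text="works", pos="VBZ")
graph.add_node("s1", cat="S")                    # syntactic node above the tokens
graph.add_edge("s1", "tok1", "dominance")
graph.add_edge("s1", "tok2", "dominance")
```

Because annotations are plain key/value pairs and relations are typed edges, nothing in such a structure commits to a particular linguistic theory or annotation scheme.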
To convert data between different formats, we have developed the universal converter framework Pepper, which is based on Salt. Pepper is a container controlling the workflow of a conversion process; the conversion itself is done by a set of modules called PepperModules, which map the linguistic data between a given format and Salt in both directions. Pepper is a highly pluggable framework: new modules can be plugged in to incorporate further formats. Its architecture makes it possible to benefit from all already existing modules. This means that when a new or previously unknown format Z is added to Pepper, data can automatically be mapped between Z and all already supported formats A, B, C, …. A Pepper workflow consists of three phases:
- the import phase (mapping data from a given format to Salt),
- the optional manipulation phase (manipulating or enhancing data in Salt), and
- the export phase (mapping data from Salt to a given format).
The three-phase process makes it possible to influence and manipulate data during conversion, for example by adding further information or linguistic annotations, or by merging data from different sources.
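The three phases and the benefit of a shared intermediate model can be sketched in a few lines. This is a toy under stated assumptions, not Pepper's real interface: the registries, the `convert` function, the format names, and the use of a simple token list as the "model" are all hypothetical.

```python
# Hypothetical registries mapping a format name to a mapping function.
IMPORTERS = {}   # format name -> callable(text) -> model
EXPORTERS = {}   # format name -> callable(model) -> text


def convert(text, source_format, target_format, manipulators=()):
    """Run import, optional manipulation, and export in that order."""
    model = IMPORTERS[source_format](text)       # import phase
    for manipulate in manipulators:              # optional manipulation phase
        model = manipulate(model)
    return EXPORTERS[target_format](model)       # export phase


# Two toy formats; the intermediate model is just a list of tokens.
IMPORTERS["space"] = lambda text: text.split()
EXPORTERS["space"] = lambda model: " ".join(model)
IMPORTERS["lines"] = lambda text: text.splitlines()
EXPORTERS["lines"] = lambda model: "\n".join(model)


def lowercase(model):
    """A toy manipulator that changes the data while it is in the model."""
    return [token.lower() for token in model]


converted = convert("Salt and Pepper", "space", "lines", [lowercase])

# Registering one importer and one exporter for a new format is enough
# to connect it to every format that is already registered.
IMPORTERS["comma"] = lambda text: text.split(",")
EXPORTERS["comma"] = lambda model: ",".join(model)
```

With n formats, a design like this needs at most 2n mappings (one importer and one exporter per format), whereas direct pairwise converters would need n(n-1).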
The following figure illustrates the general architecture of SaltNPepper in more detail, with its three components: 1) Salt, 2) Pepper and the 3) PepperModules.
Here you can find the project pages of the SaltNPepper subprojects, which describe their aims in more detail:
|Salt||a linguistic meta model|
|Pepper||a universal linguistic converter framework based on Salt|
Download and User Guides¶
SaltNPepper is not a monolithic converter program: it consists of the frameworks Salt and Pepper and a number of plugins called PepperModules. We therefore provide a bundled version of SaltNPepper here, which contains all modules necessary to run the converter. Just download the zip file and extract it. A user's guide on how to run Pepper is provided for each version.
Latest Stable Release¶
Download version 2013.07.27
This version is based on Pepper 1.1.6.
Current Snapshot Version¶
Download version 2013.04.17
This version is based on Pepper 1.1.5.
Download version 2012.05.14
This version is based on Pepper 1.1.3.
This version is based on Pepper 1.0.0.
This version is based on Pepper 0.0.9.
To download the frameworks Salt and Pepper or PepperModules individually, please use the link to the download page of the specific project.
|Salt||download the linguistic meta model Salt|
|Pepper||download the linguistic converter framework Pepper|
Official Pepper modules¶
Since Pepper is just a converter framework and only takes control of the conversion workflow, the real conversion work is done by a set of Pepper modules. Such a module is an individual unit executing a specific task, like mapping data from or to a linguistic data format. A Pepper module can simply be plugged into the Pepper framework by copying its bundled version to a specific folder.
The workflow is separated into three different phases:
- the import phase,
- the optional manipulation phase and
- the export phase.
The import phase handles the mapping from a format X to the Salt model; the export phase handles the mapping from the Salt model to a format Y. During the optional manipulation phase, the data in a Salt model can be enhanced, reduced or manipulated.
A phase is divided into several steps: the import and export phase each contain 1 to n steps whereas the manipulation phase contains 0 to n steps (i.e. it is optional). Each Pepper module realizes such a step and therefore is associated with exactly one phase. The usage of Pepper modules is determined by the Pepper workflow description file. A module can be identified by specifying its coordinates (its name or its supported formats and the supported format versions).
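The step-count rule above (at least one import step, at least one export step, any number of manipulation steps) can be written down as a small check. This is only a sketch: the `Step` class, the phase labels, and the module names are hypothetical and do not reflect the actual format of the Pepper workflow description file.

```python
from dataclasses import dataclass

PHASES = ("import", "manipulation", "export")


@dataclass(frozen=True)
class Step:
    """One workflow step; each module belongs to exactly one phase."""
    module_name: str
    phase: str


def validate_workflow(steps):
    """Return a list of problems; an empty list means the step counts are valid."""
    problems = []
    counts = {phase: 0 for phase in PHASES}
    for step in steps:
        if step.phase not in counts:
            problems.append(f"{step.module_name}: unknown phase {step.phase!r}")
        else:
            counts[step.phase] += 1
    # Import and export need 1..n steps; manipulation may have 0..n.
    for phase in ("import", "export"):
        if counts[phase] < 1:
            problems.append(f"{phase} phase needs at least one step")
    return problems


workflow = [
    Step("SomeImporter", "import"),       # hypothetical module names
    Step("SomeManipulator", "manipulation"),
    Step("SomeExporter", "export"),
]
problems = validate_workflow(workflow)
```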
|Group ID||Artifact ID||Module types||Description||Project|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-CoNLLModules||importer||reads data coming from the CoNLL format||CoNLLModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-DoNothingModules||importer, exporter||importer and exporter do nothing, the exporter can be useful to check data consistency of other imports||DoNothingModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-DOTModules||exporter, manipulator||exports data into the DOT format||DOTModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-ELANModules||importer||maps data coming from ELAN to a Salt model||ELANModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-EXMARaLDAModules||importer, exporter||reads and writes data coming from or to EXMARaLDA's basic transcription format (.exb)||EXMARaLDAModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-FALKOModules||manipulator||manipulates a Salt model of the FALKO corpus||FALKOModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-GenericXMLModules||importer||Imports data coming from any XML file to a Salt model in a customizable but generic way.||GenericXMLModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-GrAFModules||importer, exporter||imports and exports a Salt model to the GrAF format||GrAFModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-NLPModules||manipulator||tokenizes the sText of all STextualDS objects of all SDocument objects||NLPModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-PAULAModules||importer, exporter||imports and exports a Salt model to the PAULA format||PAULAModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-RelANNISModules||importer, exporter||imports and exports a Salt model to the relANNIS format||RelANNISModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-RSTModules||importer||imports RST data into a Salt model||RSTModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-SaltXMLModules||importer, exporter||imports and exports a Salt model to the SaltXML format||SaltXMLModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-TigerModules||importer||imports data coming from the TigerXML format into a Salt model||TigerModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-TreeTaggerModules||importer, exporter||reads and writes data coming from or to the TreeTagger/CWB format (.tab)||TreeTaggerModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-TueBaDZModules||manipulator||manipulates a Salt model containing hybrid syntax/topological trees in the TueBa-D/Z scheme||TueBaDZModules|
|de.hu-berlin.german.korpling.saltnpepper||pepperModules-UAMModules||importer||imports data coming from the UAM format into a Salt model||UAMModules|
|edu.tufts.perseus||pepperModules-PerseusModules||importer||reads data coming from the format used by the Perseus project||PerseusModules|
Plugin Pepper Modules¶
If you want to install a new Pepper module, which is not contained in the official release, just unzip the archive into Pepper's plug-in directory (PEPPER_HOME/plugins by default). If you want to update an already existing module, you'll need to remove the older version from Pepper's plug-in directory first by deleting the corresponding .jar file and the folder having the same name.
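The manual steps can also be scripted. The following is a minimal sketch, assuming the old module consists of a .jar file and a folder sharing one base name, as described above; the function name, its parameters, and the example file names are hypothetical and not part of Pepper.

```python
import shutil
import zipfile
from pathlib import Path


def install_module(archive, plugin_dir, old_name=None):
    """Unzip a module archive into plugin_dir, removing an older version first.

    old_name is the base name shared by the old .jar file and its folder
    (hypothetical example: "pepperModules-FooModules-1.0.0").
    """
    plugin_dir = Path(plugin_dir)
    if old_name is not None:
        old_jar = plugin_dir / f"{old_name}.jar"
        old_folder = plugin_dir / old_name
        # Delete the old .jar file and the folder having the same name.
        if old_jar.is_file():
            old_jar.unlink()
        if old_folder.is_dir():
            shutil.rmtree(old_folder)
    # Unzip the new archive into the plug-in directory.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(plugin_dir)
```

A call might look like `install_module("FooModules.zip", "PEPPER_HOME/plugins", old_name="pepperModules-FooModules-1.0.0")`, where the archive and module names are placeholders.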
Creating your own Pepper Modules¶
If the list of official Pepper modules does not contain a module handling the format you are working with, or the provided modules do not fulfill your needs, you are free to create your own module and plug it into the Pepper framework. A more detailed description of how to create your own module can be found in the following list corresponding to the Pepper version. In many cases, you won't be the only one using the format you created a module for. In case you want to let other people benefit from your code, we can help you by making your project public and adding it to the official list provided here. If you want to join our infrastructure, you are welcome to start an official Pepper modules project.
We can support you and your project by:
- providing a repository (Subversion)
- providing a continuous integration system (Jenkins)
- providing a project site (Redmine)
- providing a ticket system (Redmine)
Please write an email to:
Pepper module developer's guide¶
Bugs and Feature Requests¶
We invite you to give us feedback. Bug reports are welcome, as are requests for further features. You can send us an email at:
Or you can directly use our ticket system. To do so please sign in on the project homepage (the one you are currently visiting). If you do not have an account yet, please use the register button at the upper right corner. After signing in, please use the tab 'New issue' and describe your feature or bug report. If you want to, you can follow your ticket to be notified about ongoing progress.
Join the Project¶
Since Salt is an open source project, you are welcome to join and contribute. Please write an email to:
Partner Projects¶
Projects using the Salt meta model or the converter framework Pepper within their software solutions:
- Zipser F., Zeldes A., Ritz J., Romary L. & Leser U. (2011)
Pepper: Handling a multiverse of formats. 33. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Göttingen, 23.-25. Februar 2011.
- Zipser F., Romary L. (2010)
A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010.
- Zipser F. (2009)
Entwicklung eines Konverterframeworks für linguistisch annotierte Daten auf Basis eines gemeinsamen (Meta-)modells. Diplomarbeit, Humboldt-Universität zu Berlin, Institut für Informatik.
|Humboldt University of Berlin||SFB 632||INRIA|