
Help


This article explains how to use the corpus site (the corpus documentation). The corpus site was generated automatically by SaltInfoModules, a plugin for Pepper. This help walks through the site using a sample corpus named pcc2. The layout of your site and the names of its corpora, documents and annotations will differ from our samples, but the usage stays the same.

The corpus site is divided into three sections: 1) the header section, 2) the navigation section and 3) the main section.

screenshot of the corpus site using the pcc2 corpus as sample

At the top you see the header section. It contains the name of the corpus (here 'pcc2') and, where available, a link to the search and visualization system ANNIS. The link to ANNIS appears only when the corpus can be searched in an ANNIS instance.

On the left you see the navigation section, which displays a tree structure containing the root corpus, its sub corpora and all documents belonging to the corpus. Each corpus is marked with a corpus icon to its left, and each document with a document icon. In our sample the root corpus is 'pcc2', which consists of the two documents '11299' and '4282'. By clicking on the small triangle next to the name of a corpus or sub corpus, you can collapse or expand the structure below it. When you click on a corpus or a document, all information about it is shown in the center of the site, the main section.

In the center you see several tables. The first table shows the structural part of the currently selected corpus or document, that is, the number of primary texts, tokens and other nodes or relations. The second table shows the meta data that the currently selected corpus or document contains. These are followed by any number of tables showing the annotation names, their values and the corresponding frequencies. In Salt, linguistic annotations can be grouped into layers. For instance, a morphological layer can contain morphological annotations, a syntactic layer can contain syntactic annotations, and so on. One table is displayed for each layer, plus one for all annotations that do not belong to a layer. The following screenshot shows an example of such a table:

screenshot of annotations of the corpus site using the pcc2 corpus as sample
The left column of the table lists all annotation names used in the currently selected corpus or document: "func", "lemma", "morph" and "pos". The number in brackets, as in "func(893)", is the number of annotations having the name "func". To the right of the name there are up to four icons: an 'i' icon, a double-sided arrow, a box and a download icon. The right column of the table lists all annotation values and the corresponding frequencies. Since the set of annotation values can be very large, the table displays only the first five values. To show all annotation values, click on the double-sided arrow icon in the left column of the same row.
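
To make the table layout concrete, the following Python sketch shows one way such a table could be derived from a list of annotations. It is not the code used by SaltInfoModules: the sample data, the function name `summarize` and the `preview` parameter are hypothetical, and ordering the values by descending frequency is an assumption, since the site only states that the first five values are shown.

```python
from collections import Counter

# Hypothetical sample data: (annotation name, annotation value) pairs.
annotations = [
    ("pos", "NN"), ("pos", "ART"), ("pos", "NN"),
    ("lemma", "der"), ("pos", "VVFIN"), ("lemma", "Haus"),
]


def summarize(pairs, preview=5):
    """Build one table row per annotation name.

    Each row is (label, values): the label looks like "pos(4)", and
    values holds at most `preview` (value, frequency) pairs.
    """
    by_name = {}
    for name, value in pairs:
        by_name.setdefault(name, Counter())[value] += 1

    rows = []
    for name in sorted(by_name):
        counts = by_name[name]
        total = sum(counts.values())  # number of annotations with this name
        # Assumption: values are ordered by descending frequency.
        shown = counts.most_common(preview)
        rows.append((f"{name}({total})", shown))
    return rows


rows = summarize(annotations)
for label, values in rows:
    print(label, values)
```

Passing `preview=None` returns every value for each name, which corresponds to expanding a row with the double-sided arrow icon.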
