Formation processes
The following section describes the word-formation processes found in our data. For the purpose of explanation, we use two ways of representing word-formation processes: a parentheses-based representation borrowed from the Penn Treebank format (ptb) and a table-style representation (CoNLL). Each representation has its own advantages and disadvantages, which are discussed below in the technical section.
The morpheme forms (lemma for free forms, form representation for bound morphemes) are represented along with their lexical category:
- ptb (in a single bracket with a space in between):
({lexical_category} {morpheme_form})
- CoNLL style (tab-separated):
{morpheme_form} \t {lexical_category}
The word-formation process outcome is coded together with the process class using the following forms:
- ptb:
{category_of_outcome}:{process_class_label}
- CoNLL style:
{process_class_label}:{category_of_outcome}
The order of category and process label differs between the two formats because each order is easier to read within its own format.
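The two label orders are mirror images of each other, so a label can be converted between formats by swapping its halves. The following is a minimal sketch of that idea; the function names `swap_label`, `morpheme_ptb` and `morpheme_conll` are hypothetical helpers for illustration and are not part of any project tooling.

```python
def swap_label(label: str) -> str:
    """Swap the two halves of a 'left:right' label.

    'N:conv' (ptb order) becomes 'conv:N' (CoNLL order) and vice versa.
    """
    left, sep, right = label.partition(":")
    if not sep or not left or not right:
        raise ValueError(f"expected a label of the form 'x:y', got {label!r}")
    return f"{right}:{left}"


def morpheme_ptb(category: str, form: str) -> str:
    """Render one morpheme in ptb style, e.g. '(N Schmuck)'."""
    return f"({category} {form})"


def morpheme_conll(form: str, category: str) -> str:
    """Render one morpheme in CoNLL style: form and category, tab-separated."""
    return f"{form}\t{category}"


print(swap_label("N:conv"))          # conv:N
print(morpheme_ptb("N", "Schmuck"))  # (N Schmuck)
```

Because the swap is its own inverse, the same helper serves both conversion directions.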
Representing categories and process classes this way helps identify the stage of each word-formation process. Additionally, it helps create a uniform annotation scheme and facilitates consistency in the search queries later on. Scheme uniformity can lead to redundant representations of information, for instance regarding the formation of participles, comparatives and superlatives. Nonetheless, uniformity is the priority. Note that not every combination of categories and process labels is grammatically possible in German.
Note that simplex nouns are represented by just their category label and the respective morpheme (lemma for free forms, form representation for bound morphemes). Hence, they can be recognized by the absence of a WF tag in the WF tree.
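The bracket notation can be read into a small tree structure, which also makes the simplex check explicit. The sketch below assumes that labels and morpheme forms contain no whitespace or parentheses; `parse_ptb` and `is_simplex` are hypothetical names chosen for this illustration.

```python
import re

# A token is an opening bracket, a closing bracket, or a run of other characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_ptb(text):
    """Parse '(LABEL child ...)' into a (label, children) tuple.

    Children are either nested (label, children) tuples or plain strings
    (the morpheme form inside a leaf bracket).
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a label after '('")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the closing bracket")
    return tree


def is_simplex(tree):
    """A simplex word has a bare category label, i.e. no ':process' part."""
    label, _children = tree
    return ":" not in label


print(parse_ptb("(A:conv (N Schmuck))"))  # ('A:conv', [('N', ['Schmuck'])])
print(is_simplex(parse_ptb("(N Schmuck)")))  # True
```

A parser of this kind also catches unbalanced bracket strings, since a missing closing bracket raises a `ValueError`.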
adopt
The label adopt is used for all WF processes within or between morphemes that are ambiguous or unknown. Ideally, this label no longer occurs after the manual correction of the automatic WF parsing.
conv
The label conv is used when both of the following conditions hold:
- There is a change in the morphosyntactic category, e.g. from V to N.
- The lemma occurs in its stem form, e.g. lemma "prallen", stem form "Prall".
conv:A
lemma = "schmuck"
:small_blue_diamond: (A:conv (N Schmuck))
graph TD;
A{{A:conv}}---B(N Schmuck);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | schmuck | Schmuck | N | _ | _ | _ | _ | _ | preop=conv:A |
conv:N
lemma = "Zusammenprall"
:small_blue_diamond: (N:conv (V:cdet (VPART zusammen) (V prallen)))
graph TD;
A{{N:conv}}---B{{V:cdet}};
B---C(VPART <br> zusammen);
B---D(V <br> prallen);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | zusammen | zusammen | VPART | _ | _ | 2 | cdet:V | _ | _ |
2 | prall | prallen | V | _ | _ | _ | _ | _ | postop=conv:N |
conv:V
lemma = "Fischen"
:small_blue_diamond: (N:trans (V:conv (N Fisch)))
graph TD;
A{{N:trans}}---B{{V:conv}};
B---C(N <br> Fisch);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | fischen | Fisch | N | _ | _ | _ | _ | _ | preop=conv:V,trans:N |
der
explicit derivation
The label der is used when there is either affixation (explicit derivation) or a stem vowel change (implicit derivation).
der:A
lemma = "Gehässige"
:small_blue_diamond: (N:der (A:der (CIRCFX ge) (V hassen) (CIRCFX ig)))
graph TD;
A{{N:der}}---B{{A:der}};
B---C(CIRCFX <br> ge);
B---D(V <br> hassen);
B---E(CIRCFX <br> ig);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | ge | ge | CIRCFX | _ | _ | 3 | member | _ | _ |
2 | häss | hassen | V | _ | _ | 3 | der:A | _ | _ |
3 | ige | ig | CIRCFX | _ | _ | _ | _ | _ | postop=der:N |
lemma = "schmerzhaft"
:small_blue_diamond: (A:der (N Schmerz) (ASFX haft))
graph TD;
A{{A:der}}---B(N <br> Schmerz);
A---C(ASFX <br> haft);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | schmerz | Schmerz | N | _ | _ | 2 | der:A | _ | _ |
2 | haft | haft | ASFX | _ | _ | _ | _ | _ | _ |
der:N
lemma = "Unterhaltung"
:small_blue_diamond: (N:der (V unterhalten) (NSFX ung))
graph TD;
A{{N:der}}---B(V <br> unterhalten);
A---C(NSFX <br> ung);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | unterhalt | unterhalten | V | _ | _ | 2 | der:N | _ | _ |
2 | ung | ung | NSFX | _ | _ | _ | _ | _ | _ |
lemma = "Gebremse"
:small_blue_diamond: (N:der (CIRCFX Ge) (V bremsen) (CIRCFX e))
graph TD;
A{{N:der}}---B(CIRCFX <br> Ge);
A---C(V <br> bremsen);
A---D(CIRCFX <br> e);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | Ge | Ge | CIRCFX | _ | _ | 3 | member | _ | _ |
2 | brems | bremsen | V | _ | _ | 3 | der:N | _ | _ |
3 | e | e | CIRCFX | _ | _ | _ | _ | _ | _ |
der:V
lemma = "Vorgang"
:small_blue_diamond: (N:der (V:der (VPFX vor) (V gehen)))
graph TD;
A{{N:der}}---B{{V:der}};
B---C(VPFX <br> vor);
B---D(V <br> gehen);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | vor | vor | VPFX | _ | _ | 2 | der:V | _ | _ |
2 | gang | gehen | V | _ | _ | _ | _ | _ | postop=der:N |
implicit derivation
The label der is used implicitly as part of a pre- or post-operation (in CoNLL-U) if the noun is derived from a verb by changing the stem vowel of the stem form.
See also section "participle: present & past".
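The pre- and post-operation column can be read mechanically. The sketch below is based only on the patterns visible in the example tables: `_` for an empty cell, `preop=` and `postop=` keys, a comma-separated list of operations in order of application, and `|` between the two keys. The function name `parse_operations` is hypothetical.

```python
def parse_operations(cell):
    """Split a column-10 cell into ordered (process, category) pairs.

    Returns a dict with the keys 'preop' and 'postop'; each value is a list
    of (process, category) tuples in the order given in the cell.
    """
    ops = {"preop": [], "postop": []}
    # Tolerate a Markdown-escaped pipe, which may appear inside tables.
    cell = cell.strip().replace("\\|", "|")
    if cell in ("", "_"):
        return ops
    for part in cell.split("|"):
        key, sep, value = part.partition("=")
        key = key.strip()
        if not sep or key not in ops:
            raise ValueError(f"unknown operation entry: {part!r}")
        for label in value.split(","):
            process, _, category = label.strip().partition(":")
            ops[key].append((process, category))
    return ops


print(parse_operations("preop=der:V,trans:N"))
# {'preop': [('der', 'V'), ('trans', 'N')], 'postop': []}
```

Keeping the operations as an ordered list preserves the sequence of word-formation steps, which matters when several operations apply to the same morpheme.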
lemma = "Kurzschlussreaktion"
:small_blue_diamond: (N:cdet (N:der (V:cdet (A kurz) (V schließen))) (N:der (V:der (VPFX re) (V agieren)) (NSFX ion)))
graph TD;
A{{N:cdet}}---B{{N:der}};
B---C{{V:cdet}};
C---D(A <br> kurz);
C---E(V <br> schließen);
A---F{{N:der}};
F---G{{V:der}};
G---H(VPFX <br> re);
G---I(V <br> agieren);
F---J(NSFX <br> ion);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | kurz | kurz | A | _ | _ | 2 | cdet:V | _ | _ |
2 | schluss | schließen | V | _ | _ | 5 | cdet:N | _ | postop=der:N |
3 | re | re | VPFX | _ | _ | 4 | der:V | _ | _ |
4 | akt | agieren | V | _ | _ | 5 | der:N | _ | _ |
5 | ion | ion | NSFX | _ | _ | _ | _ | _ | _ |
lemma = "Tränken"
:small_blue_diamond: (N:trans (V:der (V trinken)))
graph TD;
A{{N:trans}}---B{{V:der}};
B---E(V <br> trinken);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | tränken | trinken | V | _ | _ | _ | _ | _ | preop=der:V,trans:N |
@flat
This label is used for compounds when it is unclear which morpheme is the head and which is the dependent. This is particularly useful for compounds that are composed of more than two morphemes. In such cases, the right-most morpheme is treated as the head of the compound, since German compounds are usually right-headed constructions.
In the ptb-format, all ambiguous morphemes are simply placed within the same bracket as the phrasal head.
lemma = "Windschutzscheibe"
:small_blue_diamond: (N:cdet (N Wind) (N:der (V schützen)) (N Scheibe))
graph TD;
A{{N:cdet}}---B(N <br> Wind);
A---C{{N:der}}
C---D(V <br> schützen);
A---E(N <br> Scheibe);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | wind | Wind | N | _ | _ | 3 | cdet:N@flat | _ | _ |
2 | schutz | schützen | V | _ | _ | 3 | cdet:N@flat | _ | preop=der:N |
3 | scheibe | Scheibe | N | _ | _ | _ | _ | _ | _ |
Here, it is not entirely clear whether “Wind + Schutz” or “Schutz + Scheibe” enter the compounding process first. Hence, @flat is attached to cdet:N for both “Wind” and “Schutz”. The head is chosen to be “Scheibe”.
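For later queries, it can be useful to collect all morphemes that are attached flatly to the same head. The following is a minimal sketch, assuming rows are given as `(id, form, head, relation)` tuples with `None` standing in for `_`; the names `split_relation` and `flat_dependents` are hypothetical.

```python
from collections import defaultdict


def split_relation(label):
    """Split 'cdet:N@flat' into ('cdet', 'N', True).

    Labels without '@flat' yield False as the third element; a bare label
    such as 'member' yields an empty category.
    """
    base, sep, modifier = label.partition("@")
    if sep and modifier != "flat":
        raise ValueError(f"unexpected modifier in {label!r}")
    process, _, category = base.partition(":")
    return process, category, bool(sep)


def flat_dependents(rows):
    """Group the forms of all @flat dependents by their head ID."""
    groups = defaultdict(list)
    for _ident, form, head, relation in rows:
        if relation is None:
            continue  # the head row carries no relation
        _process, _category, is_flat = split_relation(relation)
        if is_flat:
            groups[head].append(form)
    return dict(groups)


rows = [
    (1, "wind", 3, "cdet:N@flat"),
    (2, "schutz", 3, "cdet:N@flat"),
    (3, "scheibe", None, None),
]
print(flat_dependents(rows))  # {3: ['wind', 'schutz']}
```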
cdet
The relation label cdet is used for all kinds of determinative compounds, as well as particle verbs.
The label has the following specific use cases (see Falko Guidelines, p. 3):
- the word is formed out of a head infinitive and verb phrase components in a non-head position :arrow_right: A & N subrelations are possible
- the word form is a compound in which the head governs the non-head :arrow_right: all subrelations are possible
- the word form is a compound where the non-head is a numeral :arrow_right: all subrelations are possible
- the word form is deverbal and preceded by a preposition and there is no verb with the preposition as a prefix :arrow_right: A & N subrelations are possible
cdet:A
lemma = "Merkwürdiges"
:small_blue_diamond: (N:trans (A:cdet (V merken) (A:der (N Würde) (ASFX ig))))
graph TD;
A{{N:trans}}---B{{A:cdet}};
B---C(V <br> merken);
B---D{{A:der}};
D---E(N <br> Würde);
D---F(ASFX <br> ig);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | merk | merken | V | _ | _ | 3 | cdet:A | _ | _ |
2 | würd | Würde | N | _ | _ | 3 | der:A | _ | _ |
3 | iges | ig | ASFX | _ | _ | _ | _ | _ | postop=trans:N |
cdet:N
lemma = "Augenzeugenbericht"
:small_blue_diamond: (N:cdet (N:cdet (N Auge) (N Zeuge)) (N:conv (V berichten)))
graph TD;
A{{N:cdet}}---B{{N:cdet}};
A---C{{N:conv}};
C---D(V <br> berichten);
B---E(N <br> Auge);
B---F(N <br> Zeuge);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5: (empty) | 6: (empty) | 7: head ID | 8: type of WF process | 9: (empty) | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | auge | Auge | N | _ | _ | 2 | cdet:N | _ | _ |
2 | zeuge | Zeuge | N | _ | _ | 3 | cdet:N | _ | _ |
3 | bericht | berichten | V | _ | _ | _ | _ | _ | preop=conv:N |
cdet:V
The label cdet:V is used both for verbal compounds in the more general sense (see tables 1 and 2) and for particle verbs, which behave more like phrases than compounds because of their syntactic mobility (see table 3).
lemma = "Stillstand"
:small_blue_diamond: (N:der (V:cdet (A still) (V stehen)))
graph TD;
A{{N:der}}---B{{V:cdet}};
B---C(A <br> still);
B---D(V <br> stehen);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | still | still | A | _ | _ | 2 | cdet:V | _ | _ |
2 | stand | stehen | V | _ | _ | _ | _ | _ | postop=der:N |
lemma = "Spazierengehen"
:small_blue_diamond: (N:trans (V:cdet (V spazieren) (V gehen)))
graph TD;
A{{N:trans}}---B{{V:cdet}};
B---C(V <br> spazieren);
B---D(V <br> gehen);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | spazieren | spazieren | V | _ | _ | 2 | cdet:V | _ | _ |
2 | gehen | gehen | V | _ | _ | _ | _ | _ | postop=trans:N |
lemma = "Zusammenstoß"
:small_blue_diamond: (N:conv (V:cdet (VPART zusammen) (V stoßen)))
graph TD;
A{{N:conv}}---B{{V:cdet}};
B---C(VPART <br> zusammen);
B---D(V <br> stoßen);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | zusammen | zusammen | VPART | _ | _ | 2 | cdet:V | _ | _ |
2 | stoß | stoßen | V | _ | _ | _ | _ | _ | postop=conv:N |
ccop
The label ccop is used for copular compounds, which consist of two semantically and hierarchically equal free morphemes.
lemma = "schwarzweiß"
:small_blue_diamond: (A:ccop (A schwarz) (A weiß))
graph TD;
A{{A:ccop}}---B(A <br> schwarz);
A---C(A <br> weiß);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | schwarz | schwarz | A | _ | _ | 2 | ccop:A | _ | _ |
2 | weiß | weiß | A | _ | _ | _ | _ | _ | _ |
cphras
The label cphras is used for phrasal compounds (PCs), which have a lexical head (right-most constituent) and a phrasal non-head (left constituent).
The head of the left constituent receives the label cphras with the corresponding subrelation (A, N, or V).
The elements within the phrasal non-head are annotated in a flat manner with the label member. The head of the left constituent is the phrasal head, which is not necessarily the right-most element. This helps us distinguish which type of PC we are dealing with.
- For instance, for PCs with a verb phrase as the left constituent, the verb heading the verb phrase is selected as the head of the left constituent (as seen in the example below, line 1). If the left constituent consists of a prepositional phrase, the preposition forms the head of the left constituent (e.g. "Zwischen-den-Mahlzeiten-Imbisse", Lawrenz 1996).
- All other elements within the left constituent are attached to the phrasal head with the label member within the CoNLL-U format.
In the ptb-format, they are simply placed within the same bracket as the phrasal head (see section "member").
lemma = "komm-wie-du-bist-Hochzeit"
:small_blue_diamond: (N:cphras (V kommen) (PRON wie) (PRON du) (V sein) (N:cdet (A hoch) (N Zeit)))
graph TD
A{{N:cphras}} --- B(V <br> kommen)
A --- C(PRON <br> wie)
A --- D(PRON <br> du)
A --- E(V <br> sein)
A --- F{{N:cdet}}
F --- G(A <br> hoch)
F --- H(N <br> Zeit)
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | komm | kommen | V | _ | _ | 6 | cphras:N | _ | _ |
2 | wie | wie | PRON | _ | _ | 4 | member | _ | _ |
3 | du | du | PRON | _ | _ | 4 | member | _ | _ |
4 | bist | sein | V | _ | _ | 1 | member | _ | _ |
5 | hoch | hoch | A | _ | _ | 6 | cdet:N | _ | _ |
6 | zeit | Zeit | N | _ | _ | _ | _ | _ | _ |
member
In the CoNLL-U format, the label member is used to connect constituents that perform the same function within a word-formation process, e.g. circumfixes in derivation. It has two use cases:
- derivation
  - Here, the label follows the right-hand head principle. In other words, the right-most member of the circumfixation is the head, whereas the relation label member points towards the members to its left.
- phrasal compounds (see section "cphras" for reference)
  - Here, the label is used for each constituent that is dependent on the phrasal head (the head of the left constituent of the PC).
In the ptb-format, all member-elements are simply placed within the same bracket as their head.
lemma = "Gebremse"
:small_blue_diamond: (N:der (CIRCFX Ge) (V bremsen) (CIRCFX e))
graph TD;
A{{N:der}}---B(CIRCFX <br> Ge);
A---C(V <br> bremsen);
A---D(CIRCFX <br> e);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | Ge | Ge | CIRCFX | _ | _ | 3 | member | _ | _ |
2 | brems | bremsen | V | _ | _ | 3 | der:N | _ | _ |
3 | e | e | CIRCFX | _ | _ | _ | _ | _ | _ |
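
The head and relation columns lend themselves to a simple consistency check. The sketch below assumes, based on the example tables, that each word has exactly one row without a head (the overall head), that every other row names an existing row as its head, and that no row is its own head. The function name `check_heads` and the `(id, head, relation)` row layout are hypothetical, with `None` standing in for `_`.

```python
def check_heads(rows):
    """Return a list of problems found in the head/relation columns.

    rows: iterable of (id, head, relation) tuples; head and relation are
    None for the row that heads the whole word.
    """
    rows = list(rows)
    ids = {ident for ident, _head, _relation in rows}
    problems = []

    roots = [ident for ident, head, _relation in rows if head is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one row without a head, found {len(roots)}")

    for ident, head, relation in rows:
        if head is None:
            if relation is not None:
                problems.append(f"row {ident} has a relation but no head")
            continue
        if head == ident:
            problems.append(f"row {ident} is its own head")
        elif head not in ids:
            problems.append(f"row {ident} points to missing head {head}")
        if relation is None:
            problems.append(f"row {ident} has a head but no relation")
    return problems


gebremse = [(1, 3, "member"), (2, 3, "der:N"), (3, None, None)]
print(check_heads(gebremse))  # []
```

A check of this kind does not verify the linguistic analysis itself; it only flags rows whose head ID cannot be right on formal grounds.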
mov
The label mov refers to derivational processes that differentiate grammatical gender (cf. Movierung).
lemma = "Fahrerin"
:small_blue_diamond: (N:mov (N:der (V fahren) (NSFX er)) (NSFX in))
graph TD;
A{{N:mov}}---B{{N:der}};
B---C(V <br> fahren);
B---D(NSFX <br> er);
A---E(NSFX <br> in);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | fahr | fahren | V | _ | _ | 2 | der:N | _ | _ |
2 | er | er | NSFX | _ | _ | 3 | mov:N | _ | _ |
3 | in | in | NSFX | _ | _ | _ | _ | _ | _ |
trans
The label trans is used when there is a morphosyntactic reclassification without a semantic reclassification. A further criterion is that the morpheme form does not change, i.e. the label is not used for conversions into the stem form (see section "conv") or for affixations or stem vowel changes (see section "der").
Contrary to the Falko guidelines, trans is also (currently) used if the noun resembles an infinitive but, unlike actual transpositions, has masculine grammatical gender.
- Note: this can be checked systematically by searching for all masculine nouns that are part of the case annotations of P5/P11
- search query: canon:Gender="Masc" _o_ pos_lang=/N./ _o_ lemma
trans:A
lemma = "feind"
:small_blue_diamond: (A:trans (N Feind))
graph TD;
A{{A:trans}}---B(N <br> Feind);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | feind | Feind | N | _ | _ | _ | _ | _ | preop=trans:A |
trans:N
lemma = "Überqueren"
:small_blue_diamond: (N:trans (V:der (VPFX über) (V queren)))
graph TD;
A{{N:trans}}---B{{V:der}};
B---C(VPFX <br> über);
B---D(V <br> queren);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | über | über | VPFX | _ | _ | 2 | der:V | _ | _ |
2 | queren | queren | V | _ | _ | _ | _ | _ | postop=trans:N |
trans:V
(to do or to delete)
between inflection and derivation
We treat forms where it is unclear if we are dealing with an inflectional or derivational process as processes of word-formation separate from derivation.
adjectives: comparative & superlative
Here, we use the labels comp for the comparative and sup for the superlative form.
lemma = "Liebste"
:small_blue_diamond: (N:trans (A:sup (A lieb)))
graph TD;
A{{N:trans}}---B{{A:sup}};
B---C(A <br> lieb);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | liebste | lieb | A | _ | _ | _ | _ | _ | preop=sup:A,trans:N |
participle: present & past
Here, we use the labels PPres for the present participle and PPast for the past participle.
lemma = "Fahrende"
:small_blue_diamond: (N:trans (A:trans (V:PPres (V fahren))))
graph TD;
A{{N:trans}}---B{{A:trans}};
B---C{{V:PPres}};
C---D(V <br> fahren);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | fahrende | fahren | V | _ | _ | _ | _ | _ | preop=PPres:V,trans:A,trans:N |
lemma = "Unfallverursachende"
:small_blue_diamond: (N:trans (A:cdet (N Unfall) (A:trans (V:PPres (V verursachen)))))
graph TD;
A{{N:trans}}---B{{A:cdet}};
B---E(N <br> Unfall);
B---C{{A:trans}};
C---D{{V:PPres}};
D---F(V <br> verursachen);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | unfall | Unfall | N | _ | _ | 2 | cdet:A | _ | _ |
2 | verursachende | verursachen | V | _ | _ | _ | _ | _ | preop=PPres:V,trans:A\|postop=trans:N |
lemma = "Reingefahrene"
:small_blue_diamond: (N:trans (A:trans (V:PPast (V:cdet (VPART rein) (V fahren)))))
graph TD;
A{{N:trans}}---B{{A:trans}};
B---C{{V:PPast}};
C---D{{V:cdet}};
D---E(VPART <br> rein);
D---F(V <br> fahren);
1: ID | 2: allomorph form | 3: morpheme lemma | 4: morpheme category | 5:empty | 6: empty | 7: head ID | 8: type of WF process | 9: empty | 10: pre- & post-operations |
---|---|---|---|---|---|---|---|---|---|
1 | rein | rein | VPART | _ | _ | 2 | cdet:V | _ | _ |
2 | gefahrene | fahren | V | _ | _ | _ | _ | _ | postop=PPast:V,trans:A,trans:N |