Formation processes

This section describes the word-formation processes that we distinguish in our data. For the purpose of explanation, we use two ways of representing word-formation processes: a parentheses-based representation borrowed from the Penn Treebank format (ptb) and a table-style representation (CoNLL). Each representation has its own advantages and disadvantages, which are discussed below in the technical section.

The morpheme forms (lemma for free forms, form representation for bound morphemes) are represented along with their lexical category:

ptb (in a single bracket with a space in between): ({lexical_category} {morpheme_form})
CoNLL style (tab-separated): {morpheme_form} \t {lexical_category}

The word-formation process outcome is coded together with the process class using the following forms:

ptb: {category_of_outcome}:{process_class_label}
CoNLL style: {process_class_label}:{category_of_outcome}

The order of the two parts differs between the formats because each ordering is easier to read in its own format.
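Since the two label forms are mirror images of each other, one can be derived from the other mechanically. The following is a minimal sketch of that conversion; the function names are hypothetical and not part of any existing tooling for these guidelines.

```python
def ptb_to_conll_label(label: str) -> str:
    """Turn a ptb-style label such as 'N:conv' into the CoNLL style 'conv:N'."""
    category, process = label.split(":", 1)
    return f"{process}:{category}"


def conll_to_ptb_label(label: str) -> str:
    """Turn a CoNLL-style label such as 'conv:N' into the ptb style 'N:conv'."""
    process, category = label.split(":", 1)
    return f"{category}:{process}"
```

For example, `ptb_to_conll_label("N:conv")` gives `"conv:N"`, and applying `conll_to_ptb_label` to that result restores `"N:conv"`.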

Representing categories and process classes this way makes it easier to identify each stage of a word-formation process. It also yields a uniform annotation scheme and keeps later search queries consistent. Uniformity can lead to redundant representations of information, for instance in the formation of participles, comparatives and superlatives; nonetheless, uniformity takes priority. Note that not every combination of category and process label is grammatically possible in German.

Note that simplex nouns are represented by just their category label and the respective morpheme (lemma for free forms, form representation for bound morphemes). Hence, they are recognizable by the absence of a WF tag in the WF tree.
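To make the bracket notation concrete, the sketch below reads a ptb string into a small tree and tests whether it is a simplex, i.e. a single morpheme without a WF tag. It is an illustration under simple assumptions (labels and forms contain no spaces or brackets); `Node`, `parse_ptb` and `is_simplex` are hypothetical names, not an existing parser for this corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                    # "N:conv" for a process node, "N" for a morpheme
    form: Optional[str] = None    # morpheme form; stays None for process nodes
    children: List["Node"] = field(default_factory=list)


def parse_ptb(text: str) -> Node:
    """Parse one bracketed WF tree; raise ValueError if the brackets do not balance."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def next_token() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: input ended too early")
        token = tokens[pos]
        pos += 1
        return token

    def parse_node() -> Node:
        if next_token() != "(":
            raise ValueError("expected '('")
        node = Node(label=next_token())
        while True:
            if pos < len(tokens) and tokens[pos] == "(":
                node.children.append(parse_node())
                continue
            token = next_token()
            if token == ")":
                return node
            node.form = token

    root = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the closing bracket")
    return root


def is_simplex(node: Node) -> bool:
    """A simplex is a bare morpheme: no process label and no embedded nodes."""
    return ":" not in node.label and not node.children
```

Because the parser rejects unbalanced input, it can also serve as a quick sanity check on hand-written bracket strings.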

adopt

The label adopt is used for all WF processes within or between morphemes that are ambiguous or unknown. Ideally, this label no longer occurs after the manual correction of the automatic WF parsing.

conv

The label conv is used in the following case:

- There is a change in the morphosyntactic category, e.g. from V to N.
- The lemma occurs in its stem form, e.g. lemma "prallen", stem form "Prall".

conv:A

lemma = "schmuck"

:small_blue_diamond: (A:conv (N Schmuck))

graph TD;
  A{{A:conv}}---B(N <br> Schmuck);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	schmuck	Schmuck	N	_	_	_	_	_	preop=conv:A

conv:N

lemma = "Zusammenprall"

:small_blue_diamond: (N:conv (V:cdet (VPART zusammen) (V prallen)))

graph TD;
  A{{N:conv}}---B{{V:cdet}};
  B---C(VPART <br> zusammen);
  B---D(V <br> prallen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	zusammen	zusammen	VPART	_	_	2	cdet:V	_	_
2	prall	prallen	V	_	_	_	_	_	postop=conv:N
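A row of the ten-column table can be read programmatically along the following lines. This is a sketch that assumes tab-separated rows with `_` marking an empty cell, as in the tables shown here; `Morpheme` and `parse_row` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Morpheme(NamedTuple):
    morpheme_id: int
    form: str                 # column 2: allomorph form
    lemma: str                # column 3: morpheme lemma
    category: str             # column 4: morpheme category
    head: Optional[int]       # column 7: head ID, None for the root morpheme
    process: Optional[str]    # column 8: type of WF process, e.g. "cdet:V"
    ops: Optional[str]        # column 10: pre- & post-operations, unparsed


def parse_row(line: str) -> Morpheme:
    """Read one tab-separated row; '_' is treated as an empty cell."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, found {len(cols)}")

    def value(cell: str) -> Optional[str]:
        return None if cell == "_" else cell

    head = value(cols[6])
    return Morpheme(
        morpheme_id=int(cols[0]),
        form=cols[1],
        lemma=cols[2],
        category=cols[3],
        head=int(head) if head is not None else None,
        process=value(cols[7]),
        ops=value(cols[9]),
    )
```

Applied to the two rows of "Zusammenprall", the first row yields `head == 2` and `process == "cdet:V"`, while the second has no head and carries `postop=conv:N` in its operations column.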

conv:V

lemma = "Fischen"

:small_blue_diamond: (N:trans (V:conv (N Fisch)))

graph TD;
  A{{N:trans}}---B{{V:conv}};
  B---C(N <br> Fisch);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	fischen	Fisch	N	_	_	_	_	_	preop=conv:V,trans:N

der

explicit derivation

The label der is used in cases when there is either affixation (explicit derivation) or a stem vowel change (implicit derivation).

der:A

lemma = "Gehässige"

:small_blue_diamond: (N:der (A:der (CIRCFX ge) (V hassen) (CIRCFX ig)))

graph TD;
  A{{N:der}}---B{{A:der}};
  B---C(CIRCFX <br> ge);
  B---D(V <br> hassen);
  B---E(CIRCFX <br> ig);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	ge	ge	CIRCFX	_	_	3	member	_	_
2	häss	hassen	V	_	_	3	der:A	_	_
3	ige	ig	CIRCFX	_	_	_	_	_	postop=der:N

lemma = "schmerzhaft"

:small_blue_diamond: (A:der (N Schmerz) (ASFX haft))

graph TD;
  A{{A:der}}---B(N <br> Schmerz);
  A---C(ASFX <br> haft);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	schmerz	Schmerz	N	_	_	2	der:A	_	_
2	haft	haft	ASFX	_	_	_	_	_	_

der:N

lemma = "Unterhaltung"

:small_blue_diamond: (N:der (V unterhalten) (NSFX ung))

graph TD;
  A{{N:der}}---B(V <br> unterhalten);
  A---C(NSFX <br> ung);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	unterhalt	unterhalten	V	_	_	2	der:N	_	_
2	ung	ung	NSFX	_	_	_	_	_	_

lemma = "Gebremse"

:small_blue_diamond: (N:der (CIRCFX Ge) (V bremsen) (CIRCFX e))

graph TD;
  A{{N:der}}---B(CIRCFX <br> Ge);
  A---C(V <br> bremsen);
  A---D(CIRCFX <br> e);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	Ge	Ge	CIRCFX	_	_	3	member	_	_
2	brems	bremsen	V	_	_	3	der:N	_	_
3	e	e	CIRCFX	_	_	_	_	_	_

der:V

lemma = "Vorgang"

:small_blue_diamond: (N:der (V:der (VPFX vor) (V gehen)))

graph TD;
  A{{N:der}}---B{{V:der}};
  B---C(VPFX <br> vor);
  B---D(V <br> gehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	vor	vor	VPFX	_	_	2	der:V	_	_
2	gang	gehen	V	_	_	_	_	_	postop=der:N

implicit derivation

The label der is used implicitly as part of a pre- or post-operation (in CoNLL-U) if the noun is derived from a verb by changing the stem vowel of the stem form. See also section "participle: present & past".

lemma = "Kurzschlussreaktion"

:small_blue_diamond: (N:cdet (N:der (V:cdet (A kurz) (V schließen))) (N:der (V:der (VPFX re) (V agieren)) (NSFX ion)))

graph TD;
  A{{N:cdet}}---B{{N:der}};
  B---C{{V:cdet}};
  C---D(A <br> kurz);
  C---E(V <br> schließen);
  A---F{{N:der}};
  F---G{{V:der}};
  G---H(VPFX <br> re);
  G---I(V <br> agieren);
  F---J(NSFX <br> ion);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	kurz	kurz	A	_	_	2	cdet:V	_	_
2	schluss	schließen	V	_	_	5	cdet:N	_	postop=der:N
3	re	re	VPFX	_	_	4	der:V	_	_
4	akt	agieren	V	_	_	5	der:N	_	_
5	ion	ion	NSFX	_	_	_	_	_	_

lemma = "Tränken"

:small_blue_diamond: (N:trans (V:der (V trinken)))

graph TD;
  A{{N:trans}}---B{{V:der}};
  B---E(V <br> trinken);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	tränken	trinken	V	_	_	_	_	_	preop=der:V,trans:N

@flat

This label is used for compounds when it is unclear which morpheme is the head and which is the dependent. This is particularly useful for compounds that consist of more than two morphemes. In such cases, the right-most morpheme is treated as the head of the compound, since German compounds are usually right-headed constructions.

In the ptb-format, all ambiguous morphemes are simply placed within the same bracket as the phrasal head.

lemma = "Windschutzscheibe"

:small_blue_diamond: (N:cdet (N Wind) (N:der (V schützen)) (N Scheibe))

graph TD;
  A{{N:cdet}}---B(N <br> Wind);
  A---C{{N:der}};
  C---D(V <br> schützen);
  A---E(N <br> Scheibe);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	wind	Wind	N	_	_	3	cdet:N@flat	_	_
2	schutz	Schutz	N	_	_	3	cdet:N@flat	_	_
3	scheibe	Scheibe	N	_	_	_	_	_	_

Here, it is not entirely clear whether “Wind + Schutz” or “Schutz + Scheibe” first enter the compounding process. Hence, @flat is attached to cdet:N for both “Wind” and “Schutz”. The head is chosen to be “Scheibe”.
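When querying the CoNLL tables, the @flat marker has to be separated from the process label it is attached to. A minimal sketch, assuming the marker always appears as a suffix of the form `process:category@flat` as in the table above (the function name is hypothetical):

```python
from typing import Optional, Tuple

FLAT_MARKER = "@flat"


def split_process_label(label: str) -> Tuple[str, Optional[str], bool]:
    """Split e.g. 'cdet:N@flat' into (process, category, is_flat)."""
    is_flat = label.endswith(FLAT_MARKER)
    core = label[: -len(FLAT_MARKER)] if is_flat else label
    process, _, category = core.partition(":")
    # Labels without a category part (e.g. 'member') yield None as category.
    return process, category or None, is_flat
```

With this, both "Wind" and "Schutz" in the example come out as `("cdet", "N", True)`, so flat compounds can be filtered without touching the regular cdet:N rows.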

cdet

The relation label cdet is used for all kinds of determinative compounds, as well as particle verbs.

The label has the following specific use cases (see Falko Guidelines, p. 3):

- the word is formed out of a head infinitive and verb phrase components in a non-head position :arrow_right: A & N subrelations are possible
- the word form is a compound in which the head governs the non-head :arrow_right: all subrelations are possible
- the word form is a compound where the non-head is a numeral :arrow_right: all subrelations are possible
- the word form is deverbal and preceded by a preposition and there is no verb with the preposition as a prefix :arrow_right: A & N subrelations are possible

cdet:A

lemma = "Merkwürdiges"

:small_blue_diamond: (N:trans (A:cdet (V merken) (A:der (N Würde) (ASFX ig))))

graph TD;
  A{{N:trans}}---B{{A:cdet}};
  B---C(V <br> merken);
  B---D{{A:der}};
  D---E(N <br> Würde);
  D---F(ASFX <br> ig);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	merk	merken	V	_	_	3	cdet:A	_	_
2	würd	Würde	N	_	_	3	der:A	_	_
3	iges	ig	ASFX	_	_	_	_	_	postop=trans:N

cdet:N

lemma = "Augenzeugenbericht"

:small_blue_diamond: (N:cdet (N:cdet (N Auge) (N Zeuge)) (N:conv (V berichten)))

graph TD;
  A{{N:cdet}}---B{{N:cdet}};
  A---C{{N:conv}};
  C---D(V <br> berichten);
  B---E(N <br> Auge);
  B---F(N <br> Zeuge);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5: (empty)	6: (empty)	7: head ID	8: type of WF process	9: (empty)	10: pre- & post-operations
1	auge	Auge	N	_	_	2	cdet:N	_	_
2	zeuge	Zeuge	N	_	_	3	cdet:N	_	_
3	bericht	berichten	V	_	_	_	_	_	preop=conv:N

cdet:V

The label cdet:V is used both for verbal compounds in the more general sense (see tables 1 and 2) and for particle verbs, which are closer to phrases than to compounds because of their syntactic mobility (see table 3).

lemma = "Stillstand"

:small_blue_diamond: (N:der (V:cdet (A still) (V stehen)))

graph TD;
  A{{N:der}}---B{{V:cdet}};
  B---C(A <br> still);
  B---D(V <br> stehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	still	still	A	_	_	2	cdet:V	_	_
2	stand	stehen	V	_	_	_	_	_	postop=der:N

lemma = "Spazierengehen"

:small_blue_diamond: (N:trans (V:cdet (V spazieren) (V gehen)))

graph TD;
  A{{N:trans}}---B{{V:cdet}};
  B---C(V <br> spazieren);
  B---D(V <br> gehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	spazieren	spazieren	V	_	_	2	cdet:V	_	_
2	gehen	gehen	V	_	_	_	_	_	postop=trans:N

lemma = "Zusammenstoß"

:small_blue_diamond: (N:conv (V:cdet (VPART zusammen) (V stoßen)))

graph TD;
  A{{N:conv}}---B{{V:cdet}};
  B---C(VPART <br> zusammen);
  B---D(V <br> stoßen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	zusammen	zusammen	VPART	_	_	2	cdet:V	_	_
2	stoß	stoßen	V	_	_	_	_	_	postop=conv:N

ccop

The label ccop is used for copular compounds, which consist of two semantically and hierarchically equal free morphemes.

lemma = "schwarzweiß"

:small_blue_diamond: (A:ccop (A schwarz) (A weiß))

graph TD;
  A{{A:ccop}}---B(A <br> schwarz);
  A---C(A <br> weiß);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	schwarz	schwarz	A	_	_	2	ccop:A	_	_
2	weiß	weiß	A	_	_	_	_	_	_

cphras

The label cphras is used for phrasal compounds (PCs), which have a lexical head (right-most constituent) and a phrasal non-head (left constituent).

The head of the left constituent receives the label cphras with the corresponding subrelation (A, N, or V).

The elements within the phrasal non-head are annotated in a flat manner with the label member. The head of the left constituent is the phrasal head, and not strictly the right-most element. This helps us distinguish which type of PC we are dealing with.

For instance, for PCs with a verb phrase as the left constituent, the verb heading the verb phrase is selected as the head of the left constituent (as seen in the example below, line 1). If the left constituent consists of a prepositional phrase, the preposition forms the head of the left constituent (e.g. "Zwischen-den-Mahlzeiten-Imbisse", Lawrenz 1996).
All other elements within the left constituents are attached to the phrasal head with the label member within the CoNLL-U format.

In the ptb-format, they are simply placed within the same bracket as the phrasal head (see section "member").

lemma = "komm-wie-du-bist-Hochzeit"

:small_blue_diamond: (N:cphras (V kommen) (PRON wie) (PRON du) (V sein) (N:cdet (A hoch) (N Zeit)))

graph TD
    A{{N:cphras}} --- B(V <br> kommen)
    A --- C(PRON <br> wie)
    A --- D(PRON <br> du)
    A --- E(V <br> sein)
    A --- F{{N:cdet}}
    F --- G(A <br> hoch)
    F --- H(N <br> Zeit)

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	komm	kommen	V	_	_	6	cphras:N	_	_
2	wie	wie	PRON	_	_	1	member	_	_
3	du	du	PRON	_	_	1	member	_	_
4	bist	sein	V	_	_	1	member	_	_
5	hoch	hoch	A	_	_	6	cdet:N	_	_
6	zeit	Zeit	N	_	_	_	_	_	_

member

In the CoNLL-U format, the label member connects constituents that perform the same function within a word-formation process, e.g. the parts of a circumfix in derivation. It has two use cases:

derivation
- Here, the label follows the right-hand head principle: the right-most part of the circumfix is the head, and the parts to its left are attached to it with the relation label member
phrasal compounds (see section "cphras" for reference)
- Here, the label is used for each constituent that depends on the phrasal head (the head of the left constituent of the PC)

In the ptb-format, all member-elements are simply placed within the same bracket as their head.

lemma = "Gebremse"

:small_blue_diamond: (N:der (CIRCFX Ge) (V bremsen) (CIRCFX e))

graph TD;
  A{{N:der}}---B(CIRCFX <br> Ge);
  A---C(V <br> bremsen);
  A---D(CIRCFX <br> e);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	Ge	Ge	CIRCFX	_	_	3	member	_	_
2	brems	bremsen	V	_	_	3	der:N	_	_
3	e	e	CIRCFX	_	_	_	_	_	_
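The member relation can be collected per head with a few lines of code, which is also a convenient place to catch rows that accidentally point to themselves. The sketch below works on simplified `(ID, head ID, WF process)` triples taken from columns 1, 7 and 8; the function name and the triple layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[int, Optional[int], Optional[str]]  # (ID, head ID, WF process)


def members_by_head(rows: Iterable[Row]) -> Dict[int, List[int]]:
    """Group the IDs of all 'member' rows under the ID of their head."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for morpheme_id, head_id, process in rows:
        if process != "member":
            continue
        if head_id is None:
            raise ValueError(f"member row {morpheme_id} has no head")
        if head_id == morpheme_id:
            raise ValueError(f"member row {morpheme_id} points to itself")
        groups[head_id].append(morpheme_id)
    return dict(groups)


# The three rows of "Gebremse": the left circumfix part is a member of the right one.
gebremse = [(1, 3, "member"), (2, 3, "der:N"), (3, None, None)]
print(members_by_head(gebremse))
```

For "Gebremse" this groups row 1 under head 3. The same function applies to phrasal compounds, where the head of the members is the phrasal head rather than the right-most element.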

mov

The label mov refers to derivational processes differentiating the grammatical gender (cf. Movierung).

lemma = "Fahrerin"

:small_blue_diamond: (N:mov (N:der (V fahren) (NSFX er)) (NSFX in))

graph TD;
  A{{N:mov}}---B{{N:der}};
  B---C(V <br> fahren);
  B---D(NSFX <br> er);
  A---E(NSFX <br> in);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	fahr	fahren	V	_	_	2	der:N	_	_
2	er	er	NSFX	_	_	3	mov:N	_	_
3	in	in	NSFX	_	_	_	_	_	_

trans

The label trans is used when there is a morphosyntactic reclassification without a semantic reclassification. A further criterion is that the morpheme form is not changed, i.e. trans is used neither for conversions into the stem form (see section "conv") nor for affixations or stem vowel changes (see section "der").

In contrast to the Falko guidelines, trans is also used if the noun resembles an infinitive but, unlike actual transpositions, has masculine grammatical gender.

Note: this can be checked systematically by searching for all masculine nouns that are part of the case annotations of P5/P11
- search query: canon:Gender="Masc" _o_ pos_lang=/N./ _o_ lemma

trans:A

lemma = "feind"

:small_blue_diamond: (A:trans (N Feind))

graph TD;
  A{{A:trans}}---B(N <br> Feind);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	feind	Feind	N	_	_	_	_	_	preop=trans:A

trans:N

lemma = "Überqueren"

:small_blue_diamond: (N:trans (V:der (VPFX über) (V queren)))

graph TD;
  A{{N:trans}}---B{{V:der}};
  B---C(VPFX <br> über);
  B---D(V <br> queren);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	über	über	VPFX	_	_	2	der:V	_	_
2	queren	queren	V	_	_	_	_	_	postop=trans:N

:small_blue_diamond: (N:trans (A:sup (A lieb)))

graph TD;
  A{{N:trans}}---B{{A:sup}};
  B---C(A <br> lieb);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	liebste	lieb	A	_	_	_	_	_	preop=sup:A,trans:N

participle: present & past

Here, we use the labels PPres for the present participle and PPast for the past participle.

lemma = "Fahrende"

:small_blue_diamond: (N:trans (A:trans (V:PPres (V fahren))))

graph TD;
  A{{N:trans}}---B{{A:trans}};
  B---C{{V:PPres}};
  C---D(V <br> fahren);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	fahrende	fahren	V	_	_	_	_	_	preop=PPres:V,trans:A,trans:N

lemma = "Unfallverursachende"

:small_blue_diamond: (N:trans (A:cdet (N Unfall) (A:trans (V:PPres (V verursachen)))))

graph TD;
  A{{N:trans}}---B{{A:cdet}};
  B---E(N <br> Unfall);
  B---C{{A:trans}};
  C---D{{V:PPres}};
  D---F(V <br> verursachen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	unfall	Unfall	N	_	_	2	cdet:A	_	_
2	verursachende	verursachen	V	_	_	_	_	_	preop=PPres:V,trans:A|postop=trans:N

lemma = "Reingefahrene"

:small_blue_diamond: (N:trans (A:trans (V:PPast (V:cdet (VPART rein) (V fahren)))))

graph TD;
  A{{N:trans}}---B{{A:trans}};
  B---C{{V:PPast}};
  C---D{{V:cdet}};
  D---E(VPART <br> rein);
  D---F(V <br> fahren);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	rein	rein	VPART	_	_	2	cdet:V	_	_
2	gefahrene	fahren	V	_	_	_	_	_	postop=PPast:V,trans:A,trans:N
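The operations column used in these tables can be unpacked into separate pre- and post-operation lists. The sketch below assumes the shape seen in the examples: `preop=` and `postop=` groups separated by a pipe, with comma-separated labels inside each group. Judging from the examples, the labels within a group appear to be listed from the innermost to the outermost process, so the lists keep that order. The function name is hypothetical.

```python
from typing import Dict, List


def parse_ops(cell: str) -> Dict[str, List[str]]:
    """Split column 10 into {'preop': [...], 'postop': [...]}."""
    ops: Dict[str, List[str]] = {"preop": [], "postop": []}
    if cell == "_":
        return ops
    # Tolerate a backslash-escaped pipe, which can survive from Markdown tables.
    for group in cell.replace("\\|", "|").split("|"):
        key, _, labels = group.partition("=")
        if key not in ops or not labels:
            raise ValueError(f"unexpected operation group: {group!r}")
        ops[key].extend(labels.split(","))
    return ops
```

For "Reingefahrene", `parse_ops("postop=PPast:V,trans:A,trans:N")` returns an empty `preop` list and the three post-operations in the order in which they are written.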

Preliminary Guidelines on Noun Word Formation in the RUEG Corpus (German)