Nominal Word Formation in the RUEG Corpus (German) -- Preliminary Annotation Guidelines

Please note that these guidelines are still being updated regularly and are work in progress; they may therefore be incomplete. A final version will be presented later and linked in this section.

The guidelines build on previous work of Lüdeling, Lukassek & Akbari (to appear) [LukassekAkbariLuedeling2023].

Authors

Nadine Zürn (Universität Mannheim)
Martin Klotz -- martin.klotz (at) hu-berlin (dot) de (Humboldt-Universität zu Berlin)
Stella Baumann (Universität Mannheim)

Annotation Purpose

The purpose of this annotation is to gain insight into the composition of morphologically complex nouns. The annotations identify transparent⁽¹⁾ word-formation (WF) processes and their constituents. Therefore, they contain four levels of information:

each individual morpheme (lemma for free forms, form representation for bound morphemes) within the complex noun
each word-formation process of the intermediate complex constituent(s) up to the complex lexeme
the category of each word-formation process
the order and hierarchy of all involved word-formation processes

⁽¹⁾Even when taking a synchronic perspective on word formation, one must at times consider the diachronic perspective and break down the etymology of a word. The difficulty then is not to carry the analysis too far down the historical road. Considering only synchronically transparent formation processes and morphemes helps with this issue. Borderline cases are discussed and documented (see e.g. lemma = "Fischen").

Basic annotation principles

There has to be at least one input morpheme or already complex form. Each input constituent has a lexical category. There is a word-formation process applied to the input constituent(s). This process has a label and a single output of a certain category as well. The output of a word-formation process can then again be an input to a subsequent formation process. Available lexical categories of process inputs and outputs as well as a classification of formation processes are described below.

Here are three examples:

(1) Autounfall is of category noun and the outcome of the process determinative compounding, which is applied to the two inputs auto (noun) and unfall (noun)

(2) (das) Fahren is a noun that originates from the verbal root fahren via transposition (recategorization of a syntactic form)

(3) Spaziergang (noun) is a compound of spazieren (verb) and gang (noun). The latter morpheme results from the implicit derivation of gehen (verb)

Formation processes

The following section describes the word-formation processes that we distinguish in our data. For the purpose of explanation, we use two ways of representing word-formation processes: a parentheses-based representation borrowed from the Penn Treebank format (ptb) and a table-style representation (CoNLL). Both representations have advantages and disadvantages, which are discussed below in the technical section.

The morpheme forms (lemma for free forms, form representation for bound morphemes) are represented along with their lexical category:

ptb (in a single bracket with a space in between): ({lexical_category} {morpheme_form})
CoNLL style (tab-separated): {morpheme_form} \t {lexical_category}

The word-formation process outcome is coded together with the process class using the following forms:

ptb: {category_of_outcome}:{process_class_label}
CoNLL style: {process_class_label}:{category_of_outcome}

The distinction between formats helps readability in the individual format.
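The relation between the two label orders can be sketched with a small conversion helper. This is only an illustration; the function names `ptb_to_conll_label` and `conll_to_ptb_label` are hypothetical and not part of any project tooling.

```python
def ptb_to_conll_label(label: str) -> str:
    """Turn a ptb-style label such as 'N:conv' into the CoNLL-style 'conv:N'."""
    category, process = label.split(":", 1)
    return f"{process}:{category}"


def conll_to_ptb_label(label: str) -> str:
    """Turn a CoNLL-style label such as 'conv:N' into the ptb-style 'N:conv'."""
    process, category = label.split(":", 1)
    return f"{category}:{process}"


print(ptb_to_conll_label("N:conv"))  # conv:N
print(conll_to_ptb_label("cdet:V"))  # V:cdet
```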

Representing categories and process classes this way helps determine the stage of the word-formation processes further. Additionally, it helps create a uniform annotation scheme and facilitates consistency in the search queries later on. Scheme uniformity can lead to redundant representations of information, for instance regarding the formation of participles, comparatives and superlatives. Nonetheless, uniformity is the priority. Note that not every combination of categories and process labels is grammatically possible in German.

Note that simplex nouns are represented by just their category label and the respective morpheme (lemma for free forms, form representation for bound morphemes). Hence, they are recognizable by their lack of WF tag in the WF tree.
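The basic principles (inputs with a category, a labelled process, a single output category) and the ptb notation can be sketched as a small data model. The class names `Morpheme` and `Process` are hypothetical and chosen for illustration only; the label `cdet` for determinative compounding is described further below.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Morpheme:
    """Terminal node: a lexical category plus a morpheme form."""
    category: str
    form: str

    def to_ptb(self) -> str:
        return f"({self.category} {self.form})"


@dataclass
class Process:
    """Non-terminal node: one word-formation process with a single output category."""
    category: str  # category of the outcome, e.g. "N"
    label: str     # process class label, e.g. "cdet"
    inputs: List[Union["Process", Morpheme]] = field(default_factory=list)

    def to_ptb(self) -> str:
        children = " ".join(child.to_ptb() for child in self.inputs)
        return f"({self.category}:{self.label} {children})"


# Example (1): Autounfall as a determinative compound of two nouns.
autounfall = Process("N", "cdet", [Morpheme("N", "Auto"), Morpheme("N", "Unfall")])
print(autounfall.to_ptb())  # (N:cdet (N Auto) (N Unfall))

# A simplex noun carries no WF tag, only its category and morpheme.
print(Morpheme("N", "Auto").to_ptb())  # (N Auto)
```

Because the output of one `Process` can be an input of another, nested formations are represented by nesting `Process` objects.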

adopt

The label adopt is used for all WF processes within or between morphemes that are ambiguous or unknown. Ideally, this label no longer occurs after the manual correction of the automatic WF parsing.

conv

The label conv is used in the following case:

There is a change in the morphosyntactic category, e.g. from V to N.
The lemma occurs in its stem form, e.g. lemma "prallen", stem form "Prall".

conv:A

lemma = "schmuck"

:small_blue_diamond: (A:conv (N Schmuck))

graph TD;
  A{{A:conv}}---B(N <br> Schmuck);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	schmuck	Schmuck	N	_	_	_	_	_	preop=conv:A

conv:N

lemma = "Zusammenprall"

:small_blue_diamond: (N:conv (V:cdet (VPART zusammen) (V prallen)))

graph TD;
  A{{N:conv}}---B{{V:cdet}};
  B---C(VPART <br> zusammen);
  B---D(V <br> prallen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	zusammen	zusammen	VPART	_	_	2	cdet:V	_	_
2	prall	prallen	V	_	_	_	_	_	postop=conv:N

conv:V

lemma = "Fischen"

:small_blue_diamond: (N:trans (V:conv (N Fisch)))

graph TD;
  A{{N:trans}}---B{{V:conv}};
  B---C(N <br> Fisch);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	fischen	Fisch	N	_	_	_	_	_	preop=conv:V,trans:N

der

explicit derivation

The label der is used in cases when there is either affixation (explicit derivation) or a stem vowel change (implicit derivation).

der:A

lemma = "Gehässige"

:small_blue_diamond: (N:der (A:der (CIRCFX ge) (V hassen) (CIRCFX ig)))

graph TD;
  A{{N:der}}---B{{A:der}};
  B---C(CIRCFX <br> ge);
  B---D(V <br> hassen);
  B---E(CIRCFX <br> ig);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	ge	ge	CIRCFX	_	_	3	member	_	_
2	häss	hassen	V	_	_	3	der:A	_	_
3	ige	ig	CIRCFX	_	_	_	_	_	postop=der:N

lemma = "schmerzhaft"

:small_blue_diamond: (A:der (N Schmerz) (ASFX haft))

graph TD;
  A{{A:der}}---B(N <br> Schmerz);
  A---C(ASFX <br> haft);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	schmerz	Schmerz	N	_	_	2	der:A	_	_
2	haft	haft	ASFX	_	_	_	_	_	_

der:N

lemma = "Unterhaltung"

:small_blue_diamond: (N:der (V unterhalten) (NSFX ung))

graph TD;
  A{{N:der}}---B(V <br> unterhalten);
  A---C(NSFX <br> ung);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	unterhalt	unterhalten	V	_	_	2	der:N	_	_
2	ung	ung	NSFX	_	_	_	_	_	_

lemma = "Gebremse"

:small_blue_diamond: (N:der (CIRCFX Ge) (V bremsen) (CIRCFX e))

graph TD;
  A{{N:der}}---B(CIRCFX <br> Ge);
  A---C(V <br> bremsen);
  A---D(CIRCFX <br> e);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	Ge	Ge	CIRCFX	_	_	3	member	_	_
2	brems	bremsen	V	_	_	3	der:N	_	_
3	e	e	CIRCFX	_	_	_	_	_	_

der:V

lemma = "Vorgang"

:small_blue_diamond: (N:der (V:der (VPFX vor) (V gehen)))

graph TD;
  A{{N:der}}---B{{V:der}};
  B---C(VPFX <br> vor);
  B---D(V <br> gehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	vor	vor	VPFX	_	_	2	der:V	_	_
2	gang	gehen	V	_	_	_	_	_	postop=der:N

implicit derivation

The label der is also used as part of a pre- or post-operation (in CoNLL-U) if the noun is derived from a verb by changing the stem vowel of the stem form (implicit derivation). See also section "participle: present & past".

lemma = "Kurzschlussreaktion"

:small_blue_diamond: (N:cdet (N:der (V:cdet (A kurz) (V schließen))) (N:der (V:der (VPFX re) (V agieren)) (NSFX ion)))

graph TD;
  A{{N:cdet}}---B{{N:der}};
  B---C{{V:cdet}};
  C---D(A <br> kurz);
  C---E(V <br> schließen);
  A---F{{N:der}};
  F---G{{V:der}};
  G---H(VPFX <br> re);
  G---I(V <br> agieren);
  F---J(NSFX <br> ion);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	kurz	kurz	A	_	_	2	cdet:V	_	_
2	schluss	schließen	V	_	_	5	cdet:N	_	postop=der:N
3	re	re	VPFX	_	_	4	der:V	_	_
4	akt	agieren	V	_	_	5	der:N	_	_
5	ion	ion	NSFX	_	_	_	_	_	_

lemma = "Tränken"

:small_blue_diamond: (N:trans (V:der (V trinken)))

graph TD;
  A{{N:trans}}---B{{V:der}};
  B---E(V <br> trinken);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	tränken	trinken	V	_	_	_	_	_	preop=der:V,trans:N

@flat

This label is used for compounds when it is unclear which morpheme is the head and which is the dependent. This is particularly useful for compounds that are composed of more than two morphemes. In such cases, the right-most morpheme is treated as the head of the compound, since German compounds are usually right-headed constructions.

In the ptb-format, all ambiguous morphemes are simply placed within the same bracket as the phrasal head.

lemma = "Windschutzscheibe"

:small_blue_diamond: (N:cdet (N Wind) (N:der (V schützen)) (N Scheibe))

graph TD;
  A{{N:cdet}}---B(N <br> Wind);
  A---C{{N:der}};
  C---D(V <br> schützen);
  A---E(N <br> Scheibe);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	wind	Wind	N	_	_	3	cdet:N@flat	_	_
2	schutz	Schutz	N	_	_	3	cdet:N@flat	_	_
3	scheibe	Scheibe	N	_	_	_	_	_	_

Here, it is not entirely clear whether “Wind + Schutz” or “Schutz + Scheibe” first enter the compounding process. Hence, @flat is attached to cdet:N for both “Wind” and “Schutz”. The head is chosen to be “Scheibe”.
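When processing the CoNLL table, the @flat marker has to be separated from the process label. The following sketch shows one way to split a column-8 value; the names `ProcessLabel` and `parse_process_label` are hypothetical, and the handling of labels without a category (such as member) is an assumption made for this illustration.

```python
from typing import NamedTuple


class ProcessLabel(NamedTuple):
    process: str   # e.g. "cdet"
    category: str  # e.g. "N"; empty for labels without a category (assumption)
    flat: bool     # True if the deep feature @flat is attached


def parse_process_label(label: str) -> ProcessLabel:
    """Split a column-8 value such as 'cdet:N@flat' into its parts."""
    flat = label.endswith("@flat")
    core = label[: -len("@flat")] if flat else label
    process, _, category = core.partition(":")
    return ProcessLabel(process, category, flat)


print(parse_process_label("cdet:N@flat"))
print(parse_process_label("cdet:N"))
```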

cdet

The relation label cdet is used for all kinds of determinative compounds, as well as particle verbs.

The label has the following specific use cases (see Falko Guidelines, p. 3):

the word is formed out of a head infinitive and verb phrase components in a non-head position :arrow_right: A & N subrelations are possible
the word form is a compound in which the head governs the non-head :arrow_right: all subrelations are possible
the word form is a compound where the non-head is a numeral :arrow_right: all subrelations are possible
the word form is deverbal and preceded by a preposition and there is no verb with the preposition as a prefix :arrow_right: A & N subrelations are possible

cdet:A

lemma = "Merkwürdiges"

:small_blue_diamond: (N:trans (A:cdet (V merken) (A:der (N Würde) (ASFX ig))))

graph TD;
  A{{N:trans}}---B{{A:cdet}};
  B---C(V <br> merken);
  B---D{{A:der}};
  D---E(N <br> Würde);
  D---F(ASFX <br> ig);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	merk	merken	V	_	_	3	cdet:A	_	_
2	würd	Würde	N	_	_	3	der:A	_	_
3	iges	ig	ASFX	_	_	_	_	_	postop=trans:N

cdet:N

lemma = "Augenzeugenbericht"

:small_blue_diamond: (N:cdet (N:cdet (N Auge) (N Zeuge)) (N:conv (V berichten)))

graph TD;
  A{{N:cdet}}---B{{N:cdet}};
  A---C{{N:conv}};
  C---D(V <br> berichten);
  B---E(N <br> Auge);
  B---F(N <br> Zeuge);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5: (empty)	6: (empty)	7: head ID	8: type of WF process	9: (empty)	10: pre- & post-operations
1	auge	Auge	N	_	_	2	cdet:N	_	_
2	zeuge	Zeuge	N	_	_	3	cdet:N	_	_
3	bericht	berichten	V	_	_	_	_	_	preop=conv:N

cdet:V

The label cdet:V is used both for verbal compounds in the more general sense (see tables 1 and 2) and for particle verbs, which resemble phrases more than compounds due to their syntactic mobility (see table 3).

lemma = "Stillstand"

:small_blue_diamond: (N:der (V:cdet (A still) (V stehen)))

graph TD;
  A{{N:der}}---B{{V:cdet}};
  B---C(A <br> still);
  B---D(V <br> stehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	still	still	A	_	_	2	cdet:V	_	_
2	stand	stehen	V	_	_	_	_	_	postop=der:N

lemma = "Spazierengehen"

:small_blue_diamond: (N:trans (V:cdet (V spazieren) (V gehen)))

graph TD;
  A{{N:trans}}---B{{V:cdet}};
  B---C(V <br> spazieren);
  B---D(V <br> gehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	spazieren	spazieren	V	_	_	2	cdet:V	_	_
2	gehen	gehen	V	_	_	_	_	_	postop=trans:N

lemma = "Zusammenstoß"

:small_blue_diamond: (N:conv (V:cdet (VPART zusammen) (V stoßen)))

graph TD;
  A{{N:conv}}---B{{V:cdet}};
  B---C(VPART <br> zusammen);
  B---D(V <br> stoßen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	zusammen	zusammen	VPART	_	_	2	cdet:V	_	_
2	stoß	stoßen	V	_	_	_	_	_	postop=conv:N

ccop

The label ccop is used for copular compounds, which consist of two semantically and hierarchically equal free morphemes.

lemma = "schwarzweiß"

:small_blue_diamond: (A:ccop (A schwarz) (A weiß))

graph TD;
  A{{A:ccop}}---B(A <br> schwarz);
  A---C(A <br> weiß);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	schwarz	schwarz	A	_	_	2	ccop:A	_	_
2	weiß	weiß	A	_	_	_	_	_	_

cphras

The label cphras is used for phrasal compounds (PCs), which have a lexical head (right-most constituent) and a phrasal non-head (left constituent).

The head of the left constituent receives the label cphras with the corresponding subrelation (A, N, or V).

The elements within the phrasal non-head are annotated in a flat manner with the label member. The head of the left constituent is the phrasal head, and not strictly the right-most element. This helps us distinguish which type of PC we are dealing with.

For instance, for PCs with a verb phrase as the left constituent, the verb heading the verb phrase is selected as the head of the left constituent (as seen in the example below, line 1). If the left constituent consists of a prepositional phrase, the preposition forms the head of the left constituent (e.g. "Zwischen-den-Mahlzeiten-Imbisse", Lawrenz 1996).
All other elements within the left constituents are attached to the phrasal head with the label member within the CoNLL-U format.

In the ptb-format, they are simply placed within the same bracket as the phrasal head (see section "member").

lemma = "komm-wie-du-bist-Hochzeit"

:small_blue_diamond: (N:cphras (V kommen) (PRON wie) (PRON du) (V sein) (N:cdet (A hoch) (N Zeit)))

graph TD
    A{{N:cphras}} --- B(V <br> kommen)
    A --- C(PRON <br> wie)
    A --- D(PRON <br> du)
    A --- E(V <br> sein)
    A --- F{{N:cdet}}
    F --- G(A <br> hoch)
    F --- H(N <br> Zeit)

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	komm	kommen	V	_	_	6	cphras:N	_	_
2	wie	wie	PRON	_	_	1	member	_	_
3	du	du	PRON	_	_	1	member	_	_
4	bist	sein	V	_	_	1	member	_	_
5	hoch	hoch	A	_	_	6	cdet:N	_	_
6	zeit	Zeit	N	_	_	_	_	_	_

member

In the CoNLL-U format, the label member is used to connect constituents that perform the same function within a word-formation process, e.g. circumfixes in derivation. It has two use cases:

derivation
- Here, the label follows the right-hand head principle. In other words, the right-most member of the circumfixation is the head, and the members to the left of it are attached to it with the relation label member
phrasal compounds (see section "cphras" for reference)
- Here, the label is used for each constituent that is dependent on the phrasal head (the head of the left constituent of the PC)

In the ptb-format, all member-elements are simply placed within the same bracket as their head.

lemma = "Gebremse"

:small_blue_diamond: (N:der (CIRCFX Ge) (V bremsen) (CIRCFX e))

graph TD;
  A{{N:der}}---B(CIRCFX <br> Ge);
  A---C(V <br> bremsen);
  A---D(CIRCFX <br> e);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	Ge	Ge	CIRCFX	_	_	3	member	_	_
2	brems	bremsen	V	_	_	3	der:N	_	_
3	e	e	CIRCFX	_	_	_	_	_	_

mov

The label mov refers to derivational processes differentiating the grammatical gender (cf. Movierung).

lemma = "Fahrerin"

:small_blue_diamond: (N:mov (N:der (V fahren) (NSFX er)) (NSFX in))

graph TD;
  A{{N:mov}}---B{{N:der}};
  B---C(V <br> fahren);
  B---D(NSFX <br> er);
  A---E(NSFX <br> in);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	fahr	fahren	V	_	_	2	der:N	_	_
2	er	er	NSFX	_	_	3	mov:N	_	_
3	in	in	NSFX	_	_	_	_	_	_

trans

The label trans is used when there is a morphosyntactic reclassification without a semantic reclassification. A further criterion is that the morpheme form is not changed, i.e. trans is not used for conversions into the stem form (see section "conv") or for affixations or stem vowel changes (see section "der").

In contrast to the Falko guidelines, trans is also used if the noun resembles an infinitive but, unlike actual transpositions, has masculine grammatical gender.

Note: this can be checked systematically by searching for all masculine nouns that are part of the case annotations of P5/P11
- search query: canon:Gender="Masc" _o_ pos_lang=/N./ _o_ lemma

trans:A

lemma = "feind"

:small_blue_diamond: (A:trans (N Feind))

graph TD;
  A{{A:trans}}---B(N <br> Feind);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	feind	Feind	N	_	_	_	_	_	preop=trans:A

trans:N

lemma = "Überqueren"

:small_blue_diamond: (N:trans (V:der (VPFX über) (V queren)))

graph TD;
  A{{N:trans}}---B{{V:der}};
  B---C(VPFX <br> über);
  B---D(V <br> queren);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	über	über	VPFX	_	_	2	der:V	_	_
2	queren	queren	V	_	_	_	_	_	postop=trans:N

trans:V

(to do or to delete)

between inflection and derivation

We treat forms where it is unclear if we are dealing with an inflectional or derivational process as processes of word-formation separate from derivation.

adjectives: comparative & superlative

Here, we use the labels comp for the comparative and sup for the superlative form.

lemma = "Liebste"

:small_blue_diamond: (N:trans (A:sup (A lieb)))

graph TD;
  A{{N:trans}}---B{{A:sup}};
  B---C(A <br> lieb);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	liebste	lieb	A	_	_	_	_	_	preop=sup:A,trans:N

participle: present & past

Here, we use the labels PPres for the present participle and PPast for the past participle.

lemma = "Fahrende"

:small_blue_diamond: (N:trans (A:trans (V:PPres (V fahren))))

graph TD;
  A{{N:trans}}---B{{A:trans}};
  B---C{{V:PPres}};
  C---D(V <br> fahren);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	fahrende	fahren	V	_	_	_	_	_	preop=PPres:V,trans:A,trans:N

lemma = "Unfallverursachende"

:small_blue_diamond: (N:trans (A:cdet (N Unfall) (A:trans (V:PPres (V verursachen)))))

graph TD;
  A{{N:trans}}---B{{A:cdet}};
  B---E(N <br> Unfall);
  B---C{{A:trans}};
  C---D{{V:PPres}};
  D---F(V <br> verursachen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	unfall	Unfall	N	_	_	2	cdet:A	_	_
2	verursachende	verursachen	V	_	_	_	_	_	preop=PPres:V,trans:A|postop=trans:N

lemma = "Reingefahrene"

:small_blue_diamond: (N:trans (A:trans (V:PPast (V:cdet (VPART rein) (V fahren)))))

graph TD;
  A{{N:trans}}---B{{A:trans}};
  B---C{{V:PPast}};
  C---D{{V:cdet}};
  D---E(VPART <br> rein);
  D---F(V <br> fahren);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	rein	rein	VPART	_	_	2	cdet:V	_	_
2	gefahrene	fahren	V	_	_	_	_	_	postop=PPast:V,trans:A,trans:N

Special Cases

Here are some cases where we have to pay attention when annotating.

diminutives (work in progress, to be discussed)

Diminutives are a type of derivation, see grammis: Systematische Grammatik/.../Die Transposition:

"The counterpart to transposition is determination: derivatives such as Kindchen, gelblich, hypergemein involve determination. For example, the suffix -chen specifies the base Kind semantically more closely as 'a child, specifically an especially small, lovable, cute one'." (translated from the German original)

The stem vowel may change in this derivational process. This is indicated in columns 2 and 3.

lemma = "Pünktchen"

:small_blue_diamond: (N:der (N Punkt) (SFX chen))

graph TD;
  A{{N:der}}---B(N <br> Punkt);
  A---C(SFX <br> chen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	pünkt	Punkt	N	_	_	2	der:N	_	_
2	chen	chen	SFX	_	_	_	_	_	_

non-native WF (work in progress, to be discussed)

To the best of our ability, we annotate this with the label set of native WF and by using the German orthography instead of the non-native one. See, for instance, the deverbal noun "Reaktion": We use the verb lemma "agieren" instead of "agere (āctum)".

lemma = "Reaktion"

:small_blue_diamond: (N:der (V:der (VPFX re) (V agieren)) (NSFX ion))

graph TD;
  A{{N:der}}---B{{V:der}};
  B---C(VPFX <br> re);
  B---D(V <br> agieren);
  A---E(NSFX <br> ion);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	re	re	VPFX	_	_	2	der:V	_	_
2	akt	agieren	V	_	_	3	der:N	_	_
3	ion	ion	NSFX	_	_	_	_	_	_

prefixed deverbal noun, yet not a prefix verb

lemma = "Rücksicht"

:small_blue_diamond: (N:cdet (PFX rück) (N:der (V sehen)))

graph TD;
  A{{N:cdet}}---B(PFX <br> rück);
  A---C{{N:der}};
  C---D(V <br> sehen);

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5:empty	6: empty	7: head ID	8: type of WF process	9: empty	10: pre- & post-operations
1	rück	zurück	PFX	_	_	2	cdet:N	_	_
2	sicht	sehen	V	_	_	_	_	_	preop=der:N

Appendix H: Helpful Links

Digitales Wörterbuch der deutschen Sprache
- helpful for looking up the morphological composition of words
  - contains a hyperlink to each morphological component, reports the semantic(s) of the word, etc.
- tied to the etymological dictionary of German (see link below)
Etymologisches Wörterbuch des Deutschen
- helpful for conducting etymological research of word forms
  - e.g. concerning morphological composition, function, origin

Appendix T: Technical details

Annotation Layers

In CoNLL-U

The following describes the function of the 10 columns in the CoNLL-U format specifically for our annotation format.

1 – ID
2 – allomorph
- this refers to the form of the morpheme
- automatically derived and needs correction
3 – morpheme
- this refers to the lemma of the morpheme
- automatically derived from SMOR, usually no correction necessary
4 – category of the morphemes
5 – category of the formation of all joint dependents (empty)
- this is only for the automatic part of the annotation; we can disregard this column in the manual annotation process
6 – features (empty)
7 – head
- ID of the formation head, i.e. the ID of the morpheme on which the morpheme of this row depends
- both the head and the dependent are part of the same word-formation process
8 – the type of word-formation process
- derivation
- determinative compounds (note: cdet also includes the particle verb formation process!)
- other compounds
  - ccop & cphras
- member
  - for circumfixation
    - head: suffix
    - dependent: prefix
- deep feature @flat
  - if there is no clear head-dependent structure, this label needs to be added to the word-formation process label
- the other WF processes (conv, trans, der) are annotated in column 10 since they involve transformations of the morpheme itself and not its composition as a dependent with other morphemes
9 – (empty)
10 – pre- & post-operations
- definitely needs to be corrected!
- preop
  - an operation on the morpheme itself before it joins a WF process with other morpheme(s)
- postop
  - operations on the morpheme itself after it joined a WF process with other morpheme(s)
- both pre- and post-operations include the following labels:
  - trans, conv, der (the examples above additionally use sup, PPres and PPast)
- multiple operations can be labeled for one morpheme; they are separated with commas
  - the order is relevant!
- if there exist both pre- and post-operations for the same morpheme, they are separated with the pipe-symbol “|”
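The structure of column 10 described in the list above can be read with a few lines of code. This is a minimal sketch under the stated conventions (comma-separated operations in relevant order, pipe between pre- and post-operations, `_` for an empty field); the function name `parse_operations` is hypothetical.

```python
from typing import Dict, List, Tuple


def parse_operations(value: str) -> Dict[str, List[Tuple[str, str]]]:
    """Parse column 10, e.g. 'preop=PPres:V,trans:A|postop=trans:N'.

    Returns a dict with the keys 'preop' and 'postop'. Each value is a list of
    (process_label, category) pairs in their original order; '_' yields empty lists.
    """
    result: Dict[str, List[Tuple[str, str]]] = {"preop": [], "postop": []}
    if value == "_":
        return result
    for part in value.split("|"):
        kind, _, ops = part.partition("=")
        if kind not in result:
            raise ValueError(f"unknown operation type: {kind!r}")
        for op in ops.split(","):
            label, _, category = op.partition(":")
            result[kind].append((label, category))
    return result


print(parse_operations("preop=PPres:V,trans:A|postop=trans:N"))
print(parse_operations("postop=conv:N"))
```

Lists are used rather than sets because the order of the operations is relevant.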

The CoNLL-U annotation format is illustrated in the table below:

lemma = "Augenzeugenbericht"

1: ID	2: allomorph form	3: morpheme lemma	4: morpheme category	5: (empty)	6: (empty)	7: head ID	8: type of WF process	9: (empty)	10: pre- & post-operations
1	auge	Auge	N	_	_	2	cdet:N	_	_
2	zeuge	Zeuge	N	_	_	3	cdet:N	_	_
3	bericht	berichten	V	_	_	_	_	_	preop=conv:N
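Rows like the ones above can be read and sanity-checked automatically. The sketch below parses tab-separated rows and checks the head column; the names `Row`, `parse_rows` and `check_heads` are hypothetical. The check that exactly one row has no head is an assumption derived from the examples in these guidelines, not a documented rule.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    id: int
    form: str             # column 2: allomorph form
    lemma: str            # column 3: morpheme lemma
    category: str         # column 4: morpheme category
    head: Optional[int]   # column 7: None if the cell is "_"
    process: str          # column 8: type of WF process
    operations: str       # column 10: pre- & post-operations


def parse_rows(block: str) -> List[Row]:
    """Parse tab-separated 10-column rows; columns 5, 6 and 9 are ignored."""
    rows = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        head = None if cols[6] == "_" else int(cols[6])
        rows.append(Row(int(cols[0]), cols[1], cols[2], cols[3], head, cols[7], cols[9]))
    return rows


def check_heads(rows: List[Row]) -> List[str]:
    """Report self-references, unknown head IDs and an unexpected number of roots."""
    ids = {row.id for row in rows}
    problems = []
    for row in rows:
        if row.head is None:
            continue
        if row.head == row.id:
            problems.append(f"row {row.id} points to itself")
        elif row.head not in ids:
            problems.append(f"row {row.id} has unknown head {row.head}")
    roots = [row.id for row in rows if row.head is None]
    if len(roots) != 1:  # assumption: one headless row per lemma
        problems.append(f"expected exactly one row without a head, found {len(roots)}")
    return problems


example = (
    "1\tauge\tAuge\tN\t_\t_\t2\tcdet:N\t_\t_\n"
    "2\tzeuge\tZeuge\tN\t_\t_\t3\tcdet:N\t_\t_\n"
    "3\tbericht\tberichten\tV\t_\t_\t_\t_\t_\tpreop=conv:N\n"
)
rows = parse_rows(example)
print(rows[2].lemma, rows[2].operations)
print(check_heads(rows))
```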

In PTB

In ptb, the annotations are fairly straightforward. They follow the bracket system of syntactic derivation trees. As such, they have root, branch, and terminal nodes. The morpheme lemmas within the word are portrayed within the terminal nodes, whereas the word-formation processes (incl. pre-/post-operations) constitute the root and branch nodes.

This is illustrated in the diagram below, which corresponds to the following bracketed expression:

lemma = "Tathergang"

:small_blue_diamond: (N:cdet (N Tat) (N:der (V:cdet (VPART her) (V gehen))))

graph TD;
  A{{N:cdet}}---B(N <br> Tat);
  A---C{{N:der}};
  C---D{{V:cdet}};
  D---E(VPART <br> her);
  D---F(V <br> gehen);
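The bracket structure can be turned into a tree with a short recursive parser, which also catches unbalanced brackets. This is a minimal sketch; the function names `parse_ptb` and `terminals` are hypothetical, and the nested-tuple representation is just one possible choice.

```python
from typing import List, Tuple


def parse_ptb(text: str):
    """Parse '(LABEL child ...)' into nested (label, children) tuples.

    A terminal node such as '(N Tat)' becomes ('N', ['Tat']).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a label after '('")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: missing ')'")
        pos += 1  # consume ')'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root node")
    return tree


def terminals(node) -> List[Tuple[str, str]]:
    """Collect (category, lemma) pairs of all terminal nodes, left to right."""
    label, children = node
    if all(isinstance(child, str) for child in children):
        return [(label, " ".join(children))]
    return [pair for child in children if not isinstance(child, str)
            for pair in terminals(child)]


tree = parse_ptb("(N:cdet (N Tat) (N:der (V:cdet (VPART her) (V gehen))))")
print(tree[0])          # N:cdet
print(terminals(tree))  # [('N', 'Tat'), ('VPART', 'her'), ('V', 'gehen')]
```

Root and branch labels contain a colon (category and process), whereas terminal labels are bare categories, so simplex nouns can be recognised by a root label without a colon.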

Bibliography

[LukassekAkbariLuedeling2023] - Anke Lüdeling, Julia Lukassek, Roodabeh Akbari - Richtlinie zur morphologischen Annotation von Nomina in Falko. - to appear. -
