German Normalisation
(in German)
Basics
The general RUEG-Korpus guidelines for normalisation apply: Step 2: Normalisation
The following repeats a few of those principles and adds language-specific decisions.
- orthographic normalisation
- no normalisation in the syntax
- no grammatical normalisation
- discontinuities and pauses are erased and receive an empty event (on the language level, the value for discontinuities and pauses is also erased -> empty event); EXCEPT word-internal discontinuities (dipl: vorbeige$le$ rollt, norm: vorbeigerollt)
- repetitions remain
- spoken: non-verbal material, such as [laughing], is not transferred -> empty event
- punctuation marks are not included
normalisation of pronunciation phenomena
- this also applies to the written files (change 7.10.2019)
reductions of determiners, adjectives and nouns are not normalised with respect to case and gender information*:
| dipl | norm |
|---|---|
| schön guten tag | schön guten Tag |
| mit ein hund | mit ein Hund |
| so ein klein hund | so ein klein Hund |
reductions and slips of the tongue in verbs, nouns etc. that are not related to case and gender marking are normalised:
| dipl | norm |
|---|---|
| is | ist |
| Umfall | Unfall |
| gesprung | gesprungen |
short forms of indefinite articles are normalised as:
| dipl | norm |
|---|---|
| n | ein, einen |
| nen | ein, einen |
| ne | eine |
| eim, nem | einem |
- normalise 'nen' as 'ein' in cases of nominative masculine and nominative/accusative neuter, e.g.:
  - dipl: "da is nen auto um die ecke gebogen"; norm: "da ist ein auto..."
  - dipl: "ich hab nen auto gesehen"; norm: "ich habe ein auto..."
  - BUT not in, e.g., dipl: "nen ne vollbremsung"; norm: "einen eine vollbremsung"
- so=n is normalised either as "so ein" or as "so einen", depending on the context:
  - dipl: "so=n typ hat mitm ball..."; norm: "so ein Typ hat mitm Ball"
  - dipl: "die frau hat so=n hund dabei"; norm: "die Frau hat so einen Hund dabei"
According to the Duden and scholarly literature (e.g. Vogel 2006, Schäfer & Sayatz 2014), 'n' and 'nen' can each represent both 'ein' and 'einen'. Normalisation follows the principle of minimal deviation from the expected standard form.
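The article rules above can be sketched as a small lookup. This is only an illustration, not part of the corpus tooling: the function name, the case/gender labels (`nom`, `acc`, `masc`, `neut`) and the fallback to "einen" for every context not listed are assumptions made for the example. The fallback matches the "nen ne vollbremsung" case but is not stated as a general rule in the guidelines.

```python
# Short forms with a single normalisation, taken from the table above.
FIXED = {"ne": "eine", "eim": "einem", "nem": "einem"}

# Contexts in which the expected standard form is 'ein':
# nominative masculine and nominative/accusative neuter.
EIN_CONTEXTS = {("nom", "masc"), ("nom", "neut"), ("acc", "neut")}


def normalise_indefinite_article(dipl, case=None, gender=None):
    """Return the norm form for a short indefinite article (hypothetical helper)."""
    form = dipl.lower()
    if form in FIXED:
        return FIXED[form]
    if form in ("n", "nen"):
        # Assumption: any context outside EIN_CONTEXTS falls back to 'einen'.
        return "ein" if (case, gender) in EIN_CONTEXTS else "einen"
    # Not a short form covered by the table: leave unchanged.
    return dipl


print(normalise_indefinite_article("nen", case="acc", gender="neut"))  # ein
print(normalise_indefinite_article("n", case="acc", gender="masc"))    # einen
```

In practice the case and gender of the context are an annotator decision; the sketch only shows how that decision maps to the norm form.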
hesitation markers / filled pauses
- all hesitation markers are normalised as „äh“; these include äh, ähm, öh, hm etc.
| dipl | norm |
|---|---|
| äh, öh, ähm, hm etc. | äh |
no lexical changes
- when meaning is clearly constant, determine and document a standard, such as:
| dipl | norm |
|---|---|
| aufgrund, auf Grund | aufgrund |
| andren, anderen | anderen |
| bro, brother | brother (lang=eng) |
| Dicker, Digger | Dicker (29.05.2019) |
| etwas, was | etwas |
| grad, grade, gerade | gerade |
| gern, gerne | gerne |
| habe, hab | habe |
| hey, hi, hei (as greeting, not as outcry) | hi |
| langlaufen | entlanglaufen |
| mache, mach (imperative) | mach |
| nichts, nix | nichts |
| noch mal, nochmal | nochmal (28.05.2019) |
| rumspielen | herumspielen |
| rum | herum |
| runterfallen | herunterfallen |
| sodass, so dass (when conjunction) | sodass |
| vorn, vorne | vorne |
- when a change of meaning is possible or when the context is restricted, leave lexemes as they are; the variants remain:
  - daran, dran
  - darin, drin, drinnen
  - drauf, darauf
  - sone (as in "sone autos", so only for plural nouns), solche
  - reinfahren, hereinfahren, hineinfahren
  - auffahren, rauffahren, drauffahren
  - reinpacken, einpacken, hineinpacken
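The two lexical rules above (a documented standard where the meaning is constant, no change where the meaning may differ) could be applied as in the following sketch. The names `STANDARD`, `KEEP_AS_IS` and `normalise_lexeme` are hypothetical. Only single-token entries that need no context are included; entries such as "was" -> "etwas", "hey" -> "hi" (greeting only) or "mache" -> "mach" (imperative only) depend on the reading and are left to the annotator.

```python
# Variant -> documented standard (subset of the table above, context-free entries only).
STANDARD = {
    "andren": "anderen",
    "digger": "Dicker",
    "grad": "gerade",
    "grade": "gerade",
    "gern": "gerne",
    "hab": "habe",
    "langlaufen": "entlanglaufen",
    "nix": "nichts",
    "rum": "herum",
    "rumspielen": "herumspielen",
    "runterfallen": "herunterfallen",
    "vorn": "vorne",
}

# Variants that must not be unified because the meaning may differ (subset).
KEEP_AS_IS = {"daran", "dran", "darin", "drin", "drinnen", "drauf", "darauf", "sone"}


def normalise_lexeme(token):
    """Map a token to its documented standard, or return it unchanged."""
    key = token.lower()
    if key in KEEP_AS_IS:
        return token
    return STANDARD.get(key, token)


print([normalise_lexeme(t) for t in ["nix", "dran", "grad", "rum"]])
# ['nichts', 'dran', 'gerade', 'herum']
```

Keeping the protected variants in an explicit set makes it harder to add them to the mapping by accident later.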
Foreign Language Material (FM) and translingual elements
- FM with German inflection, e.g.:
| dipl | norm | lang |
|---|---|---|
| gecrasht | gecrasht | eng/deu |
- material that is included in the Duden, such as sorry, Van etc., is marked as deu on the language level. The Online-Duden serves as the reference; the date of viewing must be documented (list FM).
numbers...
- up to twelve: spelled out
- from 13 onwards: numerals
- in the written texts, keep the variation the subject chose
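For spoken data, the threshold rule above could be expressed as follows. This is a minimal sketch: `render_number` is a hypothetical helper, the word list assumes the citation forms "eins" to "zwölf", and rendering values below 1 as digits is an assumption, since the guidelines do not mention them. Written texts are not covered, because there the subject's own choice is kept.

```python
# German number words for 1..12 (citation forms; assumption for this example).
NUMBER_WORDS = [
    "eins", "zwei", "drei", "vier", "fünf", "sechs",
    "sieben", "acht", "neun", "zehn", "elf", "zwölf",
]


def render_number(n):
    """Spell out 1..12, use numerals from 13 onwards."""
    if 1 <= n <= 12:
        return NUMBER_WORDS[n - 1]
    # Assumption: anything outside 1..12 (including 0) is written with digits.
    return str(n)


print(render_number(12))  # zwölf
print(render_number(13))  # 13
```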
individual choices
| dipl | norm |
|---|---|
| pekawe | PKW |
| ef sechzehn | F16 |
- gender gap
- dipl: Fußgänger innen; norm: Fußgänger_innen
Language Values
| value | language |
|---|---|
| deu | German |
| eng | English |
| ara | Arabic |
| tur | Turkish |
| spa | Spanish |
written texts
- include CU level
- the conjunction 'dass', if spelled 'das', is normalised to 'dass'
- punctuation marks:
  - do not add any, do not correct any, except when a space is missing:
    | dipl | norm |
    |---|---|
    | eingepackt.auf | eingepackt / . / auf |
  - several punctuation marks directly after one another without a space, e.g. three dots (…): leave in one event
    | dipl | norm |
    |---|---|
    | ... | ... |
  - if there is a space in between, then also leave it, e.g. . / . / .
    | dipl | norm |
    |---|---|
    | . . . | . / . / . |
- Emojis
  - include Emojis such as :) on norm
- abbreviations/acronyms
  - conventionalised abbreviations are left on norm
  - unconventionalised abbreviations or acronyms are spelled out, e.g. dipl: kp; norm: kein | Plan
  - acronyms that are also "action words" (e.g., lol) are left this way on norm
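The spacing rule for punctuation marks in written texts, described earlier in this list, can be illustrated with a small tokeniser. It is a simplified sketch and not the project's actual segmentation: `split_events` is a hypothetical name, and treating every run of non-word, non-space characters as one event is an assumption that would wrongly split forms such as apostrophised words.

```python
import re

# One event per run of word characters or per run of adjacent punctuation marks.
EVENT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def split_events(text):
    """Split written text into events, separating punctuation that lacks a space."""
    return EVENT_PATTERN.findall(text)


print(split_events("eingepackt.auf"))  # ['eingepackt', '.', 'auf']
print(split_events("..."))             # ['...']
print(split_events(". . ."))           # ['.', '.', '.']
```

Adjacent punctuation marks stay together in one event, while marks separated by spaces each become their own event, which mirrors the three table examples. An emoji such as `:)` also stays in one event under this pattern.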