German Normalisation
(in German)
Basics
The general RUEG corpus guidelines for normalisation apply (see Step 2: Normalisation).
The following repeats a few of those principles and adds language-specific decisions.
- orthographic normalisation
- no normalisation in the syntax
- no grammatical normalisation
- discontinuities and pauses are deleted and leave an empty event (on the language level, their value is likewise deleted -> empty event), EXCEPT word-internal discontinuities (dipl: vorbeige$le$ rollt, norm: vorbeigerollt)
- repetitions remain
- spoken: non-verbal material, such as [laughing], is not transferred -> empty event
- punctuation marks are not included
normalisation of pronunciation phenomena
- this also applies to the written files (change 7.10.2019)
reductions of determiners, adjectives and nouns are not normalised with respect to case and gender information*:
dipl | norm |
---|---|
schön guten tag | schön guten Tag |
mit ein hund | mit ein Hund |
so ein klein hund | so ein klein Hund |
reductions and slips of the tongue in verbs, nouns etc. that are not related to case and gender marking are normalised:
dipl | norm |
---|---|
is | ist |
Umfall | Unfall |
gesprung | gesprungen |
short forms of indefinite articles are normalised as:
dipl | norm |
---|---|
n | ein, einen |
nen | ein, einen |
ne | eine |
eim, nem | einem |
- normalise 'nen' as 'ein' in cases of nominative masculine and nominative/accusative neuter, e.g.:
- dipl: "da is nen auto um die ecke gebogen". norm: "da ist ein auto..."
- dipl: "ich hab nen auto gesehen". norm: "ich habe ein auto..."
- BUT not in dipl: "nen ne vollbremsung", which is normalised as norm: "einen eine vollbremsung"
- so=n is normalised either as "so ein" or as "so einen", depending on the context:
- dipl: "so=n typ hat mitm ball...". norm: "so ein Typ hat mitm Ball"
- dipl: "die frau hat so=n hund dabei". norm: "die Frau hat so einen Hund dabei"
According to the Duden and scholarly literature (e.g. Vogel 2006, Schäfer & Sayatz 2014), 'n' and 'nen' can each represent both 'ein' and 'einen'. Normalise according to the principle of minimal deviation from the form expected in the standard.
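
The short-form table and the minimal-deviation principle can be illustrated with a small lookup. This is only a sketch: the function name `normalise_article` and the `standard_form` parameter are hypothetical, and the decision between 'ein' and 'einen' is assumed to come from the annotator, not from the code.

```python
from typing import Optional

# Short forms with exactly one normalisation in the table above.
UNAMBIGUOUS = {"ne": "eine", "eim": "einem", "nem": "einem"}
# Short forms that may stand for either 'ein' or 'einen'.
AMBIGUOUS = {"n", "nen"}


def normalise_article(dipl: str, standard_form: Optional[str] = None) -> str:
    """Return the norm form of a short indefinite article.

    standard_form is the annotator's judgement of the form expected in
    the standard in this context ('ein' or 'einen'). It is only consulted
    for the ambiguous short forms (hypothetical parameter).
    """
    if dipl in UNAMBIGUOUS:
        return UNAMBIGUOUS[dipl]
    if dipl in AMBIGUOUS:
        if standard_form not in ("ein", "einen"):
            raise ValueError(f"{dipl!r} needs context: 'ein' or 'einen'?")
        return standard_form
    return dipl  # not a listed short form: leave unchanged


print(normalise_article("nen", "ein"))   # as in "ich hab nen auto gesehen"
print(normalise_article("n", "einen"))   # as in "die frau hat so=n hund dabei"
print(normalise_article("nem"))
```

Raising an error for 'n' and 'nen' without context keeps the ambiguity visible instead of silently picking one reading.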
hesitation markers / filled pauses
- all hesitation markers are normalised as „äh“; these include äh, ähm, öh, hm etc.
dipl | norm |
---|---|
äh, öh, ähm, hm etc. | äh |
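
As a rough illustration, the hesitation-marker rule amounts to mapping a set of variants onto one form. The set below contains only the four variants named in the guideline; the "etc." means the real inventory is larger, and the function name is made up for this sketch.

```python
# Only the variants listed in the guideline; extend as further forms are documented.
HESITATION_MARKERS = {"äh", "ähm", "öh", "hm"}


def normalise_hesitation(token: str) -> str:
    """Map any listed hesitation marker to 'äh'; leave other tokens unchanged."""
    return "äh" if token.lower() in HESITATION_MARKERS else token


tokens = "und ähm dann hm kam der ball".split()
print([normalise_hesitation(t) for t in tokens])
```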
no lexical changes
- when meaning is clearly constant, determine and document a standard, such as:
dipl | norm |
---|---|
aufgrund, auf Grund | aufgrund |
andren, anderen | anderen |
bro, brother | brother (lang=eng) |
Dicker, Digger | Dicker (29.05.2019) |
etwas, was | etwas |
grad, grade, gerade | gerade |
gern, gerne | gerne |
habe, hab | habe |
hey, hi, hei (as greeting, not as outcry) | hi |
langlaufen | entlanglaufen |
mache, mach (imperative) | mach |
nichts, nix | nichts |
noch mal, nochmal | nochmal (28.05.2019) |
rumspielen | herumspielen |
rum | herum |
runterfallen | herunterfallen |
sodass, so dass (when conjunction) | sodass |
vorn, vorne | vorne |
- when change of meaning is possible or when context is restricted, leave lexemes as they are, the variations remain:
- daran, dran
- darin, drin, drinnen
- drauf, darauf
- sone (as in "sone autos", i.e. only with plural nouns), solche
- reinfahren, hereinfahren, hineinfahren
- auffahren, rauffahren, drauffahren
- reinpacken, einpacken, hineinpacken
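
The two lexical rules above (standardise where meaning is constant, keep variation where it is not) could be sketched as a lookup plus an explicit keep list. The names are hypothetical. The table includes only single-token entries without a context condition, so "noch mal", "hi" as greeting and "sodass" as conjunction are left out. The lookup only proposes a form; whether the meaning really is constant remains the annotator's call.

```python
# Single-token variants with a documented standard (subset of the table above).
STANDARD = {
    "andren": "anderen",
    "grad": "gerade",
    "grade": "gerade",
    "gern": "gerne",
    "hab": "habe",
    "nix": "nichts",
    "vorn": "vorne",
    "rum": "herum",
    "rumspielen": "herumspielen",
    "runterfallen": "herunterfallen",
}

# Variants that must stay as they are because a change of meaning is possible.
KEEP = {
    "daran", "dran", "darin", "drin", "drinnen", "drauf", "darauf",
    "sone", "solche",
    "reinfahren", "hereinfahren", "hineinfahren",
    "auffahren", "rauffahren", "drauffahren",
    "reinpacken", "einpacken", "hineinpacken",
}


def normalise_lexeme(token: str) -> str:
    """Return the documented standard for a token, or the token itself."""
    if token in KEEP:
        return token
    return STANDARD.get(token, token)


print(normalise_lexeme("nix"))   # standardised
print(normalise_lexeme("dran"))  # kept as it is
```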
Foreign Language Material (FM) and translingual elements
- FM with German inflection, e.g.:
dipl | norm | lang |
---|---|---|
gecrasht | gecrasht | eng/deu |
- material that is included in the Duden, such as sorry, Van etc., is marked as deu on the language level. The online Duden serves as the reference; the date of access must be documented (list FM).
numbers...
- up to twelve: spelled out
- from 13 onwards: numerals
- in the written texts, keep the variation the subject chose
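
For the spoken data, the number rule can be sketched as follows. The function name is hypothetical, and two points are assumptions that the guideline does not settle: the number one is rendered as "eins", and zero and negative numbers are treated as out of scope. Written texts are not covered, since there the subject's own variation is kept.

```python
# German number words for 1 to 12 (index 0 holds "eins").
NUMBER_WORDS = [
    "eins", "zwei", "drei", "vier", "fünf", "sechs",
    "sieben", "acht", "neun", "zehn", "elf", "zwölf",
]


def normalise_number(n: int) -> str:
    """Spell out 1 to 12; use numerals from 13 onwards."""
    if n < 1:
        raise ValueError("this sketch only covers positive whole numbers")
    if n <= 12:
        return NUMBER_WORDS[n - 1]
    return str(n)


print(normalise_number(12))
print(normalise_number(13))
```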
individual choices
dipl | norm |
---|---|
pekawe | PKW |
ef sechzehn | F16 |
- gender gap
- dipl: Fußgänger innen; norm: Fußgänger_innen
Language Values
value | language |
---|---|
deu | German |
eng | English |
ara | Arabic |
tur | Turkish |
spa | Spanish |
written texts
- include CU level
- the conjunction 'dass', if spelled 'das', is normalised to 'dass'
- punctuation marks:
- do not add any and do not correct any, except when a space is missing:
dipl | norm |
---|---|
eingepackt.auf | eingepackt / . / auf |
- several punctuation marks directly after one another without a space, e.g. three dots (...), stay in one event:
dipl | norm |
---|---|
... | / ... / |
- if there is a space in between, leave it as well, so that each mark gets its own event:
dipl | norm |
---|---|
. . . | / . / . / . / |
- Emojis
- include Emojis such as :) on norm
- abbreviations/acronyms
- conventionalised abbreviations are left on norm
- non-conventionalised abbreviations or acronyms are spelled out, e.g. dipl: kp; norm: kein | Plan
- acronyms that are also "action words" (e.g., lol) are left this way on norm
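
The punctuation rules for written texts (split only where a space is missing, keep adjacent marks in one event, keep spaced marks in separate events) can be sketched with one regular expression. The set of characters treated as punctuation is an assumption of this sketch. The colon is left out so that emoticons such as :) stay in one event, and abbreviations containing full stops would be over-split and need manual checking.

```python
import re
from typing import List

# Runs of sentence punctuation, or runs of anything else that is not whitespace.
# The punctuation inventory (. , ; ! ? …) is an assumption, not part of the guideline.
EVENT_PATTERN = re.compile(r"[.,;!?…]+|[^.,;!?…\s]+")


def split_events(dipl: str) -> List[str]:
    """Split a written dipl string into norm events without adding or fixing marks."""
    return EVENT_PATTERN.findall(dipl)


print(split_events("eingepackt.auf"))  # missing space: three events
print(split_events("..."))             # adjacent marks: one event
print(split_events(". . ."))           # spaced marks: three events
```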