Annotation Step 1: Transcription
Anonymisation
Anonymisation on Speaker tier
- Replace name of participant with the respective speaker code, e.g. USbi02FR
- If whole names or surnames of friends are mentioned, replace with the participant code + _P, e.g. USbi02FR_P
COMMENT: As far as we know, every part of a speaker's name is anonymised, including a first name used on its own.
- Places that could lead to the identification of a participant, such as street names, school names, etc.:
If the participant names, for example, "Friedrichstraße", you transcribe "{streetname}Straßezzz". You leave out the actual name of the street; in this example you replace "Friedrich". "zzz" stands for any inflectional suffixes you hear in the data and is deleted if there are none. In this example you would therefore transcribe "{streetname}Straße", because there are no inflectional suffixes. Likewise, if a school is named, for example "Apple Highschools" (with a final "s" for the English plural), you transcribe "{schoolname}schools".
There should be no space after the {...}. The placeholder inside the {...} is in English, regardless of the language you elicit and transcribe in. Over time, a list of these placeholders should develop, so that every identifier can be replaced by a placeholder in {...} (e.g. "{streetname}Allee"). This list is the same for every project and language.
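The placeholder rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `anonymise_place` and its parameters are hypothetical and not part of any project tooling.

```python
def anonymise_place(placeholder: str, generic_part: str, suffix: str = "") -> str:
    """Build an anonymised place token such as '{streetname}Straße'.

    placeholder:  English placeholder that goes between the braces
    generic_part: the part of the name that is kept (e.g. 'Straße')
    suffix:       inflectional suffix heard in the data, if any (the 'zzz' slot)
    """
    if " " in placeholder:
        raise ValueError("the placeholder must not contain spaces")
    # No space between the closing brace and the rest of the token.
    return "{" + placeholder + "}" + generic_part + suffix


print(anonymise_place("streetname", "Straße"))       # {streetname}Straße
print(anonymise_place("schoolname", "school", "s"))  # {schoolname}schools
```

Keeping the suffix as a separate argument mirrors the "zzz" slot: it is filled only when an inflectional suffix is actually heard.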
Segmentation
- Communication Unit (CU) is used as a segmentation unit
- No punctuation marks at all on the transcription layer
- No accents, no intonation patterns are marked
- In Exmaralda: blank space at the end of each event (* no punctuation marks on norm layer)
Spelling
- No capital letters
- Abbreviations/acronyms are transcribed as full words in the phonology of the language heard in the recording (e.g. German BMW = beemwe, English BMW = beemdoubleyou)
- speaker codes need to be partly capitalized to follow their correct pattern
Transcription
Adapted from KiDKo2014
'Unwanted' material
- 'unwanted' material consists of participants' questions about the procedure and any responses from the elicitor
- first, check whether you can discard the data containing 'unwanted' material and repeat the elicitation
- If this is not possible, mark those passages as:
<Q> communication with elicitor </Q>
- they get an extra-event
Merged forms
- Merged forms are transcribed as they are articulated, but with an equal sign linking the merged elements
- Examples: so=ne (= so eine)
Reduced syllables
- reduced syllables are transcribed as articulated
- Examples: goin (= going), bi tane (= bir tane), hab ich ein Tadel bekommen (= einen Tadel)
Elisions, repetitions and interruptions
- Do not leave anything out and do not add anything which is not there!
- Use / to mark unfinished words, e.g. “The bl/ blue car crashes um stops”
- word-internal cancellations/corrections are transcribed as follows: dipl: "be$ha$ come" (norm: "become")
- Onomatopoeias/echoisms are transcribed as separate tokens (e.g. gutschi gutschi gutschi) and as one single token only if they are very short (e.g. eieiei)
Pauses
- always measured to the first decimal
- 0.2 - 1 sec: (-)
- 1 - 3 secs: (--)
- More than 3 secs: the measured duration in parentheses, e.g. (5.5)
- Word-internal pauses are marked as follows: be(-)have
- keep in mind that some speakers talk very slowly (it makes no sense to mark a pause after every word/token)
- pauses inside a CU do not get an extra-event on the CU tier
- pauses which occur between two CUs get an extra-event on the CU tier
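As a rough illustration of the thresholds above, the mapping from a measured duration to a pause symbol could look like the following sketch. The function name `pause_symbol` is hypothetical. Two points are assumptions rather than rules stated here: pauses under 0.2 secs get no mark, and a pause of exactly 3 secs still counts as (--).

```python
def pause_symbol(seconds: float) -> str:
    """Return the pause notation for a measured duration in seconds."""
    duration = round(seconds, 1)  # pauses are measured to the first decimal
    if duration < 0.2:
        return ""      # assumption: pauses below 0.2 secs are not marked
    if duration <= 1.0:
        return "(-)"   # 0.2 - 1 sec
    if duration <= 3.0:
        return "(--)"  # 1 - 3 secs (assumption: exactly 3 secs is included)
    return f"({duration:.1f})"  # more than 3 secs: the measured value


print(pause_symbol(0.5))  # (-)
print(pause_symbol(2.0))  # (--)
print(pause_symbol(5.5))  # (5.5)
```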
Long vowels & consonants
- vowels realized longer than normal (0.2 - 2 secs) are marked with : (e.g. so lo:ng)
- vowels realized longer than 2 secs are marked with :: (e.g. so lo::ng)
- also possible for consonants (e.g. mum:)
- doubled vowel syllables are marked with % (e.g. by%ye, tschü%üß)
Non-verbal material
- non-verbal events such as a participant laughing or coughing are noted in square brackets on the transcription tier, e.g. [laughing]
- if participants speak and laugh at the same time, you note it as: [[laughing]speech]
List of meta comments used in the RUEG project
- [coughing]
- [gulping]
- [laughing]
- [pfing] for a sound like "pfff"
- [sighing]
- [throatclearing]
- [tongueclicking], including tsking as disapproval, while thinking and just mouth opening with a click
- [whispering]
- [stuttering]
- [imitating], for when they imitate a sound related to the story (e.g., car crash)
- [sniffing]
Uninterpretable material
- uninterpretable material is to be marked as (UNK) on Speaker-tier
- longer than 2 secs: (UNK, 2.1)
- assumed content is placed in parentheses, each token separately: (assumed) (content)
- if the uninterpretable material can be identified as belonging to a CU, there is no separate event on the CU tier for it
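The two notations for uninterpretable and assumed material can be sketched as small helper functions. The names `unk_marker` and `assumed_content` are hypothetical and serve only to illustrate the rules above.

```python
def unk_marker(seconds: float) -> str:
    """Return the marker for an uninterpretable stretch of the given length."""
    duration = round(seconds, 1)
    if duration > 2.0:
        # Longer than 2 secs: the duration is added to the marker.
        return f"(UNK, {duration:.1f})"
    return "(UNK)"


def assumed_content(words: str) -> str:
    """Wrap every assumed token in its own pair of parentheses."""
    return " ".join(f"({token})" for token in words.split())


print(unk_marker(2.1))                     # (UNK, 2.1)
print(assumed_content("assumed content"))  # (assumed) (content)
```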
Hesitation markers / Interjections / Reception markers
- For every language, we define a set of hesitation markers/interjections/reception markers
- create a list with those markers
- If heritage speakers use particles from their ‘other’ language, we transcribe them as they sound, consistent with the procedure for foreign language material
Foreign language material
- Choose a spelling for each item following one of these options:
- transcribe phonographically (e.g. engl. like = germ. leik) OR
- use orthographic spelling of the "other" language
- Create a list in which you document the spelling of each item in alphabetical order
- for each item, record in the list the name of the file in which it occurs and the time of its occurrence
- each time you encounter foreign language material in your data, check the list to guarantee a consistent form for those items
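One possible way to keep such a list consistent is sketched below. All class and field names, as well as the file names and times in the usage lines, are hypothetical. The sketch also assumes that only the first occurrence of an item is documented.

```python
from dataclasses import dataclass


@dataclass
class SpellingEntry:
    source_form: str  # the item as spelled in the "other" language
    spelling: str     # the spelling chosen for the transcription
    file_name: str    # file in which the item was encountered
    time: str         # time of the occurrence in that file


class SpellingList:
    """Keeps one agreed spelling per foreign-language item."""

    def __init__(self):
        self._entries = {}

    def lookup_or_add(self, entry):
        # An item that is already documented keeps its first spelling.
        return self._entries.setdefault(entry.source_form.lower(), entry)

    def alphabetical(self):
        # Entries sorted by the chosen spelling.
        return sorted(self._entries.values(), key=lambda e: e.spelling)


spellings = SpellingList()
# File names and times are made up for illustration.
spellings.lookup_or_add(SpellingEntry("like", "leik", "hypothetical_file_A", "00:01:23"))
later = spellings.lookup_or_add(SpellingEntry("like", "like", "hypothetical_file_B", "00:04:10"))
print(later.spelling)  # leik
```

Looking an item up before adding it is what enforces the "check the list" step: a second transcriber who proposes a different spelling gets the documented one back.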
Proper/Brand names from "foreign language"
- Keep conventionalized spelling (e.g. Renault = renault)
- document your decisions, create a list with those items
- Language-specific decisions:
- Russian: put it in the spelling and script of the actually spoken language to avoid loss of phonetic/morphological/syntactic information
- Turkish and Greek: use Latin alphabet and conventionalized spelling
Table of symbols
Symbols | Meaning |
---|---|
<Q> communication with elicitor </Q> | questions concerning the procedure and/or verbal interventions by the elicitor |
(-) | pauses of 0.2 - 1 sec |
(--) | pauses of 1 - 3 secs |
(3.2) | pauses longer than 3 secs (measured duration) |
(UNK) | uninterpretable material |
(UNK, 2.2) | uninterpretable material longer than 2secs |
(assumption) | assumed material |
[...] | non-verbal material |
[[...]...] | non-verbal & verbal event |
: | unusually long vowel or consonant (under 2 secs) |
:: | unusually long vowel or consonant (longer than 2 secs) |
= | merged forms |
/ | interruption of a word |
$...$ | word internal cancellations |
% | doubled syllables |
{...} | specification of an anonymised place |
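
The symbols in the table can be told apart mechanically. The following sketch classifies a single token with regular expressions. The pattern list, the labels and the function name `classify` are hypothetical, the patterns are deliberately simple, and the <Q> ... </Q> spans are not covered because they extend over several tokens.

```python
import re

# (label, pattern) pairs, checked in order; more specific patterns come first.
SYMBOL_PATTERNS = [
    ("short pause", re.compile(r"\(-\)")),
    ("medium pause", re.compile(r"\(--\)")),
    ("measured pause", re.compile(r"\(\d+\.\d\)")),
    ("uninterpretable", re.compile(r"\(UNK(, \d+\.\d)?\)")),
    ("assumed material", re.compile(r"\([^()\s]+\)")),
    ("non-verbal & verbal", re.compile(r"\[\[[a-z]+\].+\]")),
    ("non-verbal", re.compile(r"\[[a-z]+\]")),
    ("anonymised place", re.compile(r"\{[a-z]+\}\S*")),
    ("cancellation", re.compile(r"\S*\$[^$\s]+\$\S*")),
    ("interrupted word", re.compile(r"\S+/")),
    ("merged form", re.compile(r"[^\s=]+=[^\s=]+")),
    ("doubled syllable", re.compile(r"[^\s%]+%[^\s%]+")),
    ("lengthened sound", re.compile(r"[^\s:]+::?[^\s:]*")),
]


def classify(token: str) -> str:
    """Return the label of the first pattern that matches the whole token."""
    for label, pattern in SYMBOL_PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "plain token"


for example in ["(-)", "(UNK, 2.2)", "[laughing]", "so=ne", "bl/", "lo::ng", "car"]:
    print(example, "->", classify(example))
```

The order of the patterns matters: "(-)" would also satisfy the looser pattern for assumed material, so the pause patterns are tested first.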