Annotation Step 1: Transcription
Anonymisation
Anonymisation on Speaker tier
- Replace the name of the participant with the respective speaker code, e.g. USbi02FR
- If whole names or surnames of friends are mentioned, replace them with the participant code + _P, e.g. USbi02FR_P
- Replace places that could lead to the identification of a participant, such as street names, school names, etc., with a placeholder:
  - "Friedrichstraßezzz" is transcribed as "{streetname}Straße"; "zzz" stands for any inflectional suffix and is replaced by that suffix, or deleted if there is none
  - "Apple Highschools" (with plural suffix) is transcribed as "{schoolname}schools"
! Attention: There should be no spaces following the {...}.
! Over time, every project should develop a list of these placeholders.
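The no-space rule for placeholders can be checked automatically. The following is a minimal sketch, assuming placeholders are written as a single word in curly braces; the function name `placeholder_spacing_errors` is hypothetical and not part of any project tooling.

```python
import re

# Assumption: a placeholder is one word in curly braces, e.g. {streetname}.
# The pattern matches a placeholder that is directly followed by whitespace.
PLACEHOLDER_THEN_SPACE = re.compile(r"\{[^{}\s]+\}\s")


def placeholder_spacing_errors(line):
    """Return every placeholder in `line` that is followed by whitespace."""
    return [m.group(0).strip() for m in PLACEHOLDER_THEN_SPACE.finditer(line)]
```

For example, `placeholder_spacing_errors("in der {streetname} straße")` reports `{streetname}`, while `"in der {streetname}straße"` yields an empty list.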
Segmentation
- Communication Unit (CU) is used as a segmentation unit
- No punctuation marks at all on the transcription layer
- No accents, no intonation patterns are marked
- In EXMARaLDA: blank space at the end of each event (note: no punctuation marks on the norm layer)
Our decisions on CU segmentation are documented here: Decisions CU Segmentation
Spelling
- No capital letters
- Abbreviations/acronyms are transcribed as full words in the phonology of the language heard in the recording (e.g. German BMW = beemwe, English BMW = beemdoubleyou)
- Speaker codes must be partly capitalized to follow their correct pattern (e.g. USbi02FR)
Transcription
Adapted from KiDKo2014
'Unwanted' material
- 'Unwanted' material consists of participants' questions about the procedure and any responses from the elicitor
- First, check whether you can exclude the data containing 'unwanted' material and repeat the elicitation
- If this is not possible, mark those passages as:
<Q> communication with elicitor </Q>
- they get an extra-event
Merged forms
- Merged forms are transcribed as they are articulated, but with an equal sign linking the merged elements
- Examples: so=ne (= so eine)
Reduced syllables
- reduced syllables are transcribed as articulated
- Examples: goin (= going), bi tane (= bir tane), hab ich ein Tadel bekommen (= einen Tadel)
Elisions, repetitions and interruptions
- Do not leave anything out and do not add anything which is not there!
- Use / to mark unfinished words, e.g. "The bl/ blue car crashes um stops"
- Word-internal cancellations/corrections are transcribed as follows: dipl: "be$ha$ come" (norm: "become")
- Onomatopoeias/echoisms are transcribed as separate tokens (e.g. gutschi gutschi gutschi); they are transcribed as a single token only if they are very short (e.g. eieiei)
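The relation between the dipl and norm forms of a word-internal cancellation can be illustrated in code. This is only a sketch generalised from the single example "be$ha$ come" / "become": it assumes that the cancelled material between the two $ signs, plus any whitespace right after it, is dropped on the norm layer. The function name `norm_from_dipl` is hypothetical.

```python
import re

# Assumption drawn from the one documented example: remove the material
# enclosed in $...$ together with the whitespace that follows it.
CANCELLATION = re.compile(r"\$[^$]*\$\s*")


def norm_from_dipl(dipl):
    """Derive a norm-layer form by removing word-internal cancellations."""
    return CANCELLATION.sub("", dipl)
```

With this rule, `norm_from_dipl("be$ha$ come")` gives `"become"`; text without $ marks is returned unchanged.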
Pauses
- Pauses are always measured to one decimal place
- 0.2 - 1 sec: (-)
- 1 - 3 secs: (--)
- More than 3 secs: the measured duration in parentheses, e.g. (5.5)
- Word-internal pauses are marked as follows: be(-)have
- Keep in mind that some speakers talk very slowly; it makes no sense to mark a pause after every word/token
- pauses inside a CU do not get an extra-event on the CU tier
- pauses which occur between two CUs get an extra-event on the CU tier
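The pause thresholds above can be sketched as a small function. The handling of the boundaries is an assumption, since the guideline does not say which symbol applies at exactly 1.0 or 3.0 seconds: here 1.0 counts as (-), 3.0 counts as (--), and anything below 0.2 seconds is left unmarked. The name `pause_symbol` is hypothetical.

```python
def pause_symbol(seconds):
    """Map a measured pause duration to its transcription symbol."""
    s = round(seconds, 1)  # pauses are measured to one decimal place
    if s < 0.2:
        return ""  # assumption: too short to be marked as a pause
    if s <= 1.0:
        return "(-)"
    if s <= 3.0:
        return "(--)"
    return f"({s:.1f})"  # longer pauses carry the measured duration
```

For instance, a pause of 0.5 seconds is written (-), 2.0 seconds is written (--), and 5.5 seconds is written (5.5).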
Long vowels & consonants
- Vowels realized longer than normal (0.2 - 2 secs) are marked with : (e.g. so lo:ng)
- Vowels realized longer than 2 seconds are marked with :: (e.g. so lo::ng)
- The same marking is possible for consonants (e.g. mum:)
- Doubling of vowel syllables is marked with % (e.g. by%ye, tschü%üß)
Non-verbal material
- non-verbal events such as a participant laughing or coughing are noted in square brackets on the transcription tier, e.g. [laughing]
- if participants speak and laugh at the same time, you note it as: [[laughing]speech]
List of meta comments used in the RUEG project
- [coughing]
- [gulping]
- [laughing]
- [pfing] for a sound like "pfff"
- [sighing]
- [throatclearing]
- [tongueclicking], including tsking as disapproval, while thinking and just mouth opening with a click
- [whispering]
- [stuttering]
- [imitating], for when they imitate a sound related to the story (e.g., car crash)
- [sniffing]
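A transcript can be checked against this list so that misspelled or undocumented meta comments are caught early. The sketch below assumes a meta comment is a single word in square brackets; the names `RUEG_META_COMMENTS` and `unknown_meta_comments` are hypothetical.

```python
import re

# The meta comments listed above.
RUEG_META_COMMENTS = {
    "coughing", "gulping", "laughing", "pfing", "sighing", "throatclearing",
    "tongueclicking", "whispering", "stuttering", "imitating", "sniffing",
}

# Matches an innermost [word]; in [[laughing]speech] it finds [laughing].
META_COMMENT = re.compile(r"\[([^\[\]\s]+)\]")


def unknown_meta_comments(text):
    """Return bracketed comments in `text` that are not on the list."""
    return [
        m.group(1)
        for m in META_COMMENT.finditer(text)
        if m.group(1) not in RUEG_META_COMMENTS
    ]
```

Calling `unknown_meta_comments("so [laughin] yeah [coughing]")` would flag only `laughin`.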
Uninterpretable material
- uninterpretable material is to be marked as (UNK) on Speaker-tier
- longer than 2secs: (UNK, 2.1)
- Assumed content is placed in parentheses, each token separately: (assumed) (content)
- If the uninterpretable material can be identified as belonging to a CU, it does not get a separate event on the CU tier
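The two marking conventions for uninterpretable and assumed material can be sketched as follows. The function names `unk_marker` and `mark_assumed` are hypothetical, and treating exactly 2.0 seconds as the short form is an assumption.

```python
def unk_marker(seconds):
    """Return (UNK), or (UNK, duration) for stretches longer than 2 secs."""
    s = round(seconds, 1)
    return f"(UNK, {s:.1f})" if s > 2.0 else "(UNK)"


def mark_assumed(text):
    """Wrap each whitespace-separated token of assumed content in parentheses."""
    return " ".join(f"({token})" for token in text.split())
```

For example, `unk_marker(2.1)` gives `(UNK, 2.1)` and `mark_assumed("assumed content")` gives `(assumed) (content)`.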
Hesitation markers / Interjections / Reception markers
- For every language, we define a set of hesitation markers/interjections/reception markers
- create a list with those markers
- If heritage speakers use particles from their ‘other’ language, we transcribe them as they sound, consistent with the procedure for foreign language material
Foreign language material
- Choose a spelling for each item following one of these options:
  - transcribe phonographically (e.g. engl. like = germ. leik) OR
  - use the orthographic spelling of the "other" language
- Create a list where you document the spelling of each item in alphabetical order
- For each item, note in the list the name of the file that contains the word and the time at which it occurs
- Each time you encounter foreign language material in your data, check the list to guarantee a consistent form for those items
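One possible way to keep such a list is sketched below. It is an illustration only: the class and method names are hypothetical, and the choice that the first documented spelling wins is an assumption about how consistency is enforced.

```python
from dataclasses import dataclass, field


@dataclass
class SpellingEntry:
    """One documented item: its agreed spelling and where it occurs."""
    spelling: str
    occurrences: list = field(default_factory=list)  # (file name, time in secs)


class SpellingList:
    """Hypothetical helper that keeps one spelling per foreign-language item."""

    def __init__(self):
        self._entries = {}

    def record(self, item, proposed_spelling, file_name, time_sec):
        """Log an occurrence and return the spelling to use for `item`.

        The first spelling recorded for an item is kept, so later
        occurrences stay consistent with the earlier decision.
        """
        entry = self._entries.setdefault(item, SpellingEntry(proposed_spelling))
        entry.occurrences.append((file_name, time_sec))
        return entry.spelling

    def alphabetical(self):
        """Return (spelling, occurrences) pairs sorted by spelling."""
        return sorted(
            ((e.spelling, e.occurrences) for e in self._entries.values()),
            key=lambda pair: pair[0],
        )
```

If English "like" is first recorded with the phonographic spelling "leik", every later call to `record` for that item returns "leik", whatever spelling is proposed.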
Proper/Brand names from "foreign language"
- Keep conventionalized spelling (e.g. Renault = renault)
- document your decisions, create a list with those items
- Language-specific decisions:
  - Russian: use the spelling and script of the language actually spoken, to avoid loss of phonetic/morphological/syntactic information
  - Turkish and Greek: use the Latin alphabet and conventionalized spelling
Table of symbols
Symbols | Meaning |
---|---|
<Q> communication with elicitor </Q> | instances of questions concerning the procedure and/or verbal interventions of elicitators |
(-) | pauses up to 1sec |
(--) | pauses 1-3secs |
(3.2) | pauses longer than 3secs |
(UNK) | uninterpretable material |
(UNK, 2.2) | uninterpretable material longer than 2secs |
(assumption) | assumed material |
[...] | non-verbal material |
[[...]...] | non-verbal & verbal event |
: | unusually long vowel or consonant (under 2secs) |
:: | unusually long vowel or consonant (longer than 2secs) |
= | merged forms |
/ | interruption of a word |
$...$ | word internal cancellations |
% | doubled syllables |
{...} | specification of an anonymised place |