Transcription Decisions Russian

0. General information

  • no capital letters

  • abbreviations/acronyms are transcribed as full words (e.g. ДТП = дэтэпэ)

  • lower case for all words, even at the beginning of a sentence ==> exceptions: participant code, participant code + _P and symbols like (UNK) etc.

  • the participant's speech is generally transcribed in accordance with the standard orthographic rules of Russian

  • but: if the participant produces utterances or words that are typical neither of standard Russian nor of the oral vernacular (повседневный язык), transcribe them as the participant articulated them

    Example from USbi52MR_fsR:

    Participant: потому что они два два (-) не видели (-) ==> Standard and vernacular Russian: потому что они друг друга не видели ==> два два is typical neither of standard nor of oral vernacular Russian ==> Transcription: потому что они два два (-) не видели (-)

    Example:

    Participant: мужик играл с футболом ==> Standard and vernacular Russian: мужик играл с мячом ==> с футболом in this context is typical neither of standard nor of vernacular Russian ==> Transcription: мужик играл с футболом

  • typical phenomena of standard and vernacular Russian that should not be transcribed:

    • reduced vowels (unless they belong to a specific dialect of Russian)

      Example:

      Participant: ана талкает каляску, а мужык играет смячикам ==> Transcription: она толкает коляску, а мужик играет с мячиком

    • so-called phonetic words (= several words articulated as a single word ==> this often concerns a preposition and the following noun)

      Example:

      Participant: он вышел издому ==> Transcription: он вышел из дому

1. Tiers

  • two tiers should be used
      1. tier = speaker tier ==> used only for the transcription of the participant's speech; it is labelled with the participant code
      2. tier = comment tier ==> an optional tier used for communication between transcribers; it is deleted once the transcription is finished

2. Segmentation

  • NB:

    • 1 independent/main clause (главное предложение) = 1 simple sentence (простое предложение)
    • 1 independent clause (главное предложение) + 1 or more dependent clauses (придаточное предложение) = complex sentence (сложноподчинённое предложение)
    • 1 independent clause (главное предложение) + 1 or more independent clauses (главное предложение) = compound sentence (сложносочинённое предложение)
  • hint: an independent clause can always stand alone; a dependent clause should never stand alone, because without its independent clause the dependent clause wouldn't make sense

  • in addition: a compound sentence can be easily recognized by certain conjunctions, which connect the independent clauses in that type of sentence: these conjunctions are coordinating (соединительный), adversative (противительный) or disjunctive (разделительный) conjunctions (союзы), such as и, но, а, или, либо...либо etc.

  • dependent clauses in complex sentences can be recognized by conjunctions and relativizers like потому что, когда, что, кто, который, чтобы, так как, но и etc.

  • the participant's speech is segmented into communication units (CUs)

  • 1 CU corresponds to 1 simple sentence or 1 complex sentence; sentences that consist of more than 1 independent clause (= compound sentences) always make up more than 1 CU:

    • Simple sentence

      Example from DEbi52FR_isR:

      я стала свидетельницей (-) а: (-) столкновение двух машин | ==> 1 CU

    • Complex sentence

      Example from DEbi52FR_isR:

      виноваты были не машины а: (-) один (-) эм мужчина который (-) ну кот/ ещё более такой (-) молодой | ==> 1 CU

    • Compound sentence

      Example from DEbi03FR_isR:

      хотела с тобой это поделить | но я была здесь на парковке у реве | ==> 2 CUs

  • if a compound sentence contains a VP coordination or an ellipsis, the sentence is annotated as one CU:

    Example:

    она вышла из магазина [subject ellipsis] уронила пакет и [subject ellipsis] пошла дальше | ==> 1 CU

  • a discourse marker (ну, ну там, вот, так, как бы, получается, эм, то есть etc.) and the utterance that follows it count as one CU

    Example from DEbi03FR_isR:

    ну там я предполагаю там ребёнок внутри был | ==> 1 CU

  • discourse markers (ну, ну там, вот, так, как бы, получается, эм, то есть etc.) that introduce a specification of the preceding utterance also stay within that CU

    Example from DEbi02FR_fsR:

    хм их было трое то есть э маленький ребёнок э: женщина и мужчина | ==> 1 CU ("маленький ребёнок э: женщина и мужчина" is a specification of "трое")

  • greetings (привет, здравствуйте, здорово, здрасте etc.) form a separate CU

    Example from DEbi03FR_isR:

    привет DEbi03FR_P | слушай я сейчас видела здесь такую ситуацию | ==> 2 CUs

  • question tags such as правда?, или?, правильно?, правильно понял?, не так ли? etc. belong to the previous CU

    Example:

    ты вася пупкин, правильно | ==> 1 CU

  • if you are not sure, make fewer CUs, to facilitate the SUD annotation

  • keep in mind that punctuation marks are not used at all ==> that means: no full stops, no commas etc.
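
The segmentation and punctuation rules above lend themselves to a simple automatic check. The following is only a sketch: the function names, the regular expression for participant codes (inferred from examples such as DEbi52FR and DEbi52FR_P) and the restriction to full stops and commas are assumptions, not part of the agreed procedure.

```python
import re

# Tokens that may legitimately contain capitals or dots. The participant-code
# pattern is an assumption inferred from examples like DEbi52FR and DEbi52FR_P.
ALLOWED = re.compile(
    r"\b[A-Z]{2}bi\d{2}[FM]R(?:_P)?\b"    # participant code, optionally + _P
    r"|\(UNK(?:, \d+(?:\.\d+)?)?\)"       # (UNK) or (UNK, 2.1)
    r"|</?Q>"                             # <Q> and </Q>
)
MEASURED_PAUSE = re.compile(r"\(\d+(?:\.\d+)?\)")  # e.g. (3.1)
# Only the two marks named explicitly in the guideline; extend if needed.
PUNCTUATION = re.compile(r"[.,]")


def split_cus(line: str) -> list[str]:
    """Split a transcribed line into CUs at the | boundary marker."""
    return [cu.strip() for cu in line.split("|") if cu.strip()]


def check_cu(cu: str) -> list[str]:
    """Return the problems found in one CU (empty list if there are none)."""
    stripped = MEASURED_PAUSE.sub("", ALLOWED.sub("", cu))
    problems = []
    if any(ch.isupper() for ch in stripped):
        problems.append("capital letter")
    if PUNCTUATION.search(stripped):
        problems.append("punctuation mark")
    return problems


for cu in split_cus("привет DEbi03FR_P | слушай я сейчас видела здесь такую ситуацию |"):
    print(cu, check_cu(cu))
```

A check like this only flags candidates; the transcriber still decides whether a flagged token is a real violation.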

3. Anonymization

  • replace the name of the participant with the respective code ==> e.g.: DEbi52FR

  • if whole names or surnames of the participant’s friends are mentioned, replace them with the participant code + _P

    Example from DEbi52FR_isR:

    привет DEbi52FR_P

  • places that could lead to the identification of the participant should be replaced as follows

    Example:

    я хожу в Leo-Tolstoi-Schule ==> я хожу в {schoolname}шуле

    я живу на улице Шютценштрассе ==> я живу на улице {streetname}штрассе

  • anonymization in Audacity: the name of the participant should be anonymized with the aid of white noise
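
The name replacement in the transcript can be sketched as follows. This is an illustration under assumptions: the function name and the example names (анна, маша) are hypothetical, and because Russian names are inflected, every form that occurs in the recording has to be listed by hand.

```python
import re


def anonymize_names(text: str, participant_code: str,
                    participant_names: list[str], other_names: list[str]) -> str:
    """Replace the participant's name with the code and other names with code + _P."""

    def replace_all(source: str, names: list[str], replacement: str) -> str:
        for name in names:
            # whole-word, case-insensitive match; \w covers Cyrillic letters
            pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
            source = re.sub(pattern, lambda _m: replacement, source, flags=re.IGNORECASE)
        return source

    text = replace_all(text, other_names, participant_code + "_P")
    return replace_all(text, participant_names, participant_code)


# Hypothetical names, used for illustration only
print(anonymize_names("привет маша это анна", "DEbi52FR", ["анна"], ["маша"]))
```

Place names are not covered here, since the {schoolname} and {streetname} replacements keep part of the original word and need a manual decision.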

4. Hesitation markers

  • hesitation markers do not form a separate event ==> they belong to the CU in which they occur

    Example from DEbi52FR_isR:

    я когда шла э: на автобусную остановку (-) эм: ==> 1 CU

  • general notation:

    • m-hm (confirming) = угу
    • ehm = эм or э:м
    • hm = хм
    • eh/uh = э or э:
    • aha = ага
    • ah = а:

5. Long vowels and consonants

  • vowels pronounced longer than normal (under 2 seconds) are marked with a colon ==> e.g.: ну: да
  • vowels pronounced extremely long (2 seconds and longer) are marked with two colons ==> e.g.: ну:: да
  • lengthening is also possible for consonants ==> e.g.: тс: тише
  • doubling of a vowel syllable is marked with % ==> e.g.: ты точно сделал? да%а

6. Pauses

  • pauses are transcribed on the speaker tier
  • a pause between two CUs is marked as a separate event ==> the pause gets two boundaries
  • pauses inside a CU are transcribed within that CU ==> they do not form a separate event
  • word-internal pauses are marked inside the word, without a space between the parts of the word ==> e.g.: с э(-)тим мячиком ==> exception: pauses with эм inside a word ==> e.g.: они на (-) эм крыли стол
  • general notation:
    • 0.2-1 second ==> (-)
    • 1-3 seconds ==> (--)
    • longer than 3 seconds ==> time should be measured and noted in brackets ==> e.g.: (3.1), (5.5)
  • background noise such as traffic noise, phone ringing or computer noise is noted as a pause
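
The pause notation above maps a measured duration to a symbol, which can be sketched as a small helper. The function name is hypothetical, and two points are assumptions because the guideline leaves them open: pauses shorter than 0.2 seconds are not noted at all, and a pause of exactly 1 second is written as (--).

```python
def pause_notation(duration_s: float) -> str:
    """Return the pause symbol for a measured pause duration in seconds."""
    if duration_s < 0.2:
        return ""            # assumption: too short to be transcribed
    if duration_s < 1.0:
        return "(-)"         # 0.2-1 second
    if duration_s <= 3.0:
        return "(--)"        # 1-3 seconds (exactly 1.0 lands here by assumption)
    return f"({duration_s:.1f})"  # longer than 3 seconds: measured time


for seconds in (0.5, 2.0, 3.1, 5.5):
    print(seconds, pause_notation(seconds))
```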

7. Merged forms

  • merged forms are transcribed as they are articulated, but with an equals sign linking the merged elements

    Example from USbi52MR_fsR:

    с одной стороны (-) дороги (-), э, шли муж=женой

8. Reduced syllables

  • general rule: reduced syllables should be transcribed in their full form, even if they were articulated differently

    Example:

    participant: она токо что шла на улице ==> transcription: она только что шла на улице

  • exception: if a word is listed in a dictionary in its reduced form (e.g. MAC ==> Link) and the participant articulated the reduced form, then the reduced form should be preferred for transcription

    Example:

    participant: здрасте ==> transcription: здрасте (link to MAC)

  • use / to mark unfinished words

    Example from DEbi52FR_isR:

    сегодня (-) э когда я шла на авто/ астобв/ (-) а (-) автобусную остановку

9. Numerals and dates

  • numbers should be written out in words, since Russian numerals are often declined or otherwise inflected

    Example:

    я вижу двух женщин

  • dates should be written out in words, too

    Example:

    я родился двадцать первого января тысяча девятьсот девяносто пятого года

10. Spelling for russified lexicals

  • general rule: foreign words should be transcribed into Russian as they are articulated

  • in addition: a dedicated list exists for this case, where you can look up such words or add new ones (Link) ==> important: all transcribers have to transcribe these words into Russian identically

    • Autowerkstatt = аутоверкштат
    • Truck = трак
    • Ort = орт
    • REWE = реве
    • Renault = рено
    • also = алзо
    • OK = оке
    • WhatsApp = воцап
    • {schoolname}schule = {schoolname}шуле
    • {streetname}straße = {streetname}штрассе
  • table for russified lexicals

    German/English word   Russified word  Code      File  Time (s)
    accident              аксидент        USbi06FR  fsR   6,94
    accidentally          аксидальтально  USbi07MR  fsR   15,1
    Aldi                  алди            DEbi64MR  fsR   8
    also                  алзо            DEbi56FR  fsR   36,85
    Autowerkstatt         аутоверкштат    DEbi51FR  fsR   93,75
    bag                   бег             USbi59FR  isR   164,23
    Ball (mit dem)        болом           DEbi12FR  fsR   16,9
    Ball                  бол             DEbi12FR  fsR   19
    in the back           ин зе бэк       USbi74MR  isR   53,92
    ciao                  чао             DEbi04MR  isR   29,4
    case                  кэйз            USbi86FR  fsR   4,62
    crash                 крэш            DEbi12FR  fsR   47,78
    crashed               крэшовали       DEbi15MR  isR   37,56
    hey                   хей             USbi73FR  isR   0,64
    like                  лайк            USbi86FR  fsR   73,13
    message               месседж         USbi16FR  isR   8,23
    911/nine one one      найн уон уон    USbi59FR  isR   83,64
    911/nine eleven       найн элэвэн     USbi73FR  fsR   59,29
    OK                    окей            USbi05FR  isR   64,7
    Ort                   орт             DEbi53FR  fsR   23,19
    parking lot           паркинг лот     USbi74MR  isR   7,48
    Renault               рено            DEbi10MR  isR   31,71
    representative        рэпрезэнтэтиф   USbi74MR  fsR   3,85
    Rewe                  реве            DEbi03FR  isR   13,9
    Schützenstraße        Шютценштрассе   DEbi04MR  fsR   5,14
    spilled/spilt         сплыть          USbi58FR  fsR   36,41
    stopped               стопт           USbi58FR  fsR   25,22
    stroller              строллер        USbi79MR  isR   51,8
    Truck                 трак            USbi52MR  fsR   77,39
    turn                  торн            USbi74MR  isR   36,32
    Vans                  вэнс            USbi59FR  isR   12,59
    WhatsApp              воцап           USbi52MR  isR
  • if a conventionalized Russian spelling already exists for a foreign word, the conventionalized spelling should be preferred

    Example:

    Messenger = мессенджер
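
To keep spellings identical across transcribers, the shared list can be treated as a lookup table. The sketch below is hypothetical: the dictionary holds only a few entries copied from the list above, and the function name is an assumption. A word that is missing from the list returns None, which signals that it still has to be transcribed as articulated and added to the list.

```python
from typing import Optional

# A few entries copied from the list of russified words above
RUSSIFIED = {
    "autowerkstatt": "аутоверкштат",
    "truck": "трак",
    "rewe": "реве",
    "renault": "рено",
    "whatsapp": "воцап",
    "messenger": "мессенджер",
}


def russified_spelling(word: str) -> Optional[str]:
    """Look up the agreed Russian spelling; None means the word is not listed yet."""
    return RUSSIFIED.get(word.casefold())


print(russified_spelling("REWE"))
print(russified_spelling("Kindergarten"))
```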

11. Notations of non-verbal material, uninterpretable material and background noise

  • non-verbal events such as laughing or coughing are noted in square brackets on the speaker tier and always belong to the CU in which they occur

  • general notation

    • [throatclearing]
    • [coughing]
    • [laughing]
    • [pfing] ==> for a sound like "pff"
    • [sighing]
    • [sniffing]
    • [tongueclicking] ==> including tsking as disapproval, while thinking and just mouth opening with a click
    • [yawning]
    • [gulping]
    • [whispering]
    • [breathing]
  • if the participant produces a non-verbal event while speaking, it is noted as:

    • [[coughing]word]
    • [[laughing]word]
    • [[sighing]word]
    • [[tongueclicking]word]
    • [[yawning]word]
    • [[gulping]word]
    • [[whispering]word]

    Example from DEbi52FR_isR:

    ты [[laughing]знаешь] что сегодня случилось

  • uninterpretable material is marked as (UNK) on the speaker tier

  • if it is not clear which CU the UNK belongs to, make it a separate event ==> that means: write the UNK between two boundaries

  • if the UNK is longer than two seconds, measure the time and write it together with UNK in one bracket ==> e.g.: (UNK, 2.1)

  • assumed content is noted in round brackets; each token gets its own brackets ==> e.g.: (assumed) (content)

  • background noise such as traffic noise, phone ringing or computer noise should be noted as pauses
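
The UNK rule can be written as a one-line decision. This is a minimal sketch with a hypothetical function name; the one-decimal format follows the example (UNK, 2.1).

```python
def unk_notation(duration_s: float) -> str:
    """Return the UNK symbol, with the measured time if longer than 2 seconds."""
    if duration_s > 2.0:
        return f"(UNK, {duration_s:.1f})"
    return "(UNK)"


print(unk_notation(1.0))
print(unk_notation(2.1))
```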

12. Table of symbols

Symbol                      Meaning
<Q> speech </Q>             questions about the procedure from the participant, or verbal interventions by the elicitor ==> e.g.: <Q> можно я ещё раз </Q>
(-)                         pauses of 0.2-1 second
(--)                        pauses of 1-3 seconds
(time)                      pauses longer than 3 seconds ==> e.g.: (3.1)
(UNK)                       uninterpretable material
(UNK, time)                 uninterpretable material longer than 2 seconds ==> e.g.: (UNK, 2.1)
(assumed word)              assumed material
[non-verbal action]         non-verbal material ==> e.g.: [laughing]
[[non-verbal action]word]   simultaneous non-verbal and verbal event ==> e.g.: [[laughing]знаешь]
:                           unusually long vowel or consonant (under 2 seconds)
::                          unusually long vowel or consonant (2 seconds and longer)
=                           merged words
/                           interruption of a word
%                           doubled syllables
{...}                       anonymised places ==> e.g.: {schoolname}шуле
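
As an illustration of how the symbols combine within one CU, the sketch below splits a CU into labelled tokens. The label names and regular expressions are assumptions made for this example and are not part of the guideline. Symbols that attach to a word (:, =, /, % and word-internal pauses) stay inside the word token.

```python
import re

# Order matters: more specific patterns come first.
TOKEN_PATTERNS = [
    ("unk", r"\(UNK(?:, \d+(?:\.\d+)?)?\)"),
    ("pause", r"\(-{1,2}\)|\(\d+(?:\.\d+)?\)"),
    ("assumed", r"\([^()\s]+\)"),
    ("nonverbal_with_word", r"\[\[[a-z]+\][^\[\]\s]+\]"),
    ("nonverbal", r"\[[a-z]+\]"),
    ("q_tag", r"</?Q>"),
    ("word", r"[^\s|]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS))


def classify_tokens(cu: str) -> list[tuple[str, str]]:
    """Return (label, token) pairs for one CU."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(cu)]


for label, token in classify_tokens("ты [[laughing]знаешь] что (-) сегодня (UNK, 2.1)"):
    print(label, token)
```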