Russian POS and Lemma
0. General information
Lemmatization
- the term lemma may be defined as the base form of a word
- the base form of a word is the form you can usually find in a dictionary
- for verbs the base form corresponds to the infinitive, for nouns to the nominative singular, and for adjectives to the nominative singular in its masculine form
- the conversion of a word into its base form is called lemmatization
- lemmatization is carried out semi-automatically in EXMARaLDA with two POS and lemma taggers, U-POS and MyStem; however, the accuracy of the taggers should be checked manually each time
- the lemmas (base forms) of the words are found on the norm[mystem_lex] layer for MyStem and on the norm[lemma] layer for U-POS
POS-Tagging
- the term tagging means that each word produced by the participant is assigned its part of speech (POS)
- tagging is carried out semi-automatically in EXMARaLDA by the U-POS and MyStem taggers, but the accuracy of the taggers should be checked manually each time
- two taggers in EXMARaLDA perform the task of POS-tagging: U-POS and MyStem
- keep in mind that these two taggers are similar to each other, but not identical 1
1. Structure of POS-Tagging in EXMARaLDA
U-POS-Layers
- the U-POS tagger comprises the layers from norm[Animacy] to norm[voice] as well as the norm[lemma] and the norm[pos] layer
- each layer in U-POS (and MyStem) corresponds to a grammatical category
- the meaning of each grammatical category in U-POS is explained in the following table:
Layer | Grammatical category | Grammeme | Part of speech |
---|---|---|---|
norm[Animacy] | Одушевлённость | Одушевлённость (Anim); Неодушевлённость (Inan) | concerns only nouns |
norm[Aspect] | Вид | Cовершенный вид [что сделать?] (Perf); Несовершенный вид [что делать?] (Imp) | concerns only verbs |
norm[Case] | Падеж | им.п. (Nom); род.п. (Gen); дат.п. (Dat); вин.п. (Acc); твор.п. (Ins); предл.п. (Loc); зват.п. (Voc) | concerns all nominal categories of POS |
norm[Degree] | Степень сравнения | положительная (Pos); сравнительная (Cmp); превосходная (Sup) | concerns adjectives and adverbs |
norm[Foreign] | иностранное слово | (Yes) | concerns all words, which do not belong to the Russian language |
norm[Gender] | Род | муж.р. (Masc); жен.р. (Fem); сред.р. (Neut) | concerns only nouns, adjectives and pronouns |
norm[Mood] | Наклонение | изъяв.н. (Ind); услов.н. (Cnd); повел.н. (Imp) | concerns only verbs |
norm[Number] | Число | Единственное (Sing); Множественное (Plur) | concerns nouns, adjectives, personal pronouns and verbs |
norm[Person] | Лицо | Первое лицо (1); Второе лицо (2); Третье лицо (3) | concerns personal pronouns and verbs |
norm[Tense] | Время | Настоящее (Pres); Прошедшее (Past); Будущее (Fut) | concerns verbs and participles |
norm[VerbForm] | Форма глагола | Неопределённая форма глагола (Inf); Финитная форма глагола (Fin); Причастие (Part); Деепричастие/Герундий (Conv) | concerns verbs |
norm[voice] | Залог | Действительный (Act); middle voice (Mid); Страдательный (Pas) | concerns verbs and participles |
norm[lemma] | Base form of a word (Начальная форма слова) | ------ | concerns all parts of speech |
norm[pos] | POS-Determination of the given word according to UPOS principles | существительное (NOUN); глагол (VERB); прилагательное (ADJ); determiner (DET) [abandon in all cases] ... | concerns all parts of speech |
norm[Reflex] | Real reflexive verbs (настоящие возвратные глаголы) 2 | (Yes) | concerns verbs and participles |
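The table above can also be read as a list of allowed values per layer. The following is a minimal sketch of such a check, not part of the EXMARaLDA workflow itself. It assumes that a token is available as a plain Python dictionary mapping layer names to values (a hypothetical representation), that layer names are compared case-insensitively, and that the converb value is spelled Conv as in the tagging examples.

```python
# Allowed grammemes per U-POS layer, following the table above.
# Keys are lowercased because the guidelines spell layer names with
# varying capitalization (ASSUMPTION: case does not matter).
ALLOWED = {
    "norm[animacy]": {"Anim", "Inan"},
    "norm[aspect]": {"Perf", "Imp"},
    "norm[case]": {"Nom", "Gen", "Dat", "Acc", "Ins", "Loc", "Voc"},
    "norm[degree]": {"Pos", "Cmp", "Sup"},
    "norm[foreign]": {"Yes"},
    "norm[gender]": {"Masc", "Fem", "Neut"},
    "norm[mood]": {"Ind", "Cnd", "Imp"},
    "norm[number]": {"Sing", "Plur"},
    "norm[person]": {"1", "2", "3"},
    "norm[tense]": {"Pres", "Past", "Fut"},
    "norm[verbform]": {"Inf", "Fin", "Part", "Conv"},
    "norm[voice]": {"Act", "Mid", "Pas"},
    "norm[reflex]": {"Yes"},
}


def invalid_values(token):
    """Return sorted (layer, value) pairs whose value is not listed in the table.

    `token` is a hypothetical dict of layer name -> value. Empty values and
    layers without a closed value list (e.g. norm[lemma], norm[pos]) are skipped.
    """
    problems = []
    for layer, value in token.items():
        allowed = ALLOWED.get(layer.replace(" ", "").lower())
        if allowed is not None and value and value not in allowed:
            problems.append((layer, value))
    return sorted(problems)
```

A check like this only catches values outside the table; it cannot tell whether a listed value is the right one for the word in its context, so the manual check remains necessary.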
MyStem-Layers
- the MyStem tagger comprises the norm[mystem_gr] and the norm[mystem_lex] layers
- each layer in MyStem (and U-POS) corresponds to a grammatical category
- the meaning of each grammatical category in MyStem is explained in the following table:
Layer | Grammatical category | Grammeme | Part of speech |
---|---|---|---|
norm[mystem_gr] | POS-Determination of the given word according to MyStem principles | All redundant grammemes on this layer are deleted; only the first grammeme and, if present, the grammemes of transitivity (tran/intr) 3 and parenthesis (parenth) are kept | concerns all parts of speech |
norm[mystem_lex] | Base form of a word | should conform with the base form in U-POS | concerns all parts of speech |
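Since the base form on norm[mystem_lex] should conform with the base form on norm[lemma], the two layers can be compared token by token. The sketch below assumes, purely for illustration, that the tokens have already been read into a list of dictionaries keyed by layer name; how they are read from an EXMARaLDA file is not covered here.

```python
def lemma_mismatches(tokens):
    """Return (index, norm[lemma], norm[mystem_lex]) for tokens whose lemma layers differ.

    `tokens` is a hypothetical list of dicts mapping layer names to values.
    A missing layer is treated as an empty string.
    """
    mismatches = []
    for index, token in enumerate(tokens):
        upos_lemma = token.get("norm[lemma]", "")
        mystem_lemma = token.get("norm[mystem_lex]", "")
        if upos_lemma != mystem_lemma:
            mismatches.append((index, upos_lemma, mystem_lemma))
    return mismatches
```

Every reported mismatch still has to be resolved by hand, because either of the two taggers may be the one that is wrong.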
2. The subjects of lemmatization and POS-Tagging are ...
- ... files from DEbi---R; USbi---R and RUmo---R with the following suffixes at the end of the file name:
- _fsR (formal spoken Russian)
- _fwR (formal written Russian)
- _isR (informal spoken Russian)
- _iwR (informal written Russian)
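The four suffixes above can be used to select the relevant files automatically. This is a minimal sketch under the assumption that the suffix is the last part of the file name before the extension; the file names in the usage note are made up for illustration.

```python
from pathlib import PurePath

# Suffix -> register, as listed above.
REGISTERS = {
    "_fsR": "formal spoken Russian",
    "_fwR": "formal written Russian",
    "_isR": "informal spoken Russian",
    "_iwR": "informal written Russian",
}


def register_of(filename):
    """Return the register for a file name, or None if no Russian suffix matches.

    ASSUMPTION: the suffix directly precedes the file extension.
    """
    stem = PurePath(filename).stem
    for suffix, register in REGISTERS.items():
        if stem.endswith(suffix):
            return register
    return None
```

With a hypothetical file name, `register_of("example_isR.exb")` gives the informal spoken register, while a name without one of the four suffixes gives `None` and can be skipped.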
3. Steps of procedure
- 1. step: Push/Pull/Fetch in GitHub
- 2. step: Open EXMARaLDA Partitur-Editor
- 3. step: File ==> Open ==> rueg repository ==> GitHub (or SmartGit) ==> rueg-corpus ==> exb ==> P3 ==> 1, 2, 3 …
- 4. step: Verify that the CUs in every file conform to the CU-guidelines - if not, please correct them
- 5. step: Verify that every word is assigned its correct language on the dipl[language]-layer - if not, please correct it
- 6. step: POS-Tagging ==> verify the accuracy of the POS taggers (U-POS and MyStem)
- 7. step: Delete all features from the norm[mystem_gr]-layer except the first one and - if present - the features of transitivity and parenthesis and other features which are not redundant with U-POS features
- 8. step: Save your results
- 9. step: Go to GitHub (SmartGit) ==> submit your file ==> push/pull/fetch -> commit
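Step 7 can be sketched as a small function. It assumes that the content of norm[mystem_gr] is a string of grammemes separated by commas (or by "=", as in raw MyStem output). The keep list contains tran, intr and parenth from the guidelines; the remaining entries are an assumption derived from the example rows in the tagging table and should be adjusted to the project's actual decisions.

```python
import re

# Grammemes kept in addition to the first one. tran/intr and parenth are named
# in the guidelines; the others are ASSUMED from the tagging examples.
KEEP = {"tran", "intr", "parenth", "praed", "ger", "abbr", "inform", "persn", "poss"}


def prune_mystem_gr(tag, keep=KEEP):
    """Keep the first grammeme of a MyStem tag plus any grammeme listed in `keep`."""
    parts = [part.strip() for part in re.split(r"[,=]", tag) if part.strip()]
    if not parts:
        return ""
    kept = [parts[0]] + [part for part in parts[1:] if part in keep]
    return ",".join(kept)
```

For example, a hypothetical raw tag `V,ipf,intr=praet,sg,indic,f` is reduced to `V,intr`, because aspect, tense, number, mood and gender are already covered by the U-POS layers.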
4. Tagging-Guidelines and problems
Phenomenon/Problem | Solution | Example | |
---|---|---|---|
participant code | dipl[language]: rus; norm[Foreign]: Yes; norm[mystem_gr]: S, persn; norm[mystem_lex]: USbi05FR; norm[lemma]: USbi05FR; norm[pos]: PROPN; all other grammemes on UPOS-layers get deleted | здравствуйте меня зовут USbi05FR | |
emojis | dipl[language]: rus; norm[pos]: SYM; all other grammemes on UPOS-layers get deleted | ----- | |
foreign words, e.g. English words: examine each grammatically, e.g. анд | dipl[language]: eng; norm[Foreign]: Yes; norm[mystem_gr]: CONJ; norm[pos]: CCONJ; norm[mystem_lex]: анд; norm[lemma]: анд | and = анд | |
articles, e.g. English articles: examine each grammatically, e.g. а(н) | dipl[language]: eng; norm[Foreign]: Yes; norm[mystem_gr]: ANUM; norm[mystem_lex]: а(н); norm[lemma]: а(н); norm[pos]: DET [abandon in all cases] | a(n) = а(н) | |
words with the letter ё | ё is written on all layers except the dipl layer ==> nothing is changed on the dipl layer ==> norm[norm]: …ё…; norm[lemma]: …ё…; norm[mystem_lex]: …ё… | ----- | |
ага | norm[mystem_gr]: PART; norm[mystem_lex]: ага; norm[lemma]: ага; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | ----- | |
блин | norm[mystem_gr]: INTJ; norm[mystem_lex]: блин; norm[lemma]: блин; norm[pos]: INTJ; all other grammemes on UPOS-layers get deleted | ну блин | |
быстро | norm[Degree]: Pos 4; norm[mystem_gr]: ADV; norm[mystem_lex]: быстро; norm[lemma]: быстро; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | эта машина очень быстро ехала | |
быть | norm[Aspect]: Imp; norm[Gender]: Fem; norm[Mood]: Ind; norm[Number]: Sing; norm[Tense]: Past; norm[VerbForm]: Fin; norm[Voice]: Act; norm[mystem_gr]: V,intr; norm[mystem_lex]: быть; norm[lemma]: быть; norm[pos]: AUX 5 | она была уверена | |
быть | norm[Aspect]: Imp; norm[Gender]: Fem; norm[Mood]: Ind; norm[Number]: Sing; norm[Tense]: Past; norm[VerbForm]: Fin; norm[Voice]: Act; norm[mystem_gr]: V,intr; norm[mystem_lex]: быть; norm[lemma]: быть; norm[pos]: VERB 6 | там была собака | |
весь | norm[Case]: Gen; norm[Gender]: Fem; norm[Number]: Sing; norm[mystem_gr]: APRO 7; norm[mystem_lex]: весь; norm[lemma]: весь; norm[pos]: PRON | от всей души; что скажешь к всему этому | |
вообще | norm[mystem_gr]: ADV,parenth; norm[mystem_lex]: вообще; norm[lemma]: вообще; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | ну вообще там была ещё одна машина | |
вот in function to replace something | norm[mystem_gr]: ADVPRO; norm[mystem_lex]: вот; norm[lemma]: вот; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | вот он идёт | |
вот in function of a modal particle | norm[mystem_gr]: PART; norm[mystem_lex]: вот; norm[lemma]: вот; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | вот а потом мальчик побежал за мячом | |
врезаться | norm[Aspect]: Perf; norm[Gender]: Fem; norm[Mood]: Ind; norm[Number]: Sing; norm[Tense]: Past; norm[VerbForm]: Fin; norm[Voice]: Mid; norm[mystem_gr]: V, intr; norm[mystem_lex]: врезаться; norm[lemma]: врезаться; norm[pos]: VERB; norm[Reflex]: Yes; all other grammemes on UPOS-layers get deleted | одна машина врезалась в другую | |
вроде | norm[mystem_gr]: PART; norm[mystem_lex]: вроде; norm[lemma]: вроде; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | вроде никто не пострадал | |
всё (ещё, равно) | norm[Case]: Nom; norm[Gender]: Neut; norm[Number]: Sing; norm[mystem_gr]: APRO; norm[mystem_lex]: всё; norm[lemma]: всё; norm[pos]: PRON | это всё; всё равно; всё ещё | |
всё-таки | norm[mystem_gr]: PART; norm[mystem_lex]: всё-таки; norm[lemma]: всё-таки; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | он всё-таки поступил по-своему | |
всё-таки after conjunctions и, а, но | norm[mystem_gr]: CONJ; norm[mystem_lex]: всё-таки; norm[lemma]: всё-таки; norm[pos]: SCONJ; all other grammemes on UPOS-layers get deleted | как ни крути, а всё-таки придётся решить эту проблему | |
да | norm[mystem_gr]: PART, parenth; norm[mystem_lex]: да; norm[lemma]: да; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | да так всё произошло | |
давай | norm[Aspect]: Imp; norm[Mood]: Imp; norm[Number]: Sing; norm[Person]: 2; norm[VerbForm]: Fin; norm[Voice]: Act; norm[mystem_gr]: V,tran; norm[mystem_lex]: давать; norm[lemma]: давать; norm[pos]: VERB; all other grammemes on UPOS-layers get deleted | давай | |
два | norm[Case]: Nom; norm[Gender]: Fem; norm[mystem_gr]: NUM 8; norm[mystem_lex]: два; norm[lemma]: два; norm[pos]: NUM | стукнулись две машины | |
должен, должна, должно, должны | norm[Gender]: Masc; norm[Number]: Sing; norm[Variant]: Short; norm[mystem_gr]: A, praed; norm [mystem_lex]: должен; norm[lemma]: должен; norm[pos]: ADJ; all other grammemes on UPOS-layers get deleted | он должен был позвонить в полицию, но в конце не звонил | |
другой | norm[Case]: Acc; norm[Gender]: Fem; norm[Number]: Sing; norm[mystem_gr]: APRO 9; norm[mystem_lex]: другой; norm[lemma]: другой; norm[pos]: ADJ | одна машина врезалась в другую | |
ДТП (дорожно-транспортное происшествие) | norm[Animacy]: Inan; norm[Case]: Gen; norm[Gender]: Neut (because of происшествие); norm[Number]: Sing; norm[mystem_gr]: S,abbr; norm[mystem_lex]: ДТП; norm[lemma]: ДТП; norm[pos]: PROPN | я стал свидетелем ДТП | |
его, её, их as possessive pronouns | norm[case]: Gen; norm[Gender]: Fem; norm[number]: Sing; norm[Person]:3; norm[mystem_gr]: SPRO; norm[mystem_lex]: она; norm[lemma]: она; norm[pos]: PRON | он уронил её пакет | |
ехавший | norm[Aspect]: Imp; norm[Case]: Nom; norm[Gender]: Masc; norm[Number]: Sing; norm[Tense]: Past; norm[VerbForm]: Part; norm[Voice]: Act; norm[mystem_gr]: V, intr; norm[mystem_lex]: ехать; norm[pos]: VERB; all other grammemes on UPOS-layers get deleted | второй водитель ехавший сзади не успел притормозить | |
ещё | norm[mystem_gr]: ADV; norm[mystem_lex]: ещё; norm[lemma]: ещё; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | там ещё стояла женщина рядом с машиной | |
женат | norm[Gender]: Masc; norm[Number]: Sing; norm[Variant]: Short; norm[mystem_gr]: A, praed; norm[mystem_lex]: женатый; norm[lemma]: женатый; norm[pos]: ADJ; all other grammemes on UPOS-layers get deleted | он видимо женат | |
заезжая | norm[Aspect]:Imp; norm[Tense]:Pres; norm[VerbForm]:Conv; norm [Voice]: Act; norm[mystem_gr]:V,intr,ger; norm[mystem_lex]: заезжать; norm[lemma]:заезжать; norm[pos]:VERB; all other grammemes on UPOS-layers get deleted | одновременно заезжая пара машин | |
здравствуйте, пока, привет | norm[mystem_gr]: INTJ; norm[mystem_lex]: здравствуйте; norm[lemma]: здравствуйте; norm[pos]: INTJ; all other grammemes on UPOS-layers get deleted | здравствуйте я звоню по поводу | |
здрасте, приветик | norm[mystem_gr]: INTJ, inform; norm[mystem_lex]: здрасте; norm[lemma]: здрасте; norm[pos]: INTJ; all other grammemes on UPOS-layers get deleted | здрасте я звоню по поводу | |
значит as вводное слово | norm[Aspect]: Imp; norm[Mood]: Ind; norm[Number]: Sing; norm[Person]: 3; norm[Tense]: Pres; norm[VerbForm]: Fin; norm[Voice]: Act; norm[mystem_gr]: V, parenth, tran; norm[mystem_lex]: значить; norm[lemma]: значить; norm[pos]: VERB ; all other grammemes on UPOS-layers get deleted | значит он уронил всё и пошёл | |
играть | norm[Aspect]: Imp; norm[Mood]: Ind; norm[Number]: Sing; norm[Person]: 3; norm[Tense]: Past; norm[VerbForm]: Fin; norm[Voice]: Act; norm[mystem_gr]: V, tran 10; norm[mystem_lex]: играть; norm[lemma]: играть; norm[pos]: VERB | мальчик играл с мячом | |
как at the beginning of dependent/subordinate clause | norm[mystem_gr]: CONJ; norm[mystem_lex]: как; norm[lemma]: как; norm[pos]: SCONJ; all other grammemes on UPOS-layers get deleted | он не знает как это делается | |
как in case of comparison or emphasizing | norm[mystem_gr]: PART; norm[mystem_lex]: как; norm[lemma]: как; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | водитель тупой как пробка | |
как at the beginning of direct questions or at the beginning of indirect questions in subordinate clauses | norm[mystem_gr]: ADVPRO; norm[mystem_lex]: как; norm[lemma]: как; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | как у тебя дела; подскажите как пройти к библиотеке | |
как in function of a subordinate conjunction without a comparison meaning, but in form of an adverb | norm[mystem_gr]: ADVPRO; norm[mystem_lex]: как; norm[lemma]: как; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | мальчик показал как пройти к дому; я не знаю как это сделать | |
кажется as вводное слово | norm[Aspect]: Imp; norm[Mood]: Ind; norm[Number]: Sing; norm[Person]: 3; norm[Tense]: Pres; norm[VerbForm]: Fin; norm[Voice]: Act; norm[mystem_gr]: V, parenth, tran; norm[mystem_lex]: казаться; norm[lemma]: казаться; norm[pos]: VERB | кажется водитель не вовремя видел мячик | |
км/ч | norm[mystem_gr]: S, abbr; norm[mystem_lex]: км/ч; norm[lemma]: км/ч; norm[pos]: NOUN; all other grammemes on UPOS-layers get deleted | сто км/ч | |
какой | norm[Case]: Nom; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: APRO 11; norm[mystem_lex]: какой; norm[lemma]: какой; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | там шёл какой-то мужик | |
короче as вводное слово | norm[Degree]: Cmp; norm[mystem_gr]: ADV, parenth; norm[mystem_lex]: коротко; norm[lemma]: коротко; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | короче там шла женщина с коляской | |
который | norm[Case]: Nom; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: APRO 12; norm[pos]: PRON | этот мальчик ну который там играл с мячиком он | |
мой, твой | norm[Case]: Gen; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: APRO; norm[mystem_lex]: мой; norm[lemma]: мой; norm[pos]: PRON | я звоню вам с моего телефона | |
мол as вводное слово | norm[mystem_gr]: PART, parenth; norm[mystem_lex]: мол; norm[lemma]: мол; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | ---- | |
мужик | norm[Animacy]: Anim; norm[case]:Nom; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: S,inform; norm[pos]: NOUN; all other grammemes on UPOS-layers get deleted | мужик побежал на дорогу | |
наверно, похоже as вводное слово | norm[mystem_gr]: ADV, parenth; norm[mystem_lex]: наверно; norm[lemma]: наверно; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | он наверно этого не знал | |
никто | norm[Case]: Acc; norm[Gender]: Masc; norm[mystem_gr]: SPRO; norm[mystem_lex]: никто; norm[lemma]: никто; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | я никого не видел | |
нет | norm[mystem_gr]: PART, parenth; norm[mystem_lex]: нет; norm[lemma]: нет; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | нет не поеду ни за что | |
ну | norm[mystem_gr]: PART; norm[mystem_lex]: ну; norm[lemma]: ну; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | ну что я могу сказать | |
нужно, можно, надо | norm[mystem_gr]: ADV, praed; norm[mystem_lex]: нужно; norm[lemma]: нужно; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | ----- | |
ого | norm[mystem_gr]: PART; norm[mystem_lex]: ого; norm[lemma]: ого; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | ----- | |
один | norm[Case]: Nom; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: ANUM; norm[mystem_lex]: один; norm[lemma]: один; norm[pos]: NUM | я видел как один человек позвонил в полицию | |
окей | norm[mystem_gr]: PART; norm[mystem_lex]: окей; norm[lemma]: окей; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | ----- | |
первый | norm[Case]: Nom; norm[Gender]: Fem; norm[Number]: Sing; norm[mystem_gr]: ANUM; norm[mystem_lex]: первый; norm[lemma]: первый; norm[pos]: NUM | первая машина свернула с дороги на парковку и резко остановилась | leave as is |
пока (conjunction) | norm[mystem_gr]: CONJ; norm[mystem_lex]: пока; norm[lemma]: пока; norm[pos]: SCONJ; all other grammemes on UPOS-layers get deleted | пока она доставала продукты из машины мальчик играл с мячом | |
пока (leave-taking) | norm[mystem_gr]: INTJ; norm[mystem_lex]: пока; norm[lemma]: пока; norm[pos]: INTJ; all other grammemes on UPOS-layers get deleted | пока пока | |
потом, затем | norm[mystem_gr]: ADVPRO; norm[mystem_lex]: потом; norm[lemma]: потом; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | потом машины стукнулись | |
потому, поэтому | norm[mystem_gr]: ADVPRO; norm[mystem_lex]: потому; norm[lemma]: потому; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | потому что водитель был пьяный | |
раз | norm[Animacy]:Inan; norm[Case]: Nom; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: S,m,inan ; norm[pos]: NOUN; all other grammemes on UPOS-layers get deleted | которая как раз въехала | |
ранен | norm[Aspect]: Imp; norm[Gender]: Masc; norm[Number]: Sing; norm[Tense]: Past; norm[Variant]: Short; norm[VerbForm]: Part; norm[Voice]: Pass; norm[mystem_gr]: V, tran, praed; norm [mystem_lex]: ранить; norm[lemma]: ранить; norm[pos]: VERB; all other grammemes on UPOS-layers get deleted | никто не ранен | |
свой | norm[Case]: Acc; norm[Gender]: Masc; norm[Number]: Sing; norm[mystem_gr]: APRO 13; norm[pos]: PRON | он любит свой народ | |
сзади | norm[mystem_gr]: ADV; norm[mystem_lex]: сзади; norm[lemma]: сзади; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | а сзади как раз машина подъезжает | |
сзади | norm[mystem_gr]: PR; norm[mystem_lex]: сзади; norm[lemma]: сзади; norm[pos]: ADP; all other grammemes on UPOS-layers get deleted | а сзади неё как раз две машины подъезжают | |
собакин | norm[case]: Acc; norm[Number]: Plur; norm[mystem_gr]: APRO,poss; norm[mystem_lex]: собакин; norm[lemma]: собакин; norm[pos]: ADJ; all other grammemes on UPOS-layers get deleted | тётя и дядя я думаю это собакины | |
спасибо | norm[mystem_gr]: INTJ; norm[mystem_lex]: спасибо; norm[lemma]: спасибо; norm[pos]: INTJ; all other grammemes on UPOS-layers get deleted | ----- | |
судя | norm[Aspect]: Imp; norm[Tense]: Pres; norm[VerbForm]: Conv; norm[Voice]: Mid; norm[mystem_gr]: V, intr, ger; norm[mystem_lex]: судить; norm[lemma]: судить; norm[pos]: VERB; all other grammemes on UPOS-layers get deleted | судя по тому что случилось | |
там, так, тут | norm[mystem_gr]: ADVPRO; norm[mystem_lex]: там; norm[lemma]: там; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | там женщина шла по дороге | |
типа | norm[mystem_gr]: PART,parenth; norm[mystem_lex]: типа; norm[lemma]: типа; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | ну типа того | |
то at the beginning of subordinate clauses | norm[mystem_gr]: CONJ; norm[mystem_lex]: то; norm[lemma]: то; norm[pos]: SCONJ; all other grammemes on UPOS-layers get deleted | если у вас ещё вопросы возникнут то свяжитесь со мной | |
то in function to replace sth. | norm[Case]: Nom; norm[Gender]: Neut; norm[Number]: Sing; norm[mystem_gr]: APRO; norm[mystem_lex]: тот; norm[lemma]: тот; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | произошло то что мы все предвидели | |
тоже, только | norm[mystem_gr]: PART; norm[mystem_lex]: тоже; norm[lemma]: тоже; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | он тоже вышел из машины | |
тот, этот, такой | norm[Case]: Dat; norm[Gender]: Fem; norm[Number]: Sing; norm[mystem_gr]: APRO 11; norm[mystem_lex]: тот; norm[lemma]: тот; norm[pos]: DET; all other grammemes on UPOS-layers get deleted | по той же дороге ехали ещё две машины | |
увидев | norm[Aspect]: Perf; norm[Tense]: Past; norm[VerbForm]: Conv; norm[Voice]: Act; norm[mystem_gr]: V, tran, ger; norm[mystem_lex]: увидеть; norm[lemma]: увидеть; norm[pos]: VERB; all other grammemes on UPOS-layers get deleted | собака увидев мяч кинулась на него | |
ф | dipl[language]: rus; norm[mystem_gr]: S,persn; norm[mystem_lex]: ф; norm[lemma]: ф; norm[pos]: PROPN; all other grammemes on UPOS-layers get deleted | ф шестнадцать | |
хз (хер знает) | norm[mystem_gr]: INTJ, abbr, parenth; norm[mystem_lex]: хз; norm[lemma]: хз; norm[pos]: INTJ | Водители обсуждали ситуацию но полиции не было хз | |
чуть-чуть | norm[mystem_gr]: ADV; norm[mystem_lex]: чуть-чуть; norm[lemma]: чуть-чуть; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | он чуть-чуть опоздал | |
щас | norm[mystem_gr]: ADV,inform; norm[mystem_lex]: щас; norm[lemma]: щас; norm[pos]: ADV; all other grammemes on UPOS-layers get deleted | щас приду | |
это in function to replace sth. | norm[Case]: Nom; norm[Gender]: Neut; norm[Number]: Sing; norm[mystem_gr]: APRO; norm[mystem_lex]: этот; norm[lemma]: этот; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | он ему это сказал | |
это after dash (тире) | norm[mystem_gr]: PART; norm[mystem_lex]: это; norm[lemma]: это; norm[pos]: PART; all other grammemes on UPOS-layers get deleted | мама - это самый родной человек на свете | |
я | norm[case]:Nom; norm[Number]: Sing; norm[Person]: 1; norm[mystem_gr]: SPRO 14; norm[pos]: PRON; all other grammemes on UPOS-layers get deleted | ----- |
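Some rows of the table above do not depend on context: the word always receives the same values on norm[mystem_gr] and norm[pos]. For such words the table can be turned into a lookup that flags deviations. The sketch below is illustrative only; it covers a handful of rows and again assumes a hypothetical list of token dictionaries keyed by layer name. Context-dependent words such as как, вот or это are deliberately left out.

```python
# lemma -> (norm[mystem_gr], norm[pos]), copied from context-independent rows above.
FIXED_TAGS = {
    "ага": ("PART", "PART"),
    "ну": ("PART", "PART"),
    "ого": ("PART", "PART"),
    "окей": ("PART", "PART"),
    "спасибо": ("INTJ", "INTJ"),
    "ещё": ("ADV", "ADV"),
    "чуть-чуть": ("ADV", "ADV"),
}


def check_fixed_tags(tokens):
    """Return (index, lemma, layer, found, expected) for every deviation from FIXED_TAGS."""
    deviations = []
    for index, token in enumerate(tokens):
        lemma = token.get("norm[lemma]", "")
        expected = FIXED_TAGS.get(lemma)
        if expected is None:
            continue  # word is not covered by the lookup
        for layer, want in zip(("norm[mystem_gr]", "norm[pos]"), expected):
            found = token.get(layer, "")
            if found != want:
                deviations.append((index, lemma, layer, found, want))
    return deviations
```

The lookup is only a convenience for spotting slips; the table itself remains the reference, and any word not listed in the lookup still has to be judged in its context.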
5. Comments
1 U-POS and MyStem use partly different features for the POS-tagging of words.
Example: U-POS classifies the Russian personal pronoun я simply as a pronoun (PRON) and gives no further specification. In contrast, MyStem specifies the pronoun further.
MyStem classifies я as a noun-pronoun (SPRO).
2 In general, all reflexive verbs in Russian can be identified by the verb postfix -ся. But not all verbs that end in the postfix -ся are reflexive verbs: verbs with a transitive word stem and the postfix -ся can also be verbs in the passive voice. When in doubt, check the Russian verb by translating it into German. If you can translate the Russian verb into German with sich..., then it is very likely a real reflexive verb and should be marked with Yes on the norm[Reflex]-layer and with Mid on the norm[Voice]-layer. If that is not possible and you have to translate the verb into German with the passive construction wird/werden...ge-..., then it is very likely a transitive verb in its passive form. In this case the word is marked with Pas on the norm[Voice]-layer and the norm[Reflex]-layer stays empty.
Example: Книга читается.
Das Buch liest sich. ==> This translation does not make sense (except in fairy tales), because a book cannot usually read itself.
Das Buch wird gelesen. ==> This translation is more plausible than the one above (assuming that the context is not a fairy tale), because the word stem is that of a transitive verb with the postfix -ся. The verb therefore expresses the passive and can be translated to mean that the book is read by someone who is unknown or who does not want to be mentioned.
Example: Человек развивается.
Der Mensch wird entwickelt. ==> Развивать is a transitive verb, and the postfix -ся could suggest that we are dealing with the passive voice here. This reading is entirely possible and, without context, difficult to rule out. Since there is no context, orient yourself by the general meaning in which this sentence is commonly used.
Der Mensch entwickelt sich. ==> This is the general meaning of the sentence, which is used quite often. In this meaning the verb is reflexive rather than passive. Prefer this reading when there is no context or the context is not very clear.
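The outcome of the German translation test maps directly onto the two layers. The following sketch only records that mapping; the decision between the two readings still has to be made by the annotator. The function name and the reading labels are hypothetical.

```python
def sja_verb_layers(lemma, reading):
    """Return the norm[Voice] and norm[Reflex] values for a verb with the postfix -ся.

    reading: "reflexive" if the German translation with "sich ..." works,
             "passive" if only "wird/werden ... ge-..." works.
    """
    if not lemma.endswith(("ся", "сь")):
        raise ValueError(f"{lemma!r} does not carry the postfix -ся")
    if reading == "reflexive":
        return {"norm[Voice]": "Mid", "norm[Reflex]": "Yes"}
    if reading == "passive":
        # The norm[Reflex] layer stays empty for passive forms.
        return {"norm[Voice]": "Pas", "norm[Reflex]": ""}
    raise ValueError(f"unknown reading: {reading!r}")
```

For the two examples above, читаться with the passive reading yields Pas and an empty norm[Reflex], while развиваться with the reflexive reading yields Mid and Yes.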
3 Transitive verbs are verbs that govern direct objects (objects in the accusative without a preposition). Only transitive verbs can form the passive voice. The passive voice can be recognized by the word stem of a transitive verb + the postfix -ся.
Example: Мальчик читает книгу. Книга читается мальчиком.
Junge (Nom) liest (tran.verb) Buch (Acc.obj. without preposition). Buch (Nom) wird gelesen (pass. voice of a tran.verb) vom Jungen (Inst).
Intransitive verbs are verbs that govern indirect objects (objects in the accusative with a preposition or objects in other grammatical cases). A preposition can appear between the verb and its object(s). The objects can appear in the accusative with a preposition, in the dative, genitive or instrumental with or without a preposition, and in the locative with a preposition (objects in the locative always take a preposition, which is why the Russian locative is called the prepositional case). Intransitive verbs cannot form the passive voice.
Example: Папа звонит маме. *Мама звонится папой.
Papa (Nom) ruft (intr.verb) an Mama (Dat.). *Mama wird angerufen von Papa.
4 Keep in mind that not all adverbs and not all adjectives can form degrees. The adverb сегодня or the adjective другой cannot form degrees. In these cases you should delete the value on the norm[Degree]-layer.
5 In this case быть has the function of an auxiliary (Hilfsverb). The main predicate of the sentence is therefore not быть alone, but уверен (in combination with быть). For this reason the word быть is defined on the norm[pos]-layer as AUX.
6 In this case быть is the main predicate of the sentence and therefore has the function of the main verb (Vollverb). For this reason the word быть is defined on the norm[pos]-layer as VERB.
7 The pronoun весь has these grammatical features if it can be translated as ganz/целый. In these cases весь behaves more like an adjective, therefore APRO and PRON.
15 The pronoun весь has these grammatical features if it can be translated as all/aller. In these cases весь is used to replace a noun or a phrase and to refer back to an element, word or situation that was already introduced in the discourse and that the speaker does not want to repeat, therefore DET and SPRO.
8 In contrast to один, два is defined on the norm[mystem_gr]-layer as NUM, because it is not inflected like an adjective. Therefore, один gets ANUM on the norm[mystem_gr]-layer (because its inflection has adjective-like features) and два gets NUM (because its inflection does not). Furthermore, unlike один, два has no plural paradigm.
9 The word другой is defined on the norm[mystem_gr]-layer as APRO, because it is inflected like an adjective but has the function of an SPRO, replacing other nouns, therefore APRO and ADJ. Furthermore, другой cannot form degrees, therefore the event on the norm[Degree]-layer should be empty.
10 In this context the verb играть is intransitive, because the Russian preposition c usually requires the instrumental. However, there exist cases, in which играть can be used as a transitive verb.
Example: Вася играет дурака в этом спектакле.
Vasja (Nom) spielt (tran.verb) den Dummen (acc.object without a preposition between verb and object) in diesem Stück (Loc).
Therefore, all verbs that can have a transitive meaning in other contexts have to be defined as transitive on the MyStem layer, even if the verb is used intransitively in the current context! The reason is that a verb that can (theoretically) be used transitively is always treated as a verb with a transitive basic meaning, regardless of whether this transitive meaning appears in the current context or not.
13 The pronoun свой is defined on the norm[mystem_gr]-layer as APRO, because it is inflected like an adjective.
12 Words like такой or который are defined on the norm[mystem_gr]-layer as APRO, because in Russian these pronouns are inflected like adjectives.
16 то есть is treated as two separate words, because there is no hyphen (дефис) combining the two words into one ==> то is a word of its own and есть is a word of its own. Therefore, each word is its own token, gets its own event and has to be determined grammatically on its own. The same applies to expressions like потому что or только что. They are treated as two separate words, get their own events and have to be determined grammatically on their own.
11 Words like тот or этот are defined on the norm[mystem_gr]-layer as APRO, because these pronouns are inflected like adjectives. They are defined on the norm[pos]-layer as DET, because they additionally have a determining (referring) function: they refer back to an element, word or situation that was already introduced in the discourse and that the speaker does not want to repeat.
14 All personal pronouns are defined on the norm[mystem_gr]-layer as SPRO and on the norm[pos]-layer as PRON. They are defined on the norm[mystem_gr]-layer as SPRO, because in Russian these pronouns replace nouns (существительные).
6. Useful links
- If you cannot decide which part of speech the current word belongs to, look the word up in the Национальный корпус русского языка and check their results or solution. But keep in mind that they have analyzed the speech of their participants partly under different conditions and assumptions.
- All U-POS features are available here: Universal features part 1 and Universal features part 2
- All MyStem features are available here: MyStem features
- If you cannot decide whether the current word is transitive or intransitive, or if you simply don't know which grammatical case a word appears in, use Викисловарь