Russian POS and Lemma

0. General information

Lemmatization

  • the term lemma may be defined as the base form of a word
  • the base form of a word is the form you can usually find in a dictionary
  • for verbs the base form corresponds to the infinitive, for nouns to the nominative, and for adjectives to the masculine nominative
  • the conversion of a word into its base form is called lemmatization
  • lemmatization is carried out semi-automatically in EXMARaLDA using two POS and lemma taggers, U-POS and MyStem; the output of the taggers must be checked manually each time
  • the lemmas (base forms) are found on the norm[mystem_lex] layer for MyStem and on the norm[lemma] layer for U-POS

POS-Tagging

  • the term tagging means that each word produced by the participant is assigned its part of speech (POS)
  • tagging is carried out semi-automatically in EXMARaLDA by two taggers, U-POS and MyStem; the output of the taggers must be checked manually each time
  • keep in mind that these two taggers are similar to each other, but not identical 1
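Because the two taggers use different tag sets, a quick cross-check of the first MyStem grammeme against the U-POS tag can point to tokens that need a closer manual look. The following is a minimal sketch: the `COMPATIBLE` mapping is assembled only from the tag pairs that occur in the guideline table of section 4, is not exhaustive, and the function name is hypothetical.

```python
# Pairs of (first MyStem grammeme -> U-POS tags) seen in the section 4 table.
# Illustrative only; extend it when the guidelines add new combinations.
COMPATIBLE = {
    "S": {"NOUN", "PROPN"},
    "V": {"VERB", "AUX"},
    "A": {"ADJ"},
    "ADV": {"ADV"},
    "ADVPRO": {"ADV", "PRON"},
    "APRO": {"PRON", "DET", "ADJ"},
    "SPRO": {"PRON"},
    "NUM": {"NUM"},
    "ANUM": {"NUM", "DET"},
    "CONJ": {"CCONJ", "SCONJ"},
    "PART": {"PART"},
    "INTJ": {"INTJ"},
    "PR": {"ADP"},
}


def taggers_agree(mystem_gr: str, upos: str) -> bool:
    """Return True if the first MyStem grammeme is compatible with the U-POS tag.

    Unknown MyStem tags return False so that the token is reviewed manually.
    """
    first = mystem_gr.split(",")[0].strip()
    return upos in COMPATIBLE.get(first, set())
```

A result of `False` does not prove an error; it only marks a token whose two annotations should be compared by hand.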

1. Structure of POS-Tagging in EXMARaLDA

U-POS-Layers

  • the U-POS tagger comprises the layers from norm[Animacy] to norm[voice] as well as the norm[lemma] and norm[pos] layers
  • each layer in U-POS (and MyStem) corresponds to a grammatical category
  • the meaning of each grammatical category in U-POS is explained in the following table:
Layer | Grammatical category | Grammeme | Part of speech
norm[Animacy] | Одушевлённость (animacy) | Одушевлённость (Anim); Неодушевлённость (Inan) | concerns only nouns
norm[Aspect] | Вид (aspect) | Совершенный вид [что сделать?] (Perf); Несовершенный вид [что делать?] (Imp) | concerns only verbs
norm[Case] | Падеж (case) | им.п. (Nom); род.п. (Gen); дат.п. (Dat); вин.п. (Acc); твор.п. (Ins); предл.п. (Loc); зват.п. (Voc) | concerns all nominal parts of speech
norm[Degree] | Степень сравнения (degree of comparison) | положительная (Pos); сравнительная (Cmp); превосходная (Sup) | concerns adjectives and adverbs
norm[Foreign] | иностранное слово (foreign word) | (Yes) | concerns all words that do not belong to the Russian language
norm[Gender] | Род (gender) | муж.р. (Masc); жен.р. (Fem); сред.р. (Neut) | concerns only nouns, adjectives and pronouns
norm[Mood] | Наклонение (mood) | изъяв.н. (Ind); услов.н. (Cnd); повел.н. (Imp) | concerns only verbs
norm[Number] | Число (number) | Единственное (Sing); Множественное (Plur) | concerns nouns, adjectives, personal pronouns and verbs
norm[Person] | Лицо (person) | Первое лицо (1); Второе лицо (2); Третье лицо (3) | concerns personal pronouns and verbs
norm[Tense] | Время (tense) | Настоящее (Pres); Прошедшее (Past); Будущее (Fut) | concerns verbs and participles
norm[VerbForm] | Форма глагола (verb form) | Неопределённая форма глагола (Inf); Финитная форма глагола (Fin); Причастие (Part); Деепричастие/Герундий (Conv) | concerns verbs
norm[voice] | Залог (voice) | Действительный (Act); middle voice (Mid); Страдательный (Pas) | concerns verbs and participles
norm[lemma] | Base form of a word (Начальная форма слова) | ------ | concerns all parts of speech
norm[pos] | POS determination of the given word according to UPOS principles | существительное (NOUN); глагол (VERB); прилагательное (ADJ); determiner (DET) [abandon in all cases] ... | concerns all parts of speech
norm[Reflex] | Real reflexive verbs (настоящие возвратные глаголы) 2 | (Yes) | concerns verbs and participles
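The value inventory in the table above can also be written down as data, which makes it easy to spot typos in the annotation. This is a minimal sketch: the dictionary copies the grammeme abbreviations from the table (including `Pas` as spelled there), the layer names are given without the `norm[...]` wrapper, and the function name and the dictionary-per-token representation are assumptions made for illustration.

```python
# Allowed values per U-POS feature layer, copied from the table above.
ALLOWED = {
    "Animacy": {"Anim", "Inan"},
    "Aspect": {"Perf", "Imp"},
    "Case": {"Nom", "Gen", "Dat", "Acc", "Ins", "Loc", "Voc"},
    "Degree": {"Pos", "Cmp", "Sup"},
    "Foreign": {"Yes"},
    "Gender": {"Masc", "Fem", "Neut"},
    "Mood": {"Ind", "Cnd", "Imp"},
    "Number": {"Sing", "Plur"},
    "Person": {"1", "2", "3"},
    "Tense": {"Pres", "Past", "Fut"},
    "VerbForm": {"Inf", "Fin", "Part", "Conv"},
    "Voice": {"Act", "Mid", "Pas"},
    "Reflex": {"Yes"},
}


def invalid_values(annotation: dict) -> dict:
    """Return the layers whose value is not listed in the table.

    Empty values are accepted, because a layer may be left empty.
    Layers that are not in ALLOWED (e.g. lemma, pos) are ignored.
    """
    problems = {}
    for layer, value in annotation.items():
        allowed = ALLOWED.get(layer)
        if allowed is not None and value and value not in allowed:
            problems[layer] = value
    return problems
```

For example, `invalid_values({"Aspect": "Imp", "Case": "Prep"})` reports only the `Case` layer, since `Prep` is not among the case abbreviations used here.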

MyStem-Layers

  • the MyStem tagger comprises the norm[mystem_gr] and norm[mystem_lex] layers
  • each layer in MyStem (and U-POS) corresponds to a grammatical category
  • the meaning of each grammatical category in MyStem is explained in the following table:
Layer | Grammatical category | Grammeme | Part of speech
norm[mystem_gr] | POS determination of the given word according to MyStem principles | every redundant grammeme on this layer is deleted, except the first grammeme and, if they appear, the grammemes of transitivity (tran/intr) 3 and parenthesis (parenth) | concerns all parts of speech
norm[mystem_lex] | Base form of a word | should agree with the base form in U-POS | concerns all parts of speech
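Since the base form on norm[mystem_lex] should agree with the one on norm[lemma], the two layers can be compared token by token. The sketch below assumes that each token is available as a dictionary with the keys `lemma` and `mystem_lex`; this representation and the function name are hypothetical and not part of EXMARaLDA.

```python
def lemma_mismatches(tokens):
    """Return the indices of tokens whose two lemma layers differ.

    The comparison is exact on purpose: ё and е are treated as different
    letters, because ё has to be written on the lemma layers.
    """
    return [
        index
        for index, token in enumerate(tokens)
        if token.get("lemma", "").strip() != token.get("mystem_lex", "").strip()
    ]
```

The returned indices are candidates for manual correction, not automatic fixes.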

2. The subjects of lemmatization and POS-Tagging are ...

  • ... files from DEbi---R, USbi---R and RUmo---R whose names end in one of the following suffixes:
    • _fsR (formal spoken Russian)
    • _fwR (formal written Russian)
    • _isR (informal spoken Russian)
    • _iwR (informal written Russian)
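To select the relevant files from a folder, the naming scheme above can be expressed as a pattern. This is a sketch under assumptions: it reads "---" as three placeholder characters (as in the participant code USbi05FR), expects the suffix directly after the code, and uses a hypothetical file name with the `.exb` extension for illustration.

```python
import re
from pathlib import Path

# Assumed name shape: <corpus prefix><3 characters>R_<register suffix>,
# e.g. the hypothetical "USbi05FR_isR.exb". Adjust if the real names differ.
RUSSIAN_FILE = re.compile(r"^(DEbi|USbi|RUmo).{3}R_(fsR|fwR|isR|iwR)$")


def is_russian_file(filename: str) -> bool:
    """Return True if the file name (without extension) matches the pattern."""
    return RUSSIAN_FILE.match(Path(filename).stem) is not None
```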

3. Steps of procedure

  • Step 1: Push/Pull/Fetch in GitHub
  • Step 2: Open the EXMARaLDA Partitur-Editor
  • Step 3: File ==> Open ==> rueg repository ==> GitHub (or SmartGit) ==> rueg-corpus ==> exb ==> P3 ==> 1, 2, 3 …
  • Step 4: Verify that the CUs in every file conform to the CU guidelines; if not, please correct them
  • Step 5: Verify that every word is assigned its correct language on the dipl[language]-layer; if not, please correct it
  • Step 6: POS-Tagging ==> verify the accuracy of the POS taggers (U-POS and MyStem)
  • Step 7: Delete all features from the norm[mystem_gr]-layer except the first one and, if present, the features of transitivity and parenthesis and any other features that are not redundant with U-POS features
  • Step 8: Save your results
  • Step 9: Go to GitHub (SmartGit) ==> submit your file ==> push/pull/fetch -> commit
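The clean-up of the norm[mystem_gr]-layer in step 7 can be sketched as a small function. It assumes that the raw value is a feature string whose features are separated by commas, equals signs, vertical bars or parentheses, such as the hypothetical `V,ipf,intr=praet,sg,indic,f`; the function name and the `extra_keep` parameter are illustrative. Which further features count as "not redundant with U-POS" remains a manual decision and is passed in explicitly.

```python
import re

# Features that are always kept in addition to the first grammeme.
ALWAYS_KEEP = {"tran", "intr", "parenth"}


def trim_mystem_gr(raw: str, extra_keep=frozenset()) -> str:
    """Keep the first grammeme plus transitivity, parenthesis and extras."""
    parts = [p.strip() for p in re.split(r"[,=|()]", raw) if p.strip()]
    if not parts:
        return ""
    kept = [parts[0]]
    for part in parts[1:]:
        if (part in ALWAYS_KEEP or part in extra_keep) and part not in kept:
            kept.append(part)
    return ",".join(kept)
```

With the hypothetical input above the function returns `V,intr`; calling it with `extra_keep={"praed"}` would additionally preserve a `praed` feature if it is present.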

4. Tagging-Guidelines and problems

Phenomenon/Problem Solution Example
participant code dipl[language]: rus;
norm[Foreign]: Yes;
norm[mystem_gr]: S, persn;
norm[mystem_lex]: USbi05FR;
norm[lemma]: USbi05FR;
norm[pos]: PROPN;
all other grammemes on UPOS-layers get deleted
здравствуйте меня зовут USbi05FR
emojis dipl[language]: rus;
norm[pos]: SYM;
all other grammemes on UPOS-layers get deleted
-----
foreign words, e.g. English words: examine each grammatically e.g. анд dipl[language]: eng;
norm[Foreign]: Yes;
norm[mystem_gr]: CONJ;
norm[pos]: CCONJ;
norm[mystem_lex]:анд;
norm[lemma]:анд
and = анд
items, e.g. English items: examine each grammatically e.g. а(н) dipl[language]: eng;
norm[Foreign]: Yes;
norm[mystem_gr]: ANUM;
norm[mystem_lex]:а(н);
norm[lemma]:а(н);
norm[pos]: DET [abandon in all cases]
a(n) = а(н)
words containing the letter ё (слова с буквой ё) ё is written on all layers except the dipl layer ==> nothing is changed on the dipl layer ==> norm[norm]: …ё…;
norm[lemma]: …ё…;
norm[mystem_lex]: …ё
-----
ага norm[mystem_gr]: PART;
norm[mystem_lex]: ага;
norm[lemma]: ага;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
-----
блин norm[mystem_gr]: INTJ;
norm[mystem_lex]: блин;
norm[lemma]: блин;
norm[pos]: INTJ;
all other grammemes on UPOS-layers get deleted
ну блин
быстро norm[Degree]: Pos 4;
norm[mystem_gr]: ADV;
norm[mystem_lex]: быстро;
norm[lemma]: быстро;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
эта машина очень быстро ехала
быть norm[Aspect]: Imp;
norm[Gender]: Fem;
norm[Mood]: Ind;
norm[Number]: Sing;
norm[Tense]: Past;
norm[VerbForm]: Fin;
norm[Voice]: Act;
norm[mystem_gr]: V,intr;
norm[mystem_lex]: быть;
norm[lemma]: быть;
norm[pos]: AUX 5
она была уверена
быть norm[Aspect]: Imp;
norm[Gender]: Fem;
norm[Mood]: Ind;
norm[Number]: Sing;
norm[Tense]: Past;
norm[VerbForm]: Fin;
norm[Voice]: Act;
norm[mystem_gr]: V,intr;
norm[mystem_lex]: быть;
norm[lemma]: быть;
norm[pos]: VERB 6
там была собака
весь norm[Case]: Gen;
norm[Gender]: Fem;
norm[Number]: Sing;
norm[mystem_gr]: APRO 7;
norm[mystem_lex]: весь;
norm[lemma]: весь;
norm[pos]: PRON
от всей души; что скажешь к всему этому
вообще norm[mystem_gr]: ADV,parenth;
norm[mystem_lex]: вообще;
norm[lemma]: вообще;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
ну вообще там была ещё одна машина
вот in function to replace something norm[mystem_gr]: ADVPRO;
norm[mystem_lex]: вот;
norm[lemma]: вот;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
вот он идёт
вот in function of a modal particle norm[mystem_gr]: PART;
norm[mystem_lex]: вот;
norm[lemma]: вот;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
вот а потом мальчик побежал за мячом
врезаться norm[Aspect]: Perf;
norm[Gender]: Fem;
norm[Mood]:Ind;
norm[Number]: Sing;
norm[Tense]: Past;
norm[VerbForm]: Fin;
norm[Voice]: Mid;
norm[mystem_gr]: V, intr;
norm[mystem_lex]: врезаться;
norm[lemma]: врезаться;
norm[pos]: VERB;
norm[Reflex]: Yes;
all other grammemes on UPOS-layers get deleted
одна машина врезалась в другую
вроде norm[mystem_gr]: PART;
norm[mystem_lex]: вроде;
norm[lemma]: вроде;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
вроде никто не пострадал
всё (ещё, равно) norm[Case]: Nom;
norm[Gender]: Neut;
norm[Number]: Sing;
norm[mystem_gr]: APRO;
norm[mystem_lex]: всё;
norm[lemma]: всё;
norm[pos]: PRON
это всё; всё равно; всё ещё
всё-таки norm[mystem_gr]: PART;
norm[mystem_lex]: всё-таки;
norm[lemma]: всё-таки;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
он всё-таки поступил по-своему
всё-таки after conjunctions и, а, но norm[mystem_gr]: CONJ;
norm[mystem_lex]: всё-таки;
norm[lemma]: всё-таки;
norm[pos]: SCONJ;
all other grammemes on UPOS-layers get deleted
как ни крути, а всё-таки придётся решить эту проблему
да norm[mystem_gr]: PART, parenth;
norm[mystem_lex]: да;
norm[lemma]: да;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
да так всё произошло
давай norm[Aspect]: Imp;
norm[Mood]:Imp;
norm[Number]: Sing;
norm[Person]: 2;
norm[VerbForm]: Fin;
norm[Voice]: Act;
norm[mystem_gr]: V,tran;
norm[mystem_lex]: давать;
norm[lemma]: давать;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
давай
два norm[Case]: Nom;
norm[Gender]: Fem;
norm[mystem_gr]: NUM 8;
norm[mystem_lex]: два;
norm[lemma]: два;
norm[pos]: NUM
стукнулись две машины
должен, должна, должно, должны norm[Gender]: Masc;
norm[Number]: Sing;
norm[Variant]: Short;
norm[mystem_gr]: A, praed;
norm[mystem_lex]: должен;
norm[lemma]: должен;
norm[pos]: ADJ;
all other grammemes on UPOS-layers get deleted
он должен был позвонить в полицию, но в конце не звонил
другой norm[Case]: Acc;
norm[Gender]: Fem;
norm[Number]: Sing;
norm[mystem_gr]: APRO 9;
norm[mystem_lex]: другой;
norm[lemma]: другой;
norm[pos]: ADJ
одна машина врезалась в другую
ДТП (дорожно-транспортное происшествие) norm[Animacy]: Inan;
norm[Case]: Gen;
norm[Gender]: Neut (because of происшествие);
norm[Number]: Sing;
norm[mystem_gr]: S,abbr;
norm[mystem_lex]: ДТП;
norm[lemma]: ДТП;
norm[pos]: PROPN
я стал свидетелем ДТП
его, её, их as possessive pronouns norm[Case]: Gen;
norm[Gender]: Fem;
norm[Number]: Sing;
norm[Person]: 3;
norm[mystem_gr]: SPRO;
norm[mystem_lex]: она;
norm[lemma]: она;
norm[pos]: PRON
он уронил её пакет
ехавший norm[Aspect]: Imp;
norm[Case]: Nom;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[Tense]: Past;
norm[VerbForm]: Part;
norm[Voice]: Act;
norm[mystem_gr]: V, intr;
norm[mystem_lex]: ехать;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
второй водитель ехавший сзади не успел притормозить
ещё norm[mystem_gr]: ADV;
norm[mystem_lex]: ещё;
norm[lemma]: ещё;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
там ещё стояла женщина рядом с машиной
женат norm[Gender]: Masc;
norm[Number]: Sing;
norm[Variant]: Short;
norm[mystem_gr]: A, praed;
norm[mystem_lex]: женатый;
norm[lemma]: женатый;
norm[pos]: ADJ;
all other grammemes on UPOS-layers get deleted
он видимо женат
заезжая norm[Aspect]: Imp;
norm[Tense]: Pres;
norm[VerbForm]: Conv;
norm[Voice]: Act;
norm[mystem_gr]: V,intr,ger;
norm[mystem_lex]: заезжать;
norm[lemma]: заезжать;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
одновременно заезжая пара машин
здравствуйте, пока, привет norm[mystem_gr]: INTJ;
norm[mystem_lex]: здравствуйте;
norm[lemma]: здравствуйте;
norm[pos]: INTJ;
all other grammemes on UPOS-layers get deleted
здравствуйте я звоню по поводу
здрасте, приветик norm[mystem_gr]: INTJ, inform;
norm[mystem_lex]: здрасте;
norm[lemma]: здрасте;
norm[pos]: INTJ;
all other grammemes on UPOS-layers get deleted
здрасте я звоню по поводу
значит as a parenthetical word (вводное слово) norm[Aspect]: Imp;
norm[Mood]: Ind;
norm[Number]: Sing;
norm[Person]: 3;
norm[Tense]: Pres;
norm[VerbForm]: Fin;
norm[Voice]: Act;
norm[mystem_gr]: V, parenth, tran;
norm[mystem_lex]: значить;
norm[lemma]: значить;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
значит он уронил всё и пошёл
играть norm[Aspect]: Imp;
norm[Mood]: Ind;
norm[Number]: Sing;
norm[Person]: 3;
norm[Tense]: Past;
norm[VerbForm]: Fin;
norm[Voice]: Act;
norm[mystem_gr]: V, tran 10;
norm[mystem_lex]: играть;
norm[lemma]: играть;
norm[pos]: VERB
мальчик играл с мячом
как at the beginning of dependent/subordinate clause norm[mystem_gr]: CONJ;
norm[mystem_lex]: как;
norm[lemma]: как;
norm[pos]: SCONJ;
all other grammemes on UPOS-layers get deleted
он не знает как это делается
как in case of comparison or emphasizing norm[mystem_gr]: PART;
norm[mystem_lex]: как;
norm[lemma]: как;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
водитель тупой как пробка
как at the beginning of direct questions or at the beginning of indirect questions in subordinate clauses norm[mystem_gr]: ADVPRO;
norm[mystem_lex]: как;
norm[lemma]: как;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
как у тебя дела; подскажите как пройти к библиотеке
как in function of a subordinate conjunction without a comparison meaning, but in form of an adverb norm[mystem_gr]: ADVPRO;
norm[mystem_lex]: как;
norm[lemma]: как;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
мальчик показал как пройти к дому; я не знаю как это сделать
кажется as a parenthetical word (вводное слово) norm[Aspect]: Imp;
norm[Mood]: Ind;
norm[Number]: Sing;
norm[Person]: 3;
norm[Tense]: Pres;
norm[VerbForm]: Fin;
norm[Voice]: Act;
norm[mystem_gr]: V, parenth, tran;
norm[mystem_lex]: казаться;
norm[lemma]: казаться;
norm[pos]: VERB
кажется водитель не вовремя видел мячик
км/ч norm[mystem_gr]: S, abbr;
norm[mystem_lex]: км/ч;
norm[lemma]: км/ч;
norm[pos]: NOUN;
all other grammemes on UPOS-layers get deleted
сто км/ч
какой norm[Case]: Nom;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: APRO 11;
norm[mystem_lex]: какой;
norm[lemma]: какой;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
там шёл какой-то мужик
короче as a parenthetical word (вводное слово) norm[Degree]: Cmp;
norm[mystem_gr]: ADV, parenth;
norm[mystem_lex]: коротко;
norm[lemma]: коротко;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
короче там шла женщина с коляской
который norm[Case]: Nom;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: APRO 12;
norm[pos]: PRON
этот мальчик ну который там играл с мячиком он
мой, твой norm[Case]: Gen;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: APRO;
norm[mystem_lex]: мой;
norm[lemma]: мой;
norm[pos]: PRON
я звоню вам с моего телефона
мол as a parenthetical word (вводное слово) norm[mystem_gr]: PART, parenth;
norm[mystem_lex]: мол;
norm[lemma]: мол;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
----
мужик norm[Animacy]: Anim;
norm[Case]: Nom;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: S,inform;
norm[pos]: NOUN;
all other grammemes on UPOS-layers get deleted
мужик побежал на дорогу
наверно, похоже as a parenthetical word (вводное слово) norm[mystem_gr]: ADV, parenth;
norm[mystem_lex]: наверно;
norm[lemma]: наверно;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
он наверно этого не знал
никто norm[Case]: Acc;
norm[Gender]: Masc;
norm[mystem_gr]: SPRO;
norm[mystem_lex]: никто;
norm[lemma]: никто;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
я никого не видел
нет norm[mystem_gr]: PART, parenth;
norm[mystem_lex]: нет;
norm[lemma]: нет;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
нет не поеду ни за что
ну norm[mystem_gr]: PART;
norm[mystem_lex]: ну;
norm[lemma]: ну;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
ну что я могу сказать
нужно, можно, надо norm[mystem_gr]: ADV, praed;
norm[mystem_lex]: нужно;
norm[lemma]: нужно;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
-----
ого norm[mystem_gr]: PART;
norm[mystem_lex]: ого;
norm[lemma]: ого;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
-----
один norm[Case]: Nom;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: ANUM;
norm[mystem_lex]: один;
norm[lemma]: один;
norm[pos]: NUM
я видел как один человек позвонил в полицию
окей norm[mystem_gr]: PART;
norm[mystem_lex]: окей;
norm[lemma]: окей;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
-----
первый norm[Case]: Nom;
norm[Gender]: Fem;
norm[Number]: Sing;
norm[mystem_gr]: ANUM;
norm[mystem_lex]: первый;
norm[lemma]: первый;
norm[pos]: NUM
первая машина свернула с дороги на парковку и резко остановилась
пока (conjunction) norm[mystem_gr]: CONJ;
norm[mystem_lex]: пока;
norm[lemma]: пока;
norm[pos]: SCONJ;
all other grammemes on UPOS-layers get deleted
пока она доставала продукты из машины мальчик играл с мячом
пока (leave-taking) norm[mystem_gr]: INTJ;
norm[mystem_lex]: пока;
norm[lemma]: пока;
norm[pos]: INTJ;
all other grammemes on UPOS-layers get deleted
пока пока
потом, затем norm[mystem_gr]: ADVPRO;
norm[mystem_lex]: потом;
norm[lemma]: потом;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
потом машины стукнулись
потому, поэтому norm[mystem_gr]: ADVPRO;
norm[mystem_lex]: потому;
norm[lemma]: потому;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
потому что водитель был пьяный
раз norm[Animacy]:Inan;
norm[Case]: Nom;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: S,m,inan;
norm[pos]: NOUN;
all other grammemes on UPOS-layers get deleted
которая как раз въехала
ранен norm[Aspect]: Imp;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[Tense]: Past;
norm[Variant]: Short;
norm[VerbForm]: Part;
norm[Voice]: Pass;
norm[mystem_gr]: V, tran, praed;
norm[mystem_lex]: ранить;
norm[lemma]: ранить;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
никто не ранен
свой norm[Case]: Acc;
norm[Gender]: Masc;
norm[Number]: Sing;
norm[mystem_gr]: APRO 13;
norm[pos]: PRON
он любит свой народ
сзади norm[mystem_gr]: ADV;
norm[mystem_lex]: сзади;
norm[lemma]: сзади;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
а сзади как раз машина подъезжает
сзади norm[mystem_gr]: PR;
norm[mystem_lex]: сзади;
norm[lemma]: сзади;
norm[pos]: ADP;
all other grammemes on UPOS-layers get deleted
а сзади неё как раз две машины подъезжают
собакин norm[Case]: Acc;
norm[Number]: Plur;
norm[mystem_gr]: APRO,poss;
norm[mystem_lex]: собакин;
norm[lemma]: собакин;
norm[pos]: ADJ;
all other grammemes on UPOS-layers get deleted
тётя и дядя я думаю это собакины
спасибо norm[mystem_gr]: INTJ;
norm[mystem_lex]: спасибо;
norm[lemma]: спасибо;
norm[pos]: INTJ;
all other grammemes on UPOS-layers get deleted
-----
судя norm[Aspect]: Imp;
norm[Tense]: Pres;
norm[VerbForm]: Conv;
norm[Voice]: Mid;
norm[mystem_gr]: V, intr, ger;
norm[mystem_lex]: судить;
norm[lemma]: судить;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
судя по тому что случилось
там, так, тут norm[mystem_gr]: ADVPRO;
norm[mystem_lex]: там;
norm[lemma]: там;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
там женщина шла по дороге
типа norm[mystem_gr]: PART,parenth;
norm[mystem_lex]: типа;
norm[lemma]: типа;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
ну типа того
то at the beginning of subordinate clauses norm[mystem_gr]: CONJ;
norm[mystem_lex]: то;
norm[lemma]: то;
norm[pos]: SCONJ;
all other grammemes on UPOS-layers get deleted
если у вас ещё вопросы возникнут то свяжитесь со мной
то in function to replace sth. norm[Case]: Nom;
norm[Gender]: Neut;
norm[Number]: Sing;
norm[mystem_gr]: APRO;
norm[mystem_lex]: тот;
norm[lemma]: тот;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
произошло то что мы все предвидели
тоже, только norm[mystem_gr]: PART;
norm[mystem_lex]: тоже;
norm[lemma]: тоже;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
он тоже вышел из машины
тот, этот, такой norm[Case]: Dat;
norm[Gender]: Fem;
norm[Number]: Sing;
norm[mystem_gr]: APRO 11;
norm[mystem_lex]: тот;
norm[lemma]: тот;
norm[pos]: DET;
all other grammemes on UPOS-layers get deleted
по той же дороге ехали ещё две машины
увидев norm[Aspect]: Perf;
norm[Tense]: Past;
norm[VerbForm]: Conv;
norm[Voice]: Act;
norm[mystem_gr]: V, tran, ger;
norm[mystem_lex]: увидеть;
norm[lemma]: увидеть;
norm[pos]: VERB;
all other grammemes on UPOS-layers get deleted
собака увидев мяч кинулась на него
ф dipl[language]: rus;
norm[mystem_gr]: S,persn;
norm[mystem_lex]: ф;
norm[lemma]: ф;
norm[pos]: PROPN;
all other grammemes on UPOS-layers get deleted
ф шестнадцать
хз (хер знает) norm[mystem_gr]: INTJ, abbr, parenth;
norm[mystem_lex]: хз;
norm[lemma]: хз;
norm[pos]: INTJ
Водители обсуждали ситуацию но полиции не было хз
чуть-чуть norm[mystem_gr]: ADV;
norm[mystem_lex]: чуть-чуть;
norm[lemma]: чуть-чуть;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
он чуть-чуть опоздал
щас norm[mystem_gr]: ADV,inform;
norm[mystem_lex]: щас;
norm[lemma]: щас;
norm[pos]: ADV;
all other grammemes on UPOS-layers get deleted
щас приду
это in function to replace sth. norm[Case]: Nom;
norm[Gender]: Neut;
norm[Number]: Sing;
norm[mystem_gr]: APRO;
norm[mystem_lex]: этот;
norm[lemma]: этот;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
он ему это сказал
это after dash (тире) norm[mystem_gr]: PART;
norm[mystem_lex]: это;
norm[lemma]: это;
norm[pos]: PART;
all other grammemes on UPOS-layers get deleted
мама - это самый родной человек на свете
я norm[Case]: Nom;
norm[Number]: Sing;
norm[Person]: 1;
norm[mystem_gr]: SPRO 14;
norm[pos]: PRON;
all other grammemes on UPOS-layers get deleted
-----
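Many entries in the table above end with the rule that all other grammemes on the UPOS-layers get deleted. Applied to a single token, that rule could look like the sketch below. The dictionary-per-token representation and the function name are assumptions; the layer list combines the U-POS feature layers from section 1 with norm[Variant], which appears in some entries. The function is meant only for entries that carry the deletion rule.

```python
# Feature layers that are emptied unless the guideline entry sets them.
UPOS_FEATURE_LAYERS = [
    "Animacy", "Aspect", "Case", "Degree", "Foreign", "Gender", "Mood",
    "Number", "Person", "Tense", "VerbForm", "Voice", "Reflex", "Variant",
]
# Layers that are overwritten only when the guideline entry mentions them.
OTHER_LAYERS = ["mystem_gr", "mystem_lex", "lemma", "pos"]


def apply_guideline(token: dict, guideline: dict) -> dict:
    """Return a copy of the token with the guideline entry applied."""
    result = dict(token)
    for layer in UPOS_FEATURE_LAYERS:
        # Unlisted feature layers become empty ("get deleted").
        result[layer] = guideline.get(layer, "")
    for layer in OTHER_LAYERS:
        if layer in guideline:
            result[layer] = guideline[layer]
    return result
```

For the entry блин, for instance, the guideline dictionary would contain only the four layers `mystem_gr`, `mystem_lex`, `lemma` and `pos`, so any case or gender value the tagger had assigned is emptied.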

5. Comments

1 U-POS and MyStem use partly different features for the POS tagging of words.

 Example: For the Russian personal pronoun я, U-POS classifies the word simply as a pronoun (PRON) and gives no further specification. MyStem, in contrast, specifies the pronoun further: 
          it classifies я as a noun-pronoun (SPRO).  

2 In general, reflexive verbs in Russian can be identified by the verb postfix -ся. However, not all verbs that end in -ся are reflexive verbs. Verbs with a transitive stem and the postfix -ся can be verbs in the passive voice rather than reflexive verbs. When in doubt, check the Russian verb by translating it into German. If the verb can be translated with sich..., it is very likely a real reflexive verb and should be marked with Yes on the norm[Reflex]-layer and with Mid on the norm[Voice]-layer. If that is not possible and the verb has to be translated with the passive construction wird/werden...ge-..., it is very likely a transitive verb in its passive form. In this case the word is marked with Pas on the norm[Voice]-layer and the norm[Reflex]-layer stays empty.

 Example: Книга читается.
          Das Buch liest sich. ==> This translation would not make sense (except in fairy tales), because a book 
                                   cannot usually read itself. 
          Das Buch wird gelesen. ==> This translation is more plausible (assuming the context is not a fairy tale), 
                                     because the stem is a transitive verb with the postfix -ся. The verb therefore 
                                     expresses the passive: the book is read by someone who is unknown or is not 
                                     mentioned. 

 Example: Человек развивается.
          Der Mensch wird entwickelt. ==> Развивать is a transitive verb, and the postfix -ся could suggest the 
                                          passive voice. That reading is possible and, without context, hard to 
                                          rule out. If there is no context, follow the common meaning of the 
                                          sentence. 
          Der Mensch entwickelt sich. ==> This is the common meaning of the sentence. Here the verb has a reflexive 
                                          rather than a passive meaning. Prefer this reading when the context is 
                                          missing or unclear.  
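Once the annotator has decided between the reflexive and the passive reading, the resulting layer values follow mechanically from the rule in comment 2. A minimal sketch, in which the function name and the `judgement` argument are hypothetical and the linguistic decision itself stays manual:

```python
def sja_verb_layers(lemma: str, judgement: str) -> dict:
    """Map the annotator's judgement for a -ся verb to layer values.

    judgement is "reflexive" (German translation with sich ...) or
    "passive" (German translation with wird/werden ... ge-...).
    """
    if not lemma.endswith(("ся", "сь")):
        raise ValueError("not a verb with the postfix -ся")
    if judgement == "reflexive":
        return {"Voice": "Mid", "Reflex": "Yes"}
    if judgement == "passive":
        # The norm[Reflex]-layer stays empty for passive forms.
        return {"Voice": "Pas", "Reflex": ""}
    raise ValueError("judgement must be 'reflexive' or 'passive'")
```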

3 Transitive verbs are verbs that govern direct objects (objects in the accusative without a preposition between verb and object). Only transitive verbs can form the passive voice. The passive voice can be recognized by the stem of a transitive verb + postfix -ся.

 Example: Мальчик  читает книгу. Книга читается  мальчиком. 
          Junge (Nom) liest (tran.verb) Buch (Acc.obj. without preposition). Buch (Nom) wird gelesen (pass. voice of a 
          tran.verb) vom Jungen (Inst). 

Intransitive verbs are verbs that govern indirect objects (objects in the accusative with a preposition or objects in other grammatical cases). A preposition can appear between the verb and its object(s). The objects can appear in the accusative with a preposition, in the dative, genitive or instrumental with or without a preposition, and in the locative with a preposition (objects in the locative always take a preposition, which is why the Russian locative is called the prepositional case). Intransitive verbs cannot form the passive voice.

 Example: Папа звонит маме. *Мама звонится папой.
          Papa (Nom) ruft (intr.verb) an Mama (Dat.). *Mama wird angerufen von Papa.

4 Keep in mind that not all adverbs and not all adjectives can form degrees of comparison. The adverb сегодня and the adjective другой, for example, cannot. In these cases you should delete the value on the norm[Degree]-layer.

5 In this case быть functions as an auxiliary (Hilfsverb). The main predicate of the sentence is therefore not быть but уверена (in combination with быть). For this reason быть is tagged as AUX on the norm[pos]-layer.

6 In this case быть is the main predicate of the sentence and therefore functions as the main verb (Vollverb). For this reason быть is tagged as V on the norm[mystem_gr]-layer and as VERB on the norm[pos]-layer.

7 The pronoun весь has these grammatical features if it can be translated as ganz/целый. In these cases весь behaves more like an adjective, hence APRO and PRON.

15 The pronoun весь has these grammatical features if it can be translated as all/aller. In these cases весь replaces a noun or a phrase and refers back to an element, word or situation that was already introduced in the discourse and that the speaker does not repeat, hence DET and SPRO.

8 In contrast to один, два is defined on the norm[mystem_gr]-layer as NUM, because it is not inflected like an adjective. Accordingly, один gets ANUM on the norm[mystem_gr]-layer (its inflection resembles that of an adjective) and два gets NUM (its inflection does not). Furthermore, unlike один, два has no plural paradigm.

9 The word другой is defined on the norm[mystem_gr]-layer as APRO, because it is inflected like an adjective but can replace other nouns like an SPRO, hence APRO and ADJ. Furthermore, другой cannot form degrees of comparison, so the event on the norm[Degree]-layer should be empty.

10 In this context the verb играть is intransitive, because the Russian preposition с usually requires the instrumental. However, there are cases in which играть is used as a transitive verb.

 Example: Вася играет дурака в этом спектакле.
          Vasja (Nom) spielt (tran.verb) den Dummen (acc.object without a preposition between verb and object)  in diesem 
          Stück (Loc). 

Therefore, all verbs that can have a transitive meaning in other contexts have to be defined as transitive on the MyStem layer, even if the verb is used intransitively in the current context! The reason is that a verb which can (theoretically) be used transitively is always treated as a verb with a transitive basic meaning, regardless of whether this meaning appears in the current context.

13 The pronoun свой is defined on the norm[mystem_gr]-layer as APRO, because it is inflected like an adjective.

12 Words like такой or который are defined on the norm[mystem_gr]-layer as APRO, because in Russian these pronouns are inflected like adjectives.

16 то есть is treated as two separate words, because there is no hyphen (дефис) joining them into one word ==> то is a word of its own and есть is a word of its own. Each word is therefore a separate token, gets its own event and has to be determined grammatically on its own. The same applies to expressions like потому что or только что: they are treated as two separate words, get their own events and are determined grammatically on their own.
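The tokenization rule in comment 16 amounts to splitting at whitespace only, so that hyphenated words stay together while multi-word expressions fall apart into separate tokens. A minimal sketch with a hypothetical function name:

```python
def split_tokens(text: str) -> list:
    """Split at whitespace only; hyphenated words remain single tokens."""
    return text.split()
```

Under this rule то есть yields two tokens, whereas всё-таки and чуть-чуть each yield one.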

11 Words like тот or этот are defined on the norm[mystem_gr]-layer as APRO, because these pronouns are inflected like adjectives. They are defined on the norm[pos]-layer as DET, because they additionally have a determining (referring) function: they refer back to an element, word or situation that was already introduced in the discourse and that the speaker does not repeat.

14 All personal pronouns are defined on the norm[mystem_gr]-layer as SPRO and on the norm[pos]-layer as PRON. They get SPRO on the norm[mystem_gr]-layer because in Russian these pronouns replace nouns (существительные).

6. Useful links