English POS and Lemma
BNC: Tag List
- May handle American English spellings less well, since the BNC is a corpus of British English
- Larger tag inventory for accuracy: the tags are highly specific, though not all distinctions are necessary for our purposes (e.g., four separate tags for punctuation). Researchers who want broader categories can still recover them by filtering or collapsing the tags.
- Intuitive tag names
- Multiple codes for determiners
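
As a rough illustration of how the fine-grained tags could be collapsed into broader categories for filtering, here is a minimal sketch. The prefix table, the broad labels (ADJ, DET, PUNCT, ...), and the function names are assumptions made for this example, not part of the BNC tagset or of any existing tool. It assumes tokens are stored as (word, tag) pairs.

```python
# Hypothetical mapping from the first two letters of a BNC (CLAWS5) tag
# to a broad category. The broad labels are illustrative only.
BROAD_BY_PREFIX = {
    "AJ": "ADJ",
    "AV": "ADV",
    "AT": "DET", "DT": "DET", "DP": "DET",   # multiple determiner codes collapse to one
    "NN": "NOUN", "NP": "NOUN",
    "PN": "PRON",
    "PR": "PREP",
    "CJ": "CONJ",
    "IT": "INTJ",
    "PU": "PUNCT",                            # PUL, PUN, PUQ, PUR
    "VB": "VERB", "VD": "VERB", "VH": "VERB", "VM": "VERB", "VV": "VERB",
}


def broad_category(tag):
    """Return the broad category for a tag, or "OTHER" if the prefix is unmapped."""
    return BROAD_BY_PREFIX.get(tag[:2], "OTHER")


def filter_by_category(tagged_tokens, category):
    """Keep only the (word, tag) pairs whose tag falls in the given broad category."""
    return [(word, tag) for word, tag in tagged_tokens if broad_category(tag) == category]


sample = [("Hello", "ITJ"), (",", "PUN"), ("the", "AT0"), ("police", "NN0"), ("!", "PUN")]
print(filter_by_category(sample, "PUNCT"))
```

Matching on the two-letter prefix keeps the table short, but any tag not listed falls through to "OTHER", so the table would need extending before real use.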
Decisions
- Hi/Hello/Hey : ITJ (Interjection)
- F16: NP0 (proper noun)
- I : PNP
- am -> be: VBB
- like: ITJ (interjection)
- okay (e.g., 'he is okay'): AJ0
- kind (of): AV0
- e (det): AT0
- same: AJ0
- as: CJS
- (in) front: PRP
- behind: PRP
- Police: NN0
- 911: NP0 (proper noun)
- no (AT0) one (PNI)
- as (PRP) well (AV0)
- "ish" should be removed during lemmatization (e.g., "smallish" --> "small")
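
Some of the decisions above could be applied as a lookup pass that runs after automatic tagging. The sketch below is one possible way to do that, not the procedure actually used. The names `OVERRIDES`, `MULTIWORD`, `apply_decisions`, and `strip_ish` are hypothetical. Only the context-independent decisions are included; entries such as "like", "okay", "as", "kind (of)", and "(in) front" depend on usage and are left out. The "ish" rule is applied only when the remaining stem is in a supplied vocabulary, so that words like "fish" are not truncated. That guard is an assumption added for this example.

```python
# Hypothetical single-token overrides taken from the decisions list.
OVERRIDES = {
    "hi": "ITJ", "hello": "ITJ", "hey": "ITJ",
    "f16": "NP0",
    "i": "PNP",
    "same": "AJ0",
    "behind": "PRP",
    "police": "NN0",
    "911": "NP0",
}

# Hypothetical two-token overrides: each word of the pair gets its own tag.
MULTIWORD = {
    ("no", "one"): ("AT0", "PNI"),
    ("as", "well"): ("PRP", "AV0"),
}


def apply_decisions(tokens):
    """Return (word, tag) pairs; tag is None where no decision applies."""
    tagged = []
    i = 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MULTIWORD:
            tagged.extend(zip(tokens[i:i + 2], MULTIWORD[pair]))
            i += 2
        else:
            tagged.append((tokens[i], OVERRIDES.get(tokens[i].lower())))
            i += 1
    return tagged


def strip_ish(word, vocabulary):
    """Drop a trailing "ish" only if the remaining stem is a known word."""
    lower = word.lower()
    if lower.endswith("ish") and lower[:-3] in vocabulary:
        return lower[:-3]
    return word


print(apply_decisions(["No", "one", "called", "911"]))
print(strip_ish("smallish", {"small"}))
```

A token tagged None here would simply keep whatever tag the tagger assigned. Stems that change spelling before "ish" (for example a doubled consonant) are not handled by this simple check.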