Application of Weighted Voting Taggers to Languages Described with Large Tagsets

keywords: Part-of-speech tagging, combination tagger, weighted probability distribution voting tagger, TagPair tagger
The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of six baseline part-of-speech taggers. The main part of the work presents simple weighted voting taggers and complex voting taggers. Special attention is paid to lexical voting methods and to the handling of ties and fallbacks. The TagPair and WPDV voting methods achieve the highest accuracy of all the methods considered. The error reduction of 10.8% relative to the best baseline tagger on the large tagset is comparable with other authors' results for small tagsets.
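To make "simple weighted voting", "ties" and "error reduction" concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation, and it does not reproduce the TagPair or WPDV methods. The sketch assumes that each tagger's vote is weighted by its overall accuracy. It also assumes a hypothetical tie rule that falls back to the tag proposed by the most accurate tagger. Error reduction is computed with the usual relative definition, the fraction of the baseline's errors that the combined tagger removes. The function names, tagger names and tag strings are all illustrative.

```python
import math
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick one tag for a token from several taggers' proposals.

    predictions: tagger name -> tag proposed for the current token
    weights:     tagger name -> vote weight (assumed here: the tagger's
                 accuracy on held-out data)
    """
    # Sum the weights of all taggers backing each candidate tag.
    scores = defaultdict(float)
    for tagger, tag in predictions.items():
        scores[tag] += weights[tagger]

    best = max(scores.values())
    winners = {tag for tag, s in scores.items() if math.isclose(s, best)}
    if len(winners) == 1:
        return next(iter(winners))

    # Tie (hypothetical fallback rule): return the tied tag proposed by
    # the highest-weighted tagger.
    for tagger in sorted(predictions, key=weights.get, reverse=True):
        if predictions[tagger] in winners:
            return predictions[tagger]


def error_reduction(baseline_acc, combined_acc):
    """Relative reduction of the error rate: (e_base - e_comb) / e_base."""
    return (combined_acc - baseline_acc) / (1.0 - baseline_acc)
```

In the following example the tags and weights are made up. Suppose tagger A (weight 0.90) proposes `subst:sg:nom:m3`, while B (0.88) and C (0.87) both propose `subst:sg:acc:m3`. The accusative tag then wins, with 1.75 against 0.90. With the equally made-up accuracies `error_reduction(0.90, 0.9108)` returns approximately 0.108, that is, 10.8% of the baseline's errors are removed. These numbers only illustrate the arithmetic and are not the paper's results.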
mathematics subject classification 2000: 68T50, 68T05, 68T35
reference: Vol. 29, 2010, No. 2, pp. 203–225