Increasing Quality of the Corpus of Frequency Dictionary of Contemporary Polish for Morphosyntactic Tagging of the Polish Language

keywords: Corpora preparation, part-of-speech tagging, natural language processing, machine learning
This paper addresses the correction of the erroneous and ambiguous corpus of the Frequency Dictionary of Contemporary Polish (FDCP) and its application to morphosyntactic tagging of Polish. Several stages of corpus transformation are presented, and baseline part-of-speech tagging algorithms are evaluated.
mathematics subject classification 2000: 68T50, 68T05, 68T35
reference: Vol. 28, 2009, No. 3, pp. 319–338