15-34
POS tagger based on second-order HMM
Authors: Dumitru-Clementin Cercel, Stefan Trausan-Matu
Part-of-speech (POS) tagging is the process of labelling each word in a sentence, phrase, or paragraph with its part of speech. Because tagging feeds other natural language processing modules, its results should be as accurate as possible. Once the part of speech of a word has been identified, it provides additional information about the parts of speech that can appear elsewhere in the same sentence. Ambiguities arise in POS tagging because a word may take different morphological values depending on context. This paper presents an experimental analysis of a POS tagger based on a second-order Hidden Markov Model, using the Brown corpus. Tests were conducted under various parameter settings. We show how the accuracy of a POS tagger for English changes with, on the one hand, the size of the training set and, on the other hand, the domain of the texts being tagged relative to the domain of the training set. We also identify the categories of Brown corpus texts that, when used for training, yield the highest and the lowest tagger accuracy.
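To make the notion of a second-order HMM concrete, the following is a minimal Python sketch of a tagger in which each tag is conditioned on the two preceding tags and decoding uses the Viterbi algorithm over tag pairs. It is an illustration only, not the authors' implementation: the add-one smoothing, the function names (train, viterbi, accuracy) and the three toy sentences are all assumptions made for the example, and none of the data comes from the Brown corpus.

```python
import math
from collections import Counter

START, STOP = "<s>", "</s>"

def train(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and word emissions."""
    tri, ctx, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            ctx[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
    return {"tri": tri, "ctx": ctx, "emit": emit, "tag_count": tag_count,
            "vocab": vocab, "tags": sorted(tag_count)}

def log_trans(m, t1, t2, t3):
    # Assumption: add-one smoothing over the tag set plus STOP.
    n = len(m["tags"]) + 1
    return math.log((m["tri"][(t1, t2, t3)] + 1) / (m["ctx"][(t1, t2)] + n))

def log_emit(m, tag, word):
    # Assumption: add-one smoothing with one extra slot for unknown words.
    v = len(m["vocab"]) + 1
    return math.log((m["emit"][(tag, word)] + 1) / (m["tag_count"][tag] + v))

def viterbi(m, words):
    """Return the most probable tag sequence under the trigram model."""
    # best[(u, v)]: best log-probability of a path ending in tags u, v.
    best = {(START, START): 0.0}
    backpointers = []
    for word in words:
        new_best, bp = {}, {}
        for (u, v), score in best.items():
            for t in m["tags"]:
                s = score + log_trans(m, u, v, t) + log_emit(m, t, word)
                if (v, t) not in new_best or s > new_best[(v, t)]:
                    new_best[(v, t)] = s
                    bp[(v, t)] = u
        best = new_best
        backpointers.append(bp)
    # Pick the final tag pair, including the transition to STOP.
    u, v = max(best, key=lambda p: best[p] + log_trans(m, p[0], p[1], STOP))
    result = []
    for bp in reversed(backpointers):
        result.append(v)
        u, v = bp[(u, v)], u
    return result[::-1]

def accuracy(m, tagged_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in tagged_sentences:
        predicted = viterbi(m, [w for w, _ in sent])
        correct += sum(p == t for p, (_, t) in zip(predicted, sent))
        total += len(sent)
    return correct / total

# Hypothetical toy data, for illustration only.
toy_train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
toy_test = [[("a", "DET"), ("cat", "NOUN"), ("barks", "VERB")]]

model = train(toy_train)
print(viterbi(model, ["the", "cat", "barks"]))
print(accuracy(model, toy_test))
```

Under these assumptions, the effect of training set size or domain could be explored by calling train on different subsets of a tagged corpus and comparing the values returned by accuracy on a fixed held-out set.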