169-192
Recognizing named entities, quotes and events in news and social media items in Romanian
Authors: Adrian-Nicolae Zamfirescu, Traian Eugen Rebedea
At the border between natural language processing and information retrieval, named entity recognition has been one of the most important research problems in both domains, and it has not yet been fully solved, even for English texts. Named entity recognition has also opened the way to solving other problems that rely on these linguistic constructs. Examples include identifying quotes and declarations made by persons, and also by companies or other types of organizations, and extracting events from texts. The problem of named entity identification and classification arose from the need to report the occurrences of names of persons, organizations and other types of named entities that are relevant to various domains within written documents. In this article we present a solution to these three problems for Romanian texts from various sources, such as news items, blog articles and comments from social networks. The paper starts with a short overview of the theoretical underpinnings of these problems. It then presents the methods actually used in the solution designed for Romanian, which combines machine learning algorithms with heuristics based on text patterns and regular expressions. Finally, we report the accuracy of the methods used for each task and compare the results obtained by each method.
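The abstract does not specify which text patterns are used. As an illustration only, a regular-expression heuristic for one of the tasks, attributing a quote to a speaker, could look like the sketch below. The pattern, the small list of Romanian reporting verbs (declarat, spus, afirmat, precizat) and the function name `extract_quotes` are hypothetical and are not taken from the paper. The speaker is approximated as one to three capitalized tokens, whereas a full system would presumably use the named entity recognizer's output instead.

```python
import re

# Hypothetical heuristic (not the paper's actual patterns): a quote in
# Romanian typographic („...”) or ASCII ("...") quotation marks, followed by
# a reporting verb such as "a declarat" and a speaker name. The speaker is
# approximated as one to three capitalized tokens.
QUOTE_PATTERN = re.compile(
    r'[„"](?P<quote>[^”"]+)[”"]\s*,?\s*'
    r'a\s+(?:declarat|spus|afirmat|precizat)\s+'
    r'(?P<speaker>[A-ZĂÂÎȘȚŞŢ][\w-]+(?:\s+[A-ZĂÂÎȘȚŞŢ][\w-]+){0,2})'
)


def extract_quotes(text):
    """Return (speaker, quote) pairs matched by the heuristic pattern."""
    return [
        (m.group("speaker"), m.group("quote").strip())
        for m in QUOTE_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "„Economia va crește anul acesta”, a declarat Ion Popescu."
    print(extract_quotes(sample))
```

A pattern like this has high precision for the common "quote, a declarat Name" construction, but it misses reversed orderings and indirect speech. That limitation is one reason to combine such heuristics with machine learning methods, as the abstract describes.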