1-17
Language Resources for a Question-Answering System for Romanian
Authors: Verginica Barbu Mititelu, Alexandru Ceauşu, Radu Ion, Elena Irimia, Dan Stefanescu, Dan Tufis
We describe several language resources (a lexicon, a paradigmatic morphology, two linguistic thesauri, namely the Romanian wordnet and Eurovoc, and a parallel multilingual corpus) from the perspective of their utility, especially in question-answering tasks. We present the stages of automatically finding an answer to a question that a user writes in natural language. Wherever necessary, we show how the linguistic resources contribute to solving the various problems involved. The lexicon serves as a kind of spellchecker for the user’s question. The paradigmatic morphology is used for lemmatizing the question and the corpus. The Romanian wordnet is useful for query expansion, for identifying lexical chains between word senses, and for answer retrieval in both a monolingual and a multilingual system. The Eurovoc thesaurus is used for segmentation and lemmatization of the user’s question and of the parallel multilingual corpus from which the answer is retrieved. The architecture of the question-answering system described here is language independent. The language resources, however, are inherently language dependent (e.g. the lexicon and the paradigmatic morphology), except for those whose organization or structure allows for a multilingual perspective (e.g. the thesauri and the corpus), which in our case are aligned.