Mersin University Journal of Linguistics and Literature (Mersin Üniversitesi Dil ve Edebiyat Dergisi)

Number 2, 2016

1-33
Formulaicity in Turkish: Evidence from the Turkish National Corpus

Authors: Selma Ayşe Özel, Yasin Bektaş & Hakan Yılmazer

Abstract

Formulaic sequences are the most frequently occurring forms in a language. Identifying them is useful in a wide range of areas, including linguistics, second language learning, and natural language processing. The most common way to identify formulaic sequences in a language is to use a corpus, built from written texts or tape-recorded conversations in that language, and to count the frequencies of sequences in it; the most frequent sequences are then examined to find formulas. Numerous studies have sought to identify formulas in several languages, such as English. There are only a few studies of formulaicity in Turkish, and most of them focus on identifying formulas in the form of multi-word units. Turkish, however, is an agglutinating language with a rich and complex morphology, so formulaic sequences in affixation should also be investigated. Very few studies of formulaicity in Turkish affixation exist in the literature. In this study, we try to discover formulaic sequences in Turkish affixation by counting frequent suffix n-grams in written and spoken Turkish, using the Turkish National Corpus, a balanced, large-scale, general-purpose corpus of contemporary Turkish. We list the most frequent suffix combinations not only for verbs but for all lexical categories, including nouns, adjectives, verbs, and adverbs, in both the written and spoken components of the Turkish National Corpus, and discuss similarities and differences in affixation between written and spoken Turkish. We observe that spoken Turkish favors shorter suffix sequences than written Turkish does, and that as the length of the suffix n-grams increases, different formulaic sequences are used in written and spoken Turkish.
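The suffix n-gram counting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each word has already been morphologically analysed into a string of the hypothetical form `root+SUFFIX1+SUFFIX2+...`, and the suffix labels (`PLU`, `P3SG`, `ABL`, and so on) are invented for the example rather than taken from the Turkish National Corpus tagset. The function name `suffix_ngrams` is likewise hypothetical.

```python
from collections import Counter
from typing import Iterable


def suffix_ngrams(analyses: Iterable[str], n: int) -> Counter:
    """Count contiguous suffix n-grams across a list of analysed words.

    Each analysis is assumed to look like "root+SUF1+SUF2+..."; this
    format and the suffix labels are hypothetical, not the TNC tagset.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    for analysis in analyses:
        suffixes = analysis.split("+")[1:]  # drop the root, keep suffix tags
        # Slide a window of size n over this word's suffix chain only,
        # so n-grams never cross word boundaries.
        for i in range(len(suffixes) - n + 1):
            counts[tuple(suffixes[i:i + n])] += 1
    return counts


# Toy written sample: evlerinden, kitaplarında, geliyordum, geliyorum
written = ["ev+PLU+P3SG+ABL", "kitap+PLU+P3SG+LOC",
           "gel+PROG+PAST+1SG", "gel+PROG+1SG"]
bigrams = suffix_ngrams(written, 2)
print(bigrams.most_common(1))  # [(('PLU', 'P3SG'), 2)]
```

Running the same function separately over a written and a spoken sample, and for increasing `n`, gives the kind of comparison described above; the four toy words merely stand in for real corpus data.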

35-52
Formulaicity within Turkish Words

Authors: Philip Durrant

Abstract

One of the main insights to emerge from the last fifty years of corpus linguistics has been a greater understanding of the pervasiveness of formulaic language. Rather than exercising the full generative capacity of language, speakers and writers have been shown to rely to a great extent on conventional, pre-constructed phrases drawn from memory. Turkish presents a particularly interesting and challenging case because its agglutinative structure means that messages which are spread across several orthographic words in English are often expressed within a single word in Turkish. While it is possible that this difference in structure will mean that new types of formulaicity will emerge in Turkish, a good starting place may be to consider the extent to which types of formulaicity which are known to exist in English at the multi-word level exist in Turkish at the sub-word level. The research discussed here set out to examine this possibility, looking in particular at three types of formulaicity: collocations, lexical bundles and collostructions.

53-70
Verb Synthesis and Frequency

Authors: Mustafa Aksan, Devrim Alıcı & Umut Ufuk Demirhan

Abstract

Successive affixation in agglutinative languages derives complex structures. This study presents frequency information on affix sequences in the verbal domain, drawn from corpus data. Recurrent patterns of "morphgrams", formed by combining voice suffixes from the non-finite template with other verbal inflections from the finite template, are extracted from the corpus. Affix sequences ranging from simple two-morphgram patterns to the most complex nine-morphgram patterns are attested in the corpus data. The samples indicate that Turkish does not derive monstrous words but rather limits the number of affixes that may be attached to a verb root or stem. Various statistical calculations also indicated the significance of grammatical patterns of affixes. The method and findings of the study have implications for morphological processing in agglutinative languages.
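One way to picture the morphgram extraction is to group each verb's full affix chain by its length. The sketch below is an illustration under stated assumptions, not the study's procedure: it reuses the hypothetical `root+AFFIX+...` analysis format, treats a "morphgram" pattern as the complete affix sequence of a verb, and keeps only chains that contain a voice suffix. The tag set `VOICE`, the function name `morphgram_patterns`, and the 2 to 9 length bounds are assumptions chosen to mirror the abstract, not the authors' actual definitions.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical voice-suffix tags (passive, causative, reciprocal, reflexive).
VOICE = {"PASS", "CAUS", "RECIP", "REFL"}


def morphgram_patterns(verbs: Iterable[str], min_len: int = 2,
                       max_len: int = 9) -> Dict[int, Counter]:
    """Group verb affix chains by length, keeping chains with a voice suffix.

    Each verb is assumed to be analysed as "root+AFFIX1+AFFIX2+...".
    Returns {chain length: Counter of affix-chain tuples}.
    """
    by_len: Dict[int, Counter] = {}
    for analysis in verbs:
        affixes = tuple(analysis.split("+")[1:])  # drop the root
        if not (min_len <= len(affixes) <= max_len):
            continue  # outside the assumed 2..9 morphgram range
        if not VOICE.intersection(affixes):
            continue  # skip chains without any voice suffix
        by_len.setdefault(len(affixes), Counter())[affixes] += 1
    return by_len


# Toy verb analyses, invented for illustration
verbs = ["yap+PASS+PAST+3SG", "yap+PASS+PAST+3SG",
         "gör+CAUS+PASS+PROG+PAST+3PL", "gel+PAST+1SG"]
patterns = morphgram_patterns(verbs)
for length in sorted(patterns):
    print(length, dict(patterns[length]))
```

Summing the counters per length (for example, `{n: sum(c.values()) for n, c in patterns.items()}`) yields a length distribution, which is where a limit on the number of affixes per verb would become visible in real data.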

71-108
Colligational Patterns of Turkish Multi-Word Units

Authors: Yeşim Aksan, Ümit Mersinli & Serap Altunay

Abstract

In multi-word unit (MWU) extraction studies, most of the challenges posed by morphologically rich languages like Turkish can be overcome by studying how colligational filtering works in our minds, along with how statistical and collocational sorting affects the process. Based on the assumption that the lexicalization of any given collocation as an MWU also requires compatibility with certain lexical or morphosyntactic constraints, this study presents the morphosyntactic tendencies observed in the colligational patterns of Turkish MWUs and discusses their implications for language-specific MWU filtering processes. The aim of the study is to discuss whether, in Turkish, associative strength is sufficient for a collocation to be lexicalized as an MWU. Another purpose is to show some morphosyntactic and lexical constraints that may validate collocations as lexical multi-word units in Turkish. The paper also highlights methodological perspectives on MWU identification that are valid for morphologically rich languages. To achieve these goals, we first extracted MWU candidates (trigrams) from a 10-million-word sub-corpus of the Turkish National Corpus (TNC) using Text-NSP (Banerjee & Pedersen, 2011). The trigrams were then annotated using the NLP dictionary of the TNC-tagger and classified according to their colligational patterns and the lexical categories of the MWUs. The most frequently observed colligational patterns are argued to be morphosyntactic tendencies governing MWU lexicalization in Turkish. In this respect, the study aims to contribute to the understudied area of formulaic language in Turkish.
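The extraction and classification steps can be approximated as follows. Text-NSP itself is a Perl package and is not used here; this Python sketch swaps in plain frequency counting, assumes the input is a sequence of `(token, tag)` pairs with invented tags (`NOUN`, `VERB`) rather than real TNC-tagger output, and reduces each frequent trigram to its tag sequence as a stand-in for a colligational pattern. The function name `trigram_patterns` and the threshold `min_freq` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, tag); tags here are invented


def trigram_patterns(tagged: List[Tagged], min_freq: int = 2):
    """Count token trigrams and the tag patterns of the frequent ones.

    Returns (trigram counts, pattern counts), where each pattern count is
    the number of distinct trigram types with frequency >= min_freq that
    share that tag sequence.
    """
    trigrams = Counter(tuple(tagged[i:i + 3]) for i in range(len(tagged) - 2))
    patterns: Counter = Counter()
    for trigram, freq in trigrams.items():
        if freq >= min_freq:
            pattern = tuple(tag for _, tag in trigram)  # e.g. NOUN-NOUN-VERB
            patterns[pattern] += 1
    return trigrams, patterns


# Toy text: "göz önünde bulundurmak gerekir göz önünde bulundurmak"
text = [("göz", "NOUN"), ("önünde", "NOUN"), ("bulundurmak", "VERB"),
        ("gerekir", "VERB"), ("göz", "NOUN"), ("önünde", "NOUN"),
        ("bulundurmak", "VERB")]
trigrams, patterns = trigram_patterns(text)
print(patterns)  # Counter({('NOUN', 'NOUN', 'VERB'): 1})
```

Raw frequency is only a first filter; ranking candidates by an association measure as well would be needed to probe whether associative strength alone is enough for lexicalization.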