A Lexicographer-Friendly Association Score



Year of publication: 2008
Type: Article in proceedings
Conference: RASLAN 2008
Faculty of Informatics

Field: Linguistics
Keywords: corpus linguistics tools; grammatical relations in the Sketch Engine; the logDice score
Description: Finding collocation candidates is one of the most important and widely used features of corpus linguistics tools. Many statistical association measures are used to identify good collocations. Most of them define a formula for an association score, which indicates the amount of statistical association between two words. The score is computed for all possible word pairs, and the pairs with the highest scores are presented as collocation candidates. The same scores are used in many other algorithms in corpus linguistics. The score values are usually meaningless and corpus-specific; they cannot be used to compare words (or word pairs) from different corpora. End users, however, want an interpretation of such scores and want the scores to be stable. This paper presents a modification of a well-known association score that has a reasonable interpretation and other good properties.
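The abstract describes the general workflow (score every word pair, then rank the pairs), and the keywords name the logDice score. The following is a minimal Python sketch of both steps. It assumes the commonly cited definition logDice = 14 + log2(2 * f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the pair and f_x, f_y are the frequencies of the two words. The function names `log_dice` and `rank_candidates` and the toy counts are hypothetical and serve only as illustration. They are not the Sketch Engine's implementation.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Hypothetical helper: logDice = 14 + log2(2 * f_xy / (f_x + f_y)).

    Assumes the commonly cited definition of logDice. The maximum, 14, is
    reached when the two words always occur together (f_xy == f_x == f_y);
    each halving of the Dice coefficient lowers the score by 1.
    """
    if f_xy <= 0 or f_xy > min(f_x, f_y):
        raise ValueError("need 0 < f_xy <= min(f_x, f_y)")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_candidates(pair_freq, word_freq, top_n=10):
    """Score every observed pair and return the top_n as (score, x, y) tuples."""
    scored = [
        (log_dice(f_xy, word_freq[x], word_freq[y]), x, y)
        for (x, y), f_xy in pair_freq.items()
    ]
    # Highest score first; break ties alphabetically so the output is stable.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top_n]


# Toy counts (invented for illustration, not real corpus data).
word_freq = {"strong": 100, "tea": 50, "powerful": 80, "computer": 60}
pair_freq = {
    ("strong", "tea"): 20,
    ("powerful", "computer"): 10,
    ("strong", "computer"): 1,
}

for score, x, y in rank_candidates(pair_freq, word_freq):
    print(f"{x} {y}: {score:.2f}")
```

Only ratios of frequencies enter the formula, so multiplying every count by the same factor leaves the score unchanged. This is one way to see why a score of this shape can be compared across corpora of different sizes, which raw-frequency-based scores cannot.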
