A Lexicographer-Friendly Association Score
|Year of publication
|Article in Proceedings
|MU Faculty or unit
|corpus linguistics tools; grammatical relations in the Sketch Engine; the logDice score
|Finding collocation candidates is one of the most important and widely used feature of corpus linguistics tools. There are many statistical association measures used to identify good collocations. Most of these measures define a formula of a association score which indicates amount of statistical association between two words. The score is computed for all possible word pairs and the word pairs with the highest score are presented as collocation candidates. The same scores are used in many other algorithms in corpus linguistics. The score values are usually meaningless and corpus specific, they cannot be used to compare words (or word pairs) of different corpora. But endusers want an interpretation of such scores and want a score’s stability. This paper present a modification of a well known association score which has a reasonable interpretation and other good features.