Do we need very large corpora?
|Year of publication||2011|
|Type||Article in Proceedings|
|MU Faculty or unit|
|Keywords||corpora, corpus tools|
|Description||This paper deals with building very large corpora from the Web. First, we discuss the motivation and need for such resources among linguists, lexicographers, and NLP specialists. Second, we describe the techniques used for building large corpora (more than a billion tokens) and present the results obtained at the NLP Centre FI MU, namely both tools and corpora. Finally, we analyse the consequences of building large text data resources and the ways in which they are used in corpus linguistics and various NLP applications.|