Publication details

Využití corpus driven metod při corpus based výzkumu

Title in English The Corpus-driven and Corpus-based Approach in Practice


Year of publication 2015
Type Article in Proceedings
Conference Proměna jazyka a jeho výzkumu v době nových médií a technologií
MU Faculty or unit Faculty of Arts

Field Linguistics
Keywords corpus; corpus based; corpus driven; overgeneration; undergeneration; lemma; tag; word formation
Description Overgeneration is a property of a formal rule that does not cover exactly the language data it was designed for. It is equivalent to low precision and occurs when a formal rule (corpus query) is defined too broadly. Undergeneration is equivalent to low recall and occurs when a formal rule (corpus query) is specified too narrowly. Both are caused by the ambiguity of natural language. In this article we demonstrate how to use a corpus-driven method to optimize the retrieval technique for a corpus-based analysis. Using the specific example of retrieving candidates for a word formation model (kutil), we show how observation of corpus data can be used to progressively refine a corpus query.
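The relation between query breadth, precision and recall can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy token list, its tags and candidate labels, the helper names `run_query` and `precision_recall`, and the two regular-expression queries are hypothetical and are not taken from the article or from any real corpus manager.

```python
import re

# Hypothetical toy data: (word form, tag, is it a true candidate?).
# The tags and labels are invented; "hýřil" is deliberately given a
# verb tag to imitate the ambiguity that causes undergeneration.
TOKENS = [
    ("kutil", "N", True),
    ("mluvil", "V", False),
    ("chodil", "V", False),
    ("hýřil", "V", True),
    ("stůl", "N", False),
]

def run_query(pattern, tag=None):
    """Return indices of tokens whose form matches pattern (and tag, if given)."""
    return {
        i
        for i, (form, pos, _) in enumerate(TOKENS)
        if re.search(pattern, form) and (tag is None or pos == tag)
    }

def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved token indices."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

RELEVANT = {i for i, (_, _, ok) in enumerate(TOKENS) if ok}

# Broadly defined query: any form ending in "il" (overgenerates).
broad = run_query(r"il$")
# Narrowly specified query: the same ending, restricted to noun tags
# (undergenerates, because one true candidate carries a verb tag).
narrow = run_query(r"il$", tag="N")

print("broad :", precision_recall(broad, RELEVANT))
print("narrow:", precision_recall(narrow, RELEVANT))
```

On this toy data the broad query retrieves every true candidate but also unwanted verb forms (low precision), while the narrow query retrieves only true candidates but misses the ambiguously tagged one (low recall). Inspecting the retrieved items and adjusting the query step by step is the kind of progressive specification the description refers to.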
