Publication details

Vocabulary Size of Czech Native Speakers: A Statistical Approach

Authors

BLAHUŠ Marek, JAKUBÍČEK Miloš, KOVÁŘ Vojtěch, KOVAŘÍK František

Year of publication 2025
Type Article in Proceedings
MU Faculty or unit

Faculty of Informatics

Citation
Electronic lexicography in the 21st century (eLex 2025): Intelligent lexicography. Proceedings of the eLex 2025 conference
Keywords vocabulary size; native speaker; manual annotation; semi-automatic dictionary drafting; Dictionary Express
Description This paper explores the theory of measuring vocabulary size, including the methods that can be used and the parameters that have to be set. We examine earlier experiments on English and Dutch. Goulden et al. (1990) claim that the average native speaker knows about 17,000 English base words (non-derived words). Keuleers et al. (2015) and Brysbaert et al. (2016) claim that the average native speaker with secondary education knows about 42,000 headwords (lemmas). We conducted an experiment on Czech similar to that of Keuleers and Brysbaert, using as input 100,000 letter sequences drawn from the wordlists of large web corpora. We assume that the vocabulary size of Czech native speakers (and of native speakers of any language) could be larger, exceeding 57,000 (Czech) headwords, if participants were given more input (150,000 sequences or more) or if the specialized terminology of their fields of interest were counted.

