Publication details

Database and Corpora Creation within RapCor Project for Czech

Authors

NĚMCOVÁ POLICKÁ Alena, RYCHLÝ Pavel

Year of publication 2025
Type Article in Proceedings
Conference Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2025
MU Faculty or unit

Faculty of Arts

www Web page of the volume
Keywords database; corpora; hip hop; RapCor; Czech
Description This paper presents the motivation for and first results of the Czech RapCor project, focusing on the construction of Czech RapCor Boosted v1 (Czech RCB), a specialized corpus of Czech rap lyrics designed for sociolinguistic and NLP research. The corpus highlights distinctive linguistic features, such as written colloquialisms, frequent vulgarisms, and non-standard forms, which pose challenges for traditional NLP tools. Preliminary results demonstrate the corpus's potential for studying authentic spoken language in written form, offering insights into rap culture and sociolinguistic phenomena.
