Publication information

Tailored Fine-Tuning For The Comma Insertion In Czech

Authors

MACHURA Jakub, ŽIŽKOVÁ Hana, STANO Patrik, VRABCOVÁ Tereza, HLAVÁČKOVÁ Dana, TRNOVEC Ondřej

Year of publication 2025
Type Other conference presentations
Faculty / MU department

Faculty of Arts

Description
Transfer learning techniques, particularly pre-trained Transformers, allow models trained on vast amounts of text in a given language to be tailored to specific grammar correction tasks, such as automatic punctuation correction. The Czech pre-trained RoBERTa model demonstrates outstanding performance on this task (Machura et al. 2022); however, previous attempts to improve the model have so far led to a slight degradation (Machura et al. 2023). In this paper, we present a more targeted fine-tuning of this model, addressing linguistic phenomena that the base model overlooked. Additionally, we compare it with other models trained on a more diverse dataset that goes beyond web texts.
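
The record contains no code; as a rough sketch of the general approach the abstract describes, the snippet below frames comma insertion as token classification over a pre-trained Czech RoBERTa using Hugging Face Transformers. The checkpoint name (ufal/robeczech-base), the two-label scheme, and the toy training example are illustrative assumptions, not the authors' published setup.

```python
# A minimal sketch, not the authors' code: comma insertion framed as
# token classification. Checkpoint and label scheme are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "ufal/robeczech-base"   # assumed Czech RoBERTa checkpoint
LABELS = ["NO_COMMA", "COMMA"]  # per word: does a comma follow it?

tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL, num_labels=len(LABELS)
)

def make_example(text: str):
    """Strip commas from a correctly punctuated sentence and record,
    for each remaining word, whether a comma originally followed it."""
    words, labels = [], []
    for tok in text.split():
        words.append(tok.rstrip(","))
        labels.append(1 if tok.endswith(",") else 0)
    return words, labels

words, word_labels = make_example("Myslím, že přijde, ale nejsem si jistý.")
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens: label the first subword of
# each word, mask the remaining subwords and special tokens with -100.
aligned, prev = [], None
for wid in enc.word_ids():
    aligned.append(-100 if wid is None or wid == prev else word_labels[wid])
    prev = wid

out = model(**enc, labels=torch.tensor([aligned]))
out.loss.backward()  # a real fine-tuning run would loop over a dataset
                     # and take an optimizer step here
```

At inference time, taking the argmax of the two logits at each word's first subword yields the comma decisions for a new sentence.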
