Publication details

DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

Authors

HERMAN Ondřej, SUCHOMEL Vít, BAISA Vít, RYCHLÝ Pavel

Year of publication 2016
Type Article in Proceedings
Conference Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
MU Faculty or unit

Faculty of Informatics

Web https://aclanthology.info/pdf/W/W16/W16-4815.pdf
Field Informatics
Keywords language discrimination; expectation maximization; language model
Description In this paper we investigate two approaches to the discrimination of similar languages: the expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. The two methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared task 2016 competition.
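
The byte-level language model idea mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ByteNgramModel`, the trigram order, the add-one smoothing, and the two toy training sentences are all assumptions chosen for illustration. One model is trained per language, and a text is assigned to the language whose model gives it the highest log-probability.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Illustrative byte-level n-gram model with add-one smoothing."""

    def __init__(self, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of context bytes + next byte
        self.contexts = Counter()  # counts of the context bytes alone

    def _grams(self, text):
        # Pad with zero bytes so the first bytes also get a full context.
        data = b"\x00" * (self.order - 1) + text.encode("utf-8")
        for i in range(self.order - 1, len(data)):
            yield data[i - self.order + 1:i], data[i:i + 1]

    def train(self, text):
        for ctx, byte in self._grams(text):
            self.ngrams[ctx + byte] += 1
            self.contexts[ctx] += 1

    def log_prob(self, text):
        # Add-one smoothing over the 256 possible byte values.
        total = 0.0
        for ctx, byte in self._grams(text):
            numerator = self.ngrams[ctx + byte] + 1
            denominator = self.contexts[ctx] + 256
            total += math.log(numerator / denominator)
        return total


def classify(models, text):
    """Return the language whose model scores the text highest."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


# Toy training data (assumed for illustration, not the DSL corpus).
training = {
    "cs": ["dnes je pěkný den a svítí slunce"],
    "sk": ["dnes je pekný deň a svieti slnko"],
}

models = {}
for lang, sentences in training.items():
    models[lang] = ByteNgramModel(order=3)
    for sentence in sentences:
        models[lang].train(sentence)

print(classify(models, "svítí slunce"))
print(classify(models, "svieti slnko"))
```

Working on bytes rather than characters or words means no tokenization is needed, and multi-byte UTF-8 sequences such as diacritics contribute several n-grams each, which can help separate closely related languages.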