Publication details

European Union Language Resources in Sketch Engine

Authors

BAISA Vít, MICHELFEIT Jan, MEDVEĎ Marek, JAKUBÍČEK Miloš

Year of publication 2016
Type Article in proceedings
Conference Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
MU Faculty or unit

Faculty of Informatics

Citation
www http://www.lrec-conf.org/proceedings/lrec2016/pdf/572_Paper.pdf
Field Informatics
Keywords JRC-Acquis; DCEP; DGT-TM; Europarl; EUR-Lex; Sketch Engine; parallel corpus; word sketch; parallel concordance
Description Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available to researchers in the Sketch Engine corpus management system. A completely new resource is introduced: the EUR-Lex corpus, one of the largest parallel corpora currently available. It contains 840 million English tokens, and its largest language pair (English-French) has more than 25 million aligned segments (paragraphs).
Related projects: