Legal Terms and Word Sketches - A Case Study

Year of publication 2010
Type Article in proceedings
Conference Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2010
Field Linguistics
Keywords Character recognition; Natural language processing systems; Czech Republic; Financial domains; Large corpora; Legal texts; Noun phrase; Relevant terms; Valencies
Description In this paper we describe an approach to the semiautomatic identification of legal terms in Czech texts. Our general goal is to offer supplementary tools for building a dictionary of Czech law terms. First, we used the VaDis partial parser to recognize complex nominal constructions in a legal text, the current version of the Penal Code of the Czech Republic. The headwords of the recognized structures are usually relevant legal terms. We then employed the Sketch Engine to find Word Sketches of these relevant terms in Czes, a large corpus of standard Czech, because corpora of legal Czech texts are not yet available. Although we used general texts, we obtained very good candidates for legal terms. We also discuss relations between the VerbaLex frames of a selected group of Czech verbs with financial meaning that occur in legal texts and the Word Sketches found for some of these verbs. It appears that combining valency frames and Word Sketches also provides good candidates for legal terms. The paper is conceived as a case study in which we describe the collocational behaviour of selected Czech noun phrases and of some verbs belonging to the financial domain.
