Publication details

Extracting Phrases from PDT 2.0



Year of publication 2011
Type Article in Proceedings
Conference Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2011
MU Faculty or unit Faculty of Informatics

Field Informatics
Keywords PDT; corpus; treebank; export; format; complex annotation; phrase; clause
Description The Prague Dependency Treebank (henceforth PDT) is a large collection of texts in Czech. It is renowned for its respectable size and rich multi-layer annotation covering a wide range of complex phenomena. On the other hand, it can be argued that the complexity of the dataset may be a notable hindrance to using certain aspects of the data in a straightforward way. To overcome these problems, we present an export filter that converts PDT into a more transparent data format containing information about the most common phrase types. We believe that the availability of the PDT data in this form will help encourage people unfamiliar with the underlying theory to use the corpus.
