Publication details

Fast syntactic searching in very large corpora for many languages

Basic information
Original title: Fast syntactic searching in very large corpora for many languages
Authors: Miloš Jakubíček, Pavel Rychlý, Adam Kilgarriff, Diana McCarthy
Further information
Citation: JAKUBÍČEK, Miloš, Pavel RYCHLÝ, Adam KILGARRIFF and Diana MCCARTHY. Fast syntactic searching in very large corpora for many languages. In PACLIC 24 Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation. Tokyo: Waseda University, 2010. pp. 741-747, 7 pp. ISBN 978-4-905166-00-9.
BibTeX:
@inproceedings{908008,
author = {Jakubíček, Miloš and Rychlý, Pavel and Kilgarriff, Adam and McCarthy, Diana},
address = {Tokyo},
booktitle = {PACLIC 24 Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation},
keywords = {corpus search; large corpora; CQL; syntactic search},
language = {eng},
location = {Tokyo},
isbn = {978-4-905166-00-9},
pages = {741--747},
publisher = {Waseda University},
title = {Fast syntactic searching in very large corpora for many languages},
year = {2010}
}
Original language: English
Field: Informatics
Type: Article in Proceedings
Keywords: corpus search; large corpora; CQL; syntactic search

For many linguistic investigations, the first step is to find examples. In the 21st century, these examples should all be found, not invented. Linguists therefore need flexible tools for finding even quite rare phenomena. To support linguists well, such tools must be fast even when corpora are very large and queries are complex. We present extensions to CQL (Corpus Query Language) for the intuitive creation of syntactically rich queries, and demonstrate that these queries can be evaluated quickly within our tool, even on multi-billion-word corpora.
