Informace o publikaci

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

Autoři

BOYTSOV Leonid NOVÁK David MALKOV Yury NYBERG Eric

Druh Článek ve sborníku
Konference CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT
Fakulta / Pracoviště MU

Fakulta informatiky

Citace
Doi http://dx.doi.org/10.1145/2983323.2983815
Obor Informatika
Klíčová slova k-NN search; IBM Model 1; non-metric spaces; LSH
Popis Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection is available online.(1)
Související projekty:

Používáte starou verzi internetového prohlížeče. Doporučujeme aktualizovat Váš prohlížeč na nejnovější verzi.

Další info