Publication details

CRANBERRY: Memory-Effective Search in 100M High-Dimensional CLIP Vectors



Year of publication 2023
Type Article in Proceedings
Conference 16th International Conference on Similarity Search and Applications (SISAP)
MU Faculty or unit

Faculty of Informatics

Keywords approximate similarity searching; high-dimensional data; indexing; filtering; LAION dataset
Description Recent advances in cross-modal multimedia data analysis require efficient similarity search at the scale of hundreds of millions of high-dimensional vectors. We address this task by proposing the CRANBERRY algorithm, which combines and tunes several existing similarity search strategies. In particular, the algorithm: (1) employs Voronoi partitioning to obtain a query-relevant candidate set in constant time, (2) applies filtering techniques to prune the obtained candidates significantly, and (3) re-ranks the retained candidate vectors with respect to the query vector. Applied to a dataset of 100 million 768-dimensional vectors, the algorithm evaluates 10NN queries with 90% recall and an average query latency of 1.2 s, with a throughput of 15 queries per second on a server with a 56-core CPU and 4.7 queries per second on a PC.
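
The three stages named in the description (partition, filter, re-rank) can be illustrated with a small toy sketch. This is not the CRANBERRY implementation: the function names `build_index` and `search` and the parameter `n_cells` are hypothetical, the pivots are simply assumed to be given, and the filter used here (a triangle-inequality lower bound on the distance via the cell pivot) is a stand-in, because the description does not say which filtering techniques the paper applies.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(data, pivots):
    """Voronoi partitioning: put each vector into the cell of its nearest pivot.

    Each cell stores (vector id, distance to the cell pivot); the stored
    distance is reused later by the filter.
    """
    cells = {i: [] for i in range(len(pivots))}
    for vid, vec in enumerate(data):
        d = [dist(vec, p) for p in pivots]
        nearest = min(range(len(pivots)), key=d.__getitem__)
        cells[nearest].append((vid, d[nearest]))
    return cells


def search(query, data, pivots, cells, k=10, n_cells=3):
    """Approximate kNN: candidates from the closest cells, filter, re-rank."""
    q_to_pivot = [dist(query, p) for p in pivots]

    # Stage 1: candidate set = contents of the n_cells cells closest to the query.
    visit = sorted(range(len(pivots)), key=q_to_pivot.__getitem__)[:n_cells]
    candidates = []
    for c in visit:
        for vid, d_to_pivot in cells[c]:
            # Lower bound on d(query, vector) from the triangle inequality
            # (assumed stand-in filter, not the paper's technique).
            lower_bound = abs(q_to_pivot[c] - d_to_pivot)
            candidates.append((lower_bound, vid))
    candidates.sort()

    # Stages 2 and 3: skip candidates whose lower bound cannot beat the current
    # k-th best distance; compute exact distances only for the survivors.
    top = []  # sorted list of (exact distance, vector id), at most k entries
    for lower_bound, vid in candidates:
        if len(top) == k and lower_bound >= top[-1][0]:
            break  # candidates are sorted by lower bound, so the rest are pruned too
        top.append((dist(query, data[vid]), vid))
        top.sort()
        top = top[:k]
    return [vid for _, vid in top]
```

With `n_cells` equal to the number of pivots the sketch returns the exact k nearest neighbours, since the lower bound never discards a true neighbour; smaller values of `n_cells` trade recall for speed, which is the kind of trade-off the reported 90% recall reflects.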
