Publication details

Similarity Search for an Extreme Application: Experience and Implementation

Authors

MÍČ Vladimír, RAČEK Tomáš, KŘENEK Aleš, ZEZULA Pavel

Year of publication 2021
Type Article in Proceedings
Conference Similarity Search and Applications: 14th International Conference, SISAP 2021, Dortmund, Germany, September 29 - October 1, 2021, Proceedings
MU Faculty or unit

Faculty of Informatics

Citation
Web https://link.springer.com/chapter/10.1007/978-3-030-89657-7_20
DOI http://dx.doi.org/10.1007/978-3-030-89657-7_20
Keywords Similarity search in metric space; Efficiency; Distance distribution; Dimensionality curse; Extreme distance function
Description Contemporary challenges for efficient similarity search include complex similarity functions, the curse of dimensionality, and the large size of the descriptive features of data objects. This article reports our experience with a database of protein chains, which form an (almost) metric space and exhibit the following extreme properties. Evaluating the pairwise similarity of protein chains can take as long as tens of minutes, and has a variance of six orders of magnitude. Minimising the number of similarity comparisons is thus crucial, so we propose a generic three-stage search engine for this purpose. Compared with the search engine currently used in practice for the protein database, we improve the median search time 73 times.