SASH: a Spatial Approximation Sample Hierarchy for Similarity Search

This paper introduces a practical metric index for approximate similarity queries of large multi-dimensional data sets: the spatial approximation sample hierarchy (SASH). The SASH requires a similarity function satisfying the conditions of a distance metric, but otherwise makes no assumptions regarding the nature of the data. The data set is organized into a multi-level structure of random samples, in which objects at a given level are connected to several approximate nearest neighbors drawn from the level immediately above. In contrast with the partition-based methods proposed to date, most query result objects are reachable via multiple paths through a relatively compact portion of the structure. Experimental results are provided showing that for approximate k-nearest-neighbor queries, the method consistently returns a high proportion of the true k-nearest neighbors at speeds of one or more orders of magnitude faster than sequential search. Importantly, the method also allows significantly better control over the time-accuracy trade-off than previous approximation methods.

By: Michael E. Houle

Published in: VLDB 2002, Hong Kong in 2002

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .