What is the Nearest Neighbor in High Dimensional Space

Nearest neighbor search in high dimensional spaces is an interesting and important problem
which is relevant for a wide variety of novel database applications. As recent results show,
however, the problem is a very difficult one, not only with regards to the performance issue but
also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized
notion of nearest neighbor search as the relevant problem in high dimensional space. In contrast
to previous approaches, our new notion of nearest neighbor search ,does not treat all dimensions
equally but uses a quality criterion to select relevant dimensions (projections) with respect to
the given query. As an example for a useful quality criterion, we rate how well the data is
clustered around the query point within the selected projection. We then propose an efficient
and effective algorithm to solve the generalized nearest neighbor problem. Our experiments
based on a number of real and synthetic data sets show that our new approach provides new
insights into the nature of nearest neighbor search on high dimensional data.

By: Alexander Hinneburg(University of Halle), Daneil A. Keim (University of Halle), Charu Aggarwal

Published in: RC21740 in 2000

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

RC21740.pdf

Questions about this service can be mailed to reports@us.ibm.com .