The k-nearest neighbor (kNN) similarity join on high-dimensional data has broad applications in many fields, but the task still faces several key challenges, such as the "curse of dimensionality" and the large scale of the datasets. We first propose a new dimensionality reduction scheme based on the random projection technique. We then design two novel partition strategies: an equal-width partition strategy and a distance split tree-based partition strategy. Finally, we propose a kNN join algorithm for high-dimensional data built on these partition strategies. We conduct comprehensive experiments to evaluate the proposed approaches, and the results show that the proposed methods are both effective and efficient.
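The abstract does not give the details of the projection or partitioning schemes, so the following is only a minimal sketch of how a random projection can be combined with an equal-width partition to answer a kNN join, not the authors' algorithm. It assumes Euclidean distance and projects every point onto a single normalized Gaussian random direction `u`. Because `u` has unit length, `|u·r - u·s| <= ||r - s||` (Cauchy–Schwarz), so the projected gap to a partition is a valid lower bound and lets whole partitions be pruned without losing exactness. All function names (`random_unit_vector`, `equal_width_partition`, `knn_join`) and parameters (`num_parts`, `seed`) are hypothetical, and the distance split tree strategy is not shown.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Draw a Gaussian random direction and normalize it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(p: Point, u: Sequence[float]) -> float:
    """1-D random projection: the dot product of p with the unit vector u."""
    return sum(a * b for a, b in zip(p, u))


def dist(a: Point, b: Point) -> float:
    """Euclidean distance in the original high-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def equal_width_partition(keys: List[float], num_parts: int) -> List[List[int]]:
    """Split [min(keys), max(keys)] into num_parts equal-width buckets of point ids."""
    lo, hi = min(keys), max(keys)
    width = (hi - lo) / num_parts or 1.0  # avoid zero width when all keys coincide
    buckets: List[List[int]] = [[] for _ in range(num_parts)]
    for i, key in enumerate(keys):
        buckets[min(int((key - lo) / width), num_parts - 1)].append(i)
    return buckets


def knn_join(R: List[Point], S: List[Point], k: int,
             num_parts: int = 8, seed: int = 0) -> List[List[Tuple[float, int]]]:
    """For each r in R, return its k nearest points of S as sorted (distance, s_id) pairs."""
    u = random_unit_vector(len(S[0]), random.Random(seed))
    s_keys = [project(s, u) for s in S]
    buckets = equal_width_partition(s_keys, num_parts)
    # Tight projected key range of each non-empty bucket, used for the lower bound.
    ranges = [(min(s_keys[i] for i in b), max(s_keys[i] for i in b), b)
              for b in buckets if b]
    result = []
    for r in R:
        q = project(r, u)

        def bound(t):
            # Projected gap from q to the bucket's key range; never exceeds the true distance.
            return max(0.0, t[0] - q, q - t[1])

        heap: List[Tuple[float, int]] = []  # max-heap of (-distance, s_id)
        for t in sorted(ranges, key=bound):
            if len(heap) == k and bound(t) > -heap[0][0]:
                break  # buckets are visited by increasing bound, so the rest are farther
            for i in t[2]:
                d = dist(r, S[i])
                if len(heap) < k:
                    heapq.heappush(heap, (-d, i))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, i))
        result.append(sorted((-nd, i) for nd, i in heap))
    return result
```

Using a single unit-length direction keeps the pruning bound exact; a multi-dimensional random projection rescaled for distance preservation (as in the Johnson–Lindenstrauss lemma) would only preserve distances approximately, so pruning on it would generally make the join approximate.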