A novel proposal for outlier detection in high dimensional space

Zhana Bao, Wataru Kameyama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Finding rare information hidden in big data is an important and meaningful task in outlier detection. However, finding such information is extremely difficult in high dimensional space because of the notorious curse of dimensionality. Most existing methods fail to obtain good results because the Euclidean distance does not work well in high dimensional space. In this paper, we first perform a grid division of the data for each attribute and compare the density ratio of every point in each dimension. We then project the points of the same area onto the other dimensions and calculate their degree of dispersion using a defined cluster density value. Finally, we sum all the weight values obtained for each point in the two-step calculation. The outliers are the points with the largest total weight. The experimental results show that the proposed algorithm achieves high precision and recall on synthetic datasets with dimensionality varying from 100 to 10000.
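The first step described above (per-attribute grid division followed by a density comparison) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the density ratio or the cluster density value, so the weight used here (expected cell population divided by actual cell population) is an assumption, the second projection step is omitted, and the function names `grid_sparsity_scores` and `top_outliers` and the parameter `bins` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def grid_sparsity_scores(points: Sequence[Sequence[float]], bins: int = 10) -> List[float]:
    """Sum, over all dimensions, a per-dimension sparsity weight for each point."""
    n = len(points)
    dims = len(points[0])
    scores = [0.0] * n
    for d in range(dims):
        column = [p[d] for p in points]
        lo, hi = min(column), max(column)
        # Equal-width cells; a constant column gets width 1.0 so every value lands in cell 0.
        width = (hi - lo) / bins or 1.0
        # Clamp the maximum value into the last cell.
        cells = [min(int((v - lo) / width), bins - 1) for v in column]
        counts = Counter(cells)
        expected = n / bins  # cell population if the points were spread uniformly
        for i, c in enumerate(cells):
            # Assumed weight (not from the paper): sparse cells contribute more.
            scores[i] += expected / counts[c]
    return scores


def top_outliers(points: Sequence[Sequence[float]], k: int, bins: int = 10) -> List[int]:
    """Return the indices of the k points with the largest summed weight."""
    scores = grid_sparsity_scores(points, bins)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:k]
```

Calling `top_outliers(points, k)` on a list of equal-length numeric rows returns the indices of the `k` highest-scoring points. Because each attribute is scored independently, the sketch never computes a Euclidean distance across all dimensions, which is the property the abstract emphasizes.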

Original language: English
Title of host publication: Trends and Applications in Knowledge Discovery and Data Mining - PAKDD 2013 International Workshops
Subtitle of host publication: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD, Revised Selected Papers
Pages: 307-318
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-40319-4_27
Publication status: Published - 2013 Dec 1
Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD, Australia
Duration: 2013 Apr 14 → 2013 Apr 17

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7867 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Country: Australia
City: Gold Coast, QLD
Period: 13/4/14 → 13/4/17

Keywords

  • Dimensional projection
  • High dimension
  • Outlier score

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Bao, Z., & Kameyama, W. (2013). A novel proposal for outlier detection in high dimensional space. In Trends and Applications in Knowledge Discovery and Data Mining - PAKDD 2013 International Workshops: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD, Revised Selected Papers (pp. 307-318). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7867 LNAI). https://doi.org/10.1007/978-3-642-40319-4_27