A novel proposal for outlier detection in high dimensional space

Zhana Bao, Wataru Kameyama

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    2 Citations (Scopus)

    Abstract

    Finding rare information hidden in big data is an important goal of outlier detection. However, finding such information is extremely difficult in high dimensional space because of the notorious curse of dimensionality. Most existing methods fail to obtain good results because the Euclidean distance does not work well in high dimensional space. In this paper, we first divide the data into a grid along each attribute and compare the density ratio of every point in each dimension. We then project the points of the same area onto the other dimensions and calculate their degree of dispersion using a defined cluster density value. Finally, we sum the weight values that each point receives in these two calculation steps. The outliers are the points with the largest total weight. The experimental results show that the proposed algorithm achieves high precision and recall on synthetic datasets with dimensionality varying from 100 to 10000.
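
As a rough illustration of the first step described in the abstract, the following Python sketch divides each attribute into equal-width cells, weights every point by how sparse its cell is, and sums the weights over all dimensions. The function names (`cell_indices`, `density_weights`, `outlier_scores`), the number of cells, and the weight formula `1 / (1 + ratio)` are assumptions made for this example; the abstract does not specify them. The projection step and its cluster density value are not reproduced here, since the abstract does not define them.

```python
import numpy as np


def cell_indices(X, n_bins):
    """Assign every value to an equal-width cell, separately per dimension."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant column: every point falls in cell 0
    cells = np.floor((X - lo) / span * n_bins).astype(int)
    return np.clip(cells, 0, n_bins - 1)  # the column maximum goes to the last cell


def density_weights(cells, n_bins):
    """Assumed weight: points in sparse cells get a weight close to 1."""
    n, d = cells.shape
    mean_count = n / n_bins  # cell count if the points were spread evenly
    weights = np.empty((n, d))
    for j in range(d):
        counts = np.bincount(cells[:, j], minlength=n_bins)
        ratio = counts[cells[:, j]] / mean_count  # density ratio of each point's cell
        weights[:, j] = 1.0 / (1.0 + ratio)
    return weights


def outlier_scores(X, n_bins=10):
    """Sum the per-dimension weights; a larger score means more outlier-like."""
    X = np.asarray(X, dtype=float)
    cells = cell_indices(X, n_bins)
    return density_weights(cells, n_bins).sum(axis=1)


# Toy data (hypothetical): 200 Gaussian points in 20 dimensions,
# with point 0 shifted far away in every dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X[0, :] += 8.0
scores = outlier_scores(X)
print("highest-scoring point:", int(np.argmax(scores)))
```

On this toy data the shifted point sits almost alone in the last cell of every dimension, so it should receive the largest summed weight. Because each dimension is handled independently, no Euclidean distance between full-dimensional points is ever computed, which is the property the abstract emphasizes.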

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Pages: 307-318
    Number of pages: 12
    Volume: 7867 LNAI
    DOI: https://doi.org/10.1007/978-3-642-40319-4_27
    Publication status: Published - 2013
    Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD
    Duration: 2013 Apr 14 to 2013 Apr 17

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 7867 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
    City: Gold Coast, QLD
    Period: 13/4/14 to 13/4/17

    Keywords

    • Dimensional projection
    • High dimension
    • Outlier score

    ASJC Scopus subject areas

    • Computer Science (all)
    • Theoretical Computer Science

    Cite this

    Bao, Z., & Kameyama, W. (2013). A novel proposal for outlier detection in high dimensional space. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7867 LNAI, pp. 307-318). https://doi.org/10.1007/978-3-642-40319-4_27

    @inproceedings{dc7ddb596ac247df91ecdf1cb6b279d8,
    title = "A novel proposal for outlier detection in high dimensional space",
    keywords = "Dimensional projection, High dimension, Outlier score",
    author = "Zhana Bao and Wataru Kameyama",
    year = "2013",
    doi = "10.1007/978-3-642-40319-4_27",
    language = "English",
    isbn = "9783642403187",
    volume = "7867 LNAI",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    pages = "307--318",
    booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

    }
