Two phases outlier detection in different subspaces

Zhana Bao, Wataru Kameyama

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    Mining outliers in high-dimensional data is not yet fully resolved because of the particular properties of high-dimensional spaces. Existing full-space methods find distinct outliers but neglect those hidden in some subspaces. Subspace-based approaches detect most outliers that are apparent in low-dimensional spaces but miss outliers that remain invisible in those subspaces. This paper proposes a novel two-phase inspection model. The first phase measures the density of each object's neighbors in subspaces to find low-dimensional outliers. The second phase evaluates the degree of deviation of neighbors in connected subspaces, where the previously undiscovered outliers disperse quickly and scatter more than their neighbors. We analyze the results of the two phases statistically and merge them into one score for each object. The objects with the top scores are reported as outliers. Evaluation on synthetic and real data sets shows that our proposal outperforms state-of-the-art algorithms on high-dimensional outlier detection.
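The abstract does not give the scoring formulas, so the following is only a rough sketch of the two-phase idea, not the authors' algorithm. Everything concrete in it is an assumption made for illustration: the function names, the neighbor count `k`, the use of single attributes as subspaces, the treatment of adjacent attribute pairs as "connected" subspaces, and the merge rule (the larger of the two standardized scores).

```python
from statistics import mean, pstdev


def knn_indices(values, i, k):
    """Indices of the k objects closest to object i along one attribute."""
    others = [j for j in range(len(values)) if j != i]
    others.sort(key=lambda j: abs(values[j] - values[i]))
    return others[:k]


def column(data, d):
    """All values of attribute d."""
    return [row[d] for row in data]


def phase1_scores(data, k):
    """Assumed phase 1: neighborhood sparsity in each 1-D subspace.

    Score = largest mean distance to the k nearest neighbors over all attributes.
    """
    dims = len(data[0])
    scores = []
    for i in range(len(data)):
        per_dim = []
        for d in range(dims):
            col = column(data, d)
            nbrs = knn_indices(col, i, k)
            per_dim.append(mean([abs(col[j] - col[i]) for j in nbrs]))
        scores.append(max(per_dim))
    return scores


def phase2_scores(data, k):
    """Assumed phase 2: deviation from neighbors across adjacent attributes.

    Neighbors are chosen in attribute d; the score measures how far the
    object lies from those same neighbors in attribute d + 1.
    """
    dims = len(data[0])
    scores = []
    for i in range(len(data)):
        per_pair = []
        for d in range(dims - 1):
            col_a, col_b = column(data, d), column(data, d + 1)
            nbrs = knn_indices(col_a, i, k)
            per_pair.append(mean([abs(col_b[j] - col_b[i]) for j in nbrs]))
        scores.append(max(per_pair))
    return scores


def zscores(xs):
    """Standardize scores so the two phases are comparable."""
    mu, sigma = mean(xs), pstdev(xs)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in xs]


def merged_scores(data, k=3):
    """Assumed merge rule: keep the larger standardized score per object."""
    s1 = zscores(phase1_scores(data, k))
    s2 = zscores(phase2_scores(data, k))
    return [max(a, b) for a, b in zip(s1, s2)]


def top_outliers(data, k=3, top=1):
    """Indices of the objects with the highest merged scores."""
    scores = merged_scores(data, k)
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:top]


# Toy data: ten objects on the diagonal y = x plus one object at (2, 7),
# whose x and y values each look ordinary when viewed one attribute at a time.
points = [(x, x) for x in range(10)] + [(2, 7)]
print(top_outliers(points, k=3, top=1))
```

On this toy data the off-diagonal object has a dense neighborhood in every single attribute, so the assumed first phase does not flag it; only the second score separates it from the rest.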

    Original language: English
    Title of host publication: International Conference on Information and Knowledge Management, Proceedings
    Publisher: Association for Computing Machinery
    Pages: 57-62
    Number of pages: 6
    Volume: 2014-November
    Edition: November
    DOI: https://doi.org/10.1145/2663714.2668046
    Publication status: Published - 2014 Nov 3
    Event: 7th PhD Workshop in Information and Knowledge Management, PIKM 2014, in Conjunction with the ACM CIKM 2014 Conference - Shanghai, China
    Duration: 2014 Nov 3 → …

    Other

    Other: 7th PhD Workshop in Information and Knowledge Management, PIKM 2014, in Conjunction with the ACM CIKM 2014 Conference
    Country: China
    City: Shanghai
    Period: 14/11/3 → …

    Fingerprint

    Outlier detection
    Outliers
    Inspection
    Neglect
    Deviation
    Evaluation

    Keywords

    • Connected subspace
    • Dimensional projection
    • High dimension
    • Outlier score

    ASJC Scopus subject areas

    • Business, Management and Accounting (all)
    • Decision Sciences (all)

    Cite this

    Bao, Z., & Kameyama, W. (2014). Two phases outlier detection in different subspaces. In International Conference on Information and Knowledge Management, Proceedings (November ed., Vol. 2014-November, pp. 57-62). Association for Computing Machinery. https://doi.org/10.1145/2663714.2668046

    @inproceedings{7a2008d80c094b9f9a333adcda549fe3,
    title = "Two phases outlier detection in different subspaces",
    keywords = "Connected subspace, Dimensional projection, High dimension, Outlier score",
    author = "Zhana Bao and Wataru Kameyama",
    year = "2014",
    month = "11",
    day = "3",
    doi = "10.1145/2663714.2668046",
    language = "English",
    volume = "2014-November",
    pages = "57--62",
    booktitle = "International Conference on Information and Knowledge Management, Proceedings",
    publisher = "Association for Computing Machinery",
    edition = "November",

    }

    TY - GEN

    T1 - Two phases outlier detection in different subspaces

    AU - Bao, Zhana

    AU - Kameyama, Wataru

    PY - 2014/11/3

    Y1 - 2014/11/3

    KW - Connected subspace

    KW - Dimensional projection

    KW - High dimension

    KW - Outlier score

    UR - http://www.scopus.com/inward/record.url?scp=84937704623&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84937704623&partnerID=8YFLogxK

    U2 - 10.1145/2663714.2668046

    DO - 10.1145/2663714.2668046

    M3 - Conference contribution

    AN - SCOPUS:84937704623

    VL - 2014-November

    SP - 57

    EP - 62

    BT - International Conference on Information and Knowledge Management, Proceedings

    PB - Association for Computing Machinery

    ER -