Generalized sequential pattern mining with item intervals

Yu Hirate, Hayato Yamana

    Research output: Contribution to journal (Article)

    60 Citations (Scopus)

    Abstract

    Sequential pattern mining is an important data mining method with broad applications; it extracts frequent sequences while preserving the order of their items. However, it is also important to identify the item intervals of the extracted sequential patterns. For example, a sequence <A; B> with a 1-day interval and a sequence <A; B> with a 1-year interval are completely different: the former may reflect some association, while the latter may not. Two approaches have been proposed for integrating item intervals into sequential pattern mining: (1) constraint-based mining and (2) extended sequence-based mining. Constraint-based mining avoids extracting sequences with uninteresting time intervals, such as overly long ones, but it is difficult to specify optimal constraints on item intervals, and users must re-execute the algorithm each time they change the constraint values. Extended sequence-based mining, on the other hand, requires neither specifying constraints nor re-execution; however, because it cannot apply any constraints based on time intervals, it may extract meaningless patterns, such as sequences with overly long item intervals. Each approach thus has both advantages and disadvantages. To solve this problem, in this paper we generalize sequential pattern mining with item intervals. The generalization comprises three points: (a) the capability to handle two kinds of item interval measurement, item gap and time interval; (b) the capability to handle extended sequences, which are defined by inserting pseudo items based on an interval itemization function; and (c) the adoption of four item interval constraints. Generalized sequential pattern mining can substitute for all types of conventional sequential pattern mining algorithms with item intervals. Using Japanese earthquake data, we have confirmed that our proposed algorithm is able to extract sequential patterns with item intervals, defined in a flexible manner by the interval itemization function.
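
    The ideas named in the abstract (two interval measurements, pseudo items produced by an interval itemization function, and interval constraints) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the sample sequence, the bucket boundaries, the positional definition of item gap, the `<I…>` pseudo-item notation, and the single maximum-interval check are all assumptions chosen for illustration, and the abstract does not name the four constraints.

```python
from bisect import bisect_left

# Hypothetical sequence of (timestamp_in_days, item) pairs; not from the paper.
SEQUENCE = [(0, "A"), (1, "B"), (366, "B")]

# Assumed bucket boundaries in days: up to 1, up to 7, up to 30, longer.
BOUNDARIES = [1, 7, 30]


def time_interval(seq, i, j):
    """Difference between the timestamps at positions i and j."""
    return seq[j][0] - seq[i][0]


def item_gap(seq, i, j):
    """Positional distance between items i and j (an assumed definition)."""
    return j - i


def itemize(interval):
    """Hypothetical interval itemization function: interval -> bucket index."""
    return bisect_left(BOUNDARIES, interval)


def extend_sequence(seq, measure=time_interval):
    """Insert a pseudo item between consecutive items, based on their interval."""
    if not seq:
        return []
    extended = [seq[0][1]]
    for k in range(1, len(seq)):
        extended.append(f"<I{itemize(measure(seq, k - 1, k))}>")
        extended.append(seq[k][1])
    return extended


def within_max_interval(seq, max_interval, measure=time_interval):
    """One illustrative constraint: every consecutive interval <= max_interval."""
    return all(measure(seq, k - 1, k) <= max_interval for k in range(1, len(seq)))


print(extend_sequence(SEQUENCE))                    # time-interval measurement
print(extend_sequence(SEQUENCE, measure=item_gap))  # item-gap measurement
print(within_max_interval(SEQUENCE, 30))            # constraint rejects the 365-day step
```

    In this sketch the 1-day and 365-day steps fall into different buckets, so the two occurrences of B are distinguished in the extended sequence, while the constraint check can still discard sequences whose intervals are too long.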

    Original language: English
    Pages (from-to): 51-60
    Number of pages: 10
    Journal: Journal of Computers
    Volume: 1
    Issue number: 3
    Publication status: Published - 2006

    Fingerprint

    Data mining
    Earthquakes

    Keywords

    • Data Mining
    • Gap
    • Item Intervals
    • Sequential Pattern Mining
    • Time-stamp

    ASJC Scopus subject areas

    • Computer Science (all)

    Cite this

    Generalized sequential pattern mining with item intervals. / Hirate, Yu; Yamana, Hayato.

    In: Journal of Computers, Vol. 1, No. 3, 2006, p. 51-60.

    @article{a16d8ea29b5d4d11b6f4f5d1c6f1c921,
    title = "Generalized sequential pattern mining with item intervals",
    keywords = "Data Mining, Gap, Item Intervals, Sequential Pattern Mining, Time-stamp",
    author = "Yu Hirate and Hayato Yamana",
    year = "2006",
    language = "English",
    volume = "1",
    pages = "51--60",
    journal = "Journal of Computers",
    issn = "1796-203X",
    publisher = "Academy Publisher",
    number = "3",

    }

    TY - JOUR

    T1 - Generalized sequential pattern mining with item intervals

    AU - Hirate, Yu

    AU - Yamana, Hayato

    PY - 2006

    Y1 - 2006

    KW - Data Mining

    KW - Gap

    KW - Item Intervals

    KW - Sequential Pattern Mining

    KW - Time-stamp

    UR - http://www.scopus.com/inward/record.url?scp=57049152226&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=57049152226&partnerID=8YFLogxK

    M3 - Article

    VL - 1

    SP - 51

    EP - 60

    JO - Journal of Computers

    JF - Journal of Computers

    SN - 1796-203X

    IS - 3

    ER -