Dynamic SAX parameter estimation for time series

Chaw Thet Zan, Hayato Yamana

    Research output: Contribution to journal › Article

    1 Citation (Scopus)

    Abstract

    Purpose - The paper aims to estimate the segment size and the alphabet size of Symbolic Aggregate approXimation (SAX). In SAX, a time series is divided into a set of equal-sized segments. Each segment is represented by its mean value and mapped to a symbol from an alphabet, where the number of symbols used is called the alphabet size. Both parameters control the data compression ratio and the accuracy of time series mining tasks. Moreover, the optimal choice of parameters depends heavily on the application and the data set. In practice, these parameters are selected iteratively by analyzing entire data sets, which limits the handling of large amounts of time series and reduces the applicability of SAX. Design/methodology/approach - The segment size is estimated based on the Shannon sampling theorem (autoSAXSD-S) and on adaptive hierarchical segmentation (autoSAXSD-M). The alphabet size is estimated from how the mean values of all segments are distributed: a small alphabet size is set when the means are widely distributed, so that differences among segments remain easy to distinguish. Findings - Experimental evaluation using the University of California Riverside (UCR) data sets shows that the proposed schemes select the parameters well, achieving high classification accuracy and efficiency comparable to the state-of-the-art methods SAX and auto-iSAX. Originality/value - The originality of this paper lies in the proposed estimation schemes for finding the optimal parameters of SAX. The first parameter, segment size, is estimated automatically by two approaches, and the second parameter, alphabet size, is estimated from the most frequent average (mean) value among segments.
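    The abstract describes the SAX pipeline that the two parameters control: split a series into equal-sized segments, take each segment's mean, and map each mean to a symbol. The Python sketch below illustrates that transform for fixed, user-supplied parameters. It follows the standard SAX recipe (z-normalization, Piecewise Aggregate Approximation, equiprobable breakpoints under a standard normal distribution); the z-normalization and Gaussian breakpoints are assumptions taken from common SAX practice rather than from this record, and all function names are hypothetical. It does not implement the authors' autoSAXSD-S or autoSAXSD-M estimation schemes, which the abstract does not describe in enough detail to reproduce.

```python
from statistics import NormalDist, fmean, pstdev

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segment_size):
    """Piecewise Aggregate Approximation: the mean of each equal-sized segment.

    This sketch assumes len(series) is a multiple of segment_size.
    """
    if segment_size < 1 or len(series) % segment_size != 0:
        raise ValueError("series length must be a positive multiple of segment_size")
    return [fmean(series[i:i + segment_size])
            for i in range(0, len(series), segment_size)]


def gaussian_breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equiprobable regions (assumed)."""
    if not 2 <= alphabet_size <= len(ALPHABET):
        raise ValueError("alphabet_size must be between 2 and 26")
    dist = NormalDist()
    return [dist.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, segment_size, alphabet_size):
    """Hypothetical helper: map a series to a SAX word for the given parameters."""
    breakpoints = gaussian_breakpoints(alphabet_size)
    word = []
    for mean in paa(znormalize(series), segment_size):
        # The symbol index is the number of breakpoints the segment mean exceeds.
        index = sum(1 for b in breakpoints if mean > b)
        word.append(ALPHABET[index])
    return "".join(word)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], segment_size=2, alphabet_size=4))
```

    With an 8-point rising series, a segment size of 2 and an alphabet size of 4, each of the four segment means falls into a different breakpoint region, giving the word "abcd". Increasing the segment size shortens the word (higher compression), and changing the alphabet size changes how finely the segment means are distinguished, which is the trade-off the paper's estimation schemes aim to set automatically.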

    Original language: English
    Pages: 387-404
    Number of pages: 18
    Journal: International Journal of Web Information Systems
    Volume: 13
    Issue number: 4
    DOI: 10.1108/IJWIS-04-2017-0035
    Publication status: Published - 2017 Jan 1

    Fingerprint

    • Parameter estimation
    • Time series
    • Data compression ratio
    • Sampling

    Keywords

    • Classification
    • Data representation
    • Symbolic aggregate approximation
    • Time series

    ASJC Scopus subject areas

    • Information Systems
    • Computer Networks and Communications

    Cite this

    Dynamic SAX parameter estimation for time series. / Zan, Chaw Thet; Yamana, Hayato.

    In: International Journal of Web Information Systems, Vol. 13, No. 4, 01.01.2017, p. 387-404.

    @article{50e91930ec484eb2aa410ff8b618c99a,
    title = "Dynamic SAX parameter estimation for time series",
    keywords = "Classification, Data representation, Symbolic aggregate approximation, Time series",
    author = "Zan, {Chaw Thet} and Hayato Yamana",
    year = "2017",
    month = "1",
    day = "1",
    doi = "10.1108/IJWIS-04-2017-0035",
    language = "English",
    volume = "13",
    pages = "387--404",
    journal = "International Journal of Web Information Systems",
    issn = "1744-0084",
    publisher = "Emerald Group Publishing Ltd.",
    number = "4",

    }

    TY - JOUR

    T1 - Dynamic SAX parameter estimation for time series

    AU - Zan, Chaw Thet

    AU - Yamana, Hayato

    PY - 2017/1/1

    Y1 - 2017/1/1

    KW - Classification

    KW - Data representation

    KW - Symbolic aggregate approximation

    KW - Time series

    UR - http://www.scopus.com/inward/record.url?scp=85034841496&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85034841496&partnerID=8YFLogxK

    U2 - 10.1108/IJWIS-04-2017-0035

    DO - 10.1108/IJWIS-04-2017-0035

    M3 - Article

    AN - SCOPUS:85034841496

    VL - 13

    SP - 387

    EP - 404

    JO - International Journal of Web Information Systems

    JF - International Journal of Web Information Systems

    SN - 1744-0084

    IS - 4

    ER -