Learning local languages and their application to DNA sequence analysis

Takashi Yokomori, Satoshi Kobayashi

    Research output: Contribution to journal › Article

    28 Citations (Scopus)

    Abstract

    This paper concerns an efficient algorithm for learning in the limit, from positive data, a special class of regular languages called strictly locally testable languages, and its application to identifying the protein α-chain region in amino acid sequences. First, we present a linear-time algorithm that, given a strictly locally testable language, learns (identifies) its deterministic finite state automaton in the limit from positive data only. This provides a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results obtained with this learning algorithm. Following a theoretical observation which strongly suggests that a certain type of amino acid sequence can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95 percent correct identification for positive data and 96 percent for negative data.
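
    The idea of learning a strictly locally testable language from positive data can be illustrated with a minimal sketch. It is not the authors' algorithm as published, only a textbook-style construction under stated assumptions: the window size k is fixed in advance, prefixes and suffixes are taken with length k-1 (conventions differ between papers), and strings shorter than k are memorised verbatim. The names KTestableModel, learn and accepts are hypothetical, and the sample strings are toy data, not amino acid sequences.

```python
from typing import Iterable, NamedTuple, Set


class KTestableModel(NamedTuple):
    """Finite description of a hypothesised strictly k-testable language."""
    k: int
    prefixes: Set[str]  # observed prefixes of length k-1
    suffixes: Set[str]  # observed suffixes of length k-1
    factors: Set[str]   # observed substrings of length k
    short: Set[str]     # observed strings shorter than k, kept verbatim


def learn(samples: Iterable[str], k: int = 2) -> KTestableModel:
    """Collect prefixes, suffixes and k-factors from positive examples only.

    Assumption: k >= 2 is known beforehand. Each sample is scanned once,
    so the work per sample is proportional to its length for fixed k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    prefixes: Set[str] = set()
    suffixes: Set[str] = set()
    factors: Set[str] = set()
    short: Set[str] = set()
    for w in samples:
        if len(w) < k:
            short.add(w)
            continue
        prefixes.add(w[: k - 1])
        suffixes.add(w[-(k - 1):])
        factors.update(w[i : i + k] for i in range(len(w) - k + 1))
    return KTestableModel(k, prefixes, suffixes, factors, short)


def accepts(model: KTestableModel, w: str) -> bool:
    """Accept w if its prefix, suffix and every k-factor were all observed."""
    k = model.k
    if len(w) < k:
        return w in model.short
    return (
        w[: k - 1] in model.prefixes
        and w[-(k - 1):] in model.suffixes
        and all(w[i : i + k] in model.factors for i in range(len(w) - k + 1))
    )


# Toy usage: two positive examples over the alphabet {a, b}.
model = learn(["abab", "ab"], k=2)
print(accepts(model, "ababab"))  # generalises beyond the samples
print(accepts(model, "abba"))    # contains the unseen factor "bb"
```

    Because the four sets only grow as more positive examples arrive, the hypothesis stops changing once every prefix, suffix and factor of the target language has been seen, which is the intuition behind identification in the limit. A deterministic automaton could be built from these sets, but that step is omitted here.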

    Original language: English
    Pages (from-to): 1067-1079
    Number of pages: 13
    Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
    Volume: 20
    Issue number: 10
    DOI: 10.1109/34.722617
    Publication status: Published - 1998

    Keywords

    • Deterministic automata
    • DNA sequence analysis
    • Hemoglobin α-chain
    • Local languages
    • Machine learning

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Electrical and Electronic Engineering
    • Artificial Intelligence
    • Computer Vision and Pattern Recognition

    Cite this

    Learning local languages and their application to DNA sequence analysis. / Yokomori, Takashi; Kobayashi, Satoshi.

    In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 10, 1998, p. 1067-1079.


    @article{ffeb4f2347b94b6ca10f4b8f265fe725,
    title = "Learning local languages and their application to DNA sequence analysis",
    abstract = "This paper concerns an efficient algorithm for learning in the limit, from positive data, a special class of regular languages called strictly locally testable languages, and its application to identifying the protein α-chain region in amino acid sequences. First, we present a linear-time algorithm that, given a strictly locally testable language, learns (identifies) its deterministic finite state automaton in the limit from positive data only. This provides a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results obtained with this learning algorithm. Following a theoretical observation which strongly suggests that a certain type of amino acid sequence can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95 percent correct identification for positive data and 96 percent for negative data.",
    keywords = "Deterministic automata, DNA sequence analysis, Hemoglobin α-chain, Local languages, Machine learning",
    author = "Takashi Yokomori and Satoshi Kobayashi",
    year = "1998",
    doi = "10.1109/34.722617",
    language = "English",
    volume = "20",
    pages = "1067--1079",
    journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
    issn = "0162-8828",
    publisher = "IEEE Computer Society",
    number = "10",

    }

    TY - JOUR

    T1 - Learning local languages and their application to DNA sequence analysis

    AU - Yokomori, Takashi

    AU - Kobayashi, Satoshi

    PY - 1998

    Y1 - 1998

    N2 - This paper concerns an efficient algorithm for learning in the limit, from positive data, a special class of regular languages called strictly locally testable languages, and its application to identifying the protein α-chain region in amino acid sequences. First, we present a linear-time algorithm that, given a strictly locally testable language, learns (identifies) its deterministic finite state automaton in the limit from positive data only. This provides a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results obtained with this learning algorithm. Following a theoretical observation which strongly suggests that a certain type of amino acid sequence can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95 percent correct identification for positive data and 96 percent for negative data.

    KW - Deterministic automata

    KW - DNA sequence analysis

    KW - Hemoglobin α-chain

    KW - Local languages

    KW - Machine learning

    UR - http://www.scopus.com/inward/record.url?scp=0032180266&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0032180266&partnerID=8YFLogxK

    U2 - 10.1109/34.722617

    DO - 10.1109/34.722617

    M3 - Article

    AN - SCOPUS:0032180266

    VL - 20

    SP - 1067

    EP - 1079

    JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

    JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

    SN - 0162-8828

    IS - 10

    ER -