Learning local languages and their application to DNA sequence analysis

Takashi Yokomori*, Satoshi Kobayashi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

29 Citations (Scopus)

Abstract

This paper concerns an efficient algorithm for learning in the limit, from positive data, a special class of regular languages called strictly locally testable languages, and its application to identifying the protein α-chain region in amino acid sequences. First, we present a linear time algorithm that, given a strictly locally testable language, learns (identifies) its deterministic finite state automaton in the limit from only positive data. This provides us with a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results using the learning algorithm developed above. Following a theoretical observation which strongly suggests that a certain type of amino acid sequence can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95 percent correct identification for positive data, and 96 percent for negative data.
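To make the idea of learning a strictly locally testable language from positive data concrete, the following is a minimal sketch and not the authors' algorithm: the paper constructs a deterministic finite state automaton, whereas this sketch only records the observed prefixes, suffixes, and length-k substrings of each positive example. It assumes one common textbook convention (prefixes and suffixes of length k-1, strings shorter than k stored explicitly); the class name KLocalLearner and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KLocalLearner:
    """Hypothetical learner for a strictly k-testable language (k >= 1).

    Assumed convention: a string w with len(w) >= k is accepted when its
    (k-1)-prefix, its (k-1)-suffix, and all of its length-k substrings
    have been observed in positive examples. Shorter strings are accepted
    only if they were observed verbatim.
    """
    k: int
    prefixes: set = field(default_factory=set)
    suffixes: set = field(default_factory=set)
    factors: set = field(default_factory=set)
    short: set = field(default_factory=set)

    def _factors_of(self, w: str):
        # All contiguous substrings of length k, left to right.
        return (w[i:i + self.k] for i in range(len(w) - self.k + 1))

    def update(self, w: str) -> None:
        """Refine the hypothesis with one positive example (linear in len(w))."""
        if len(w) < self.k:
            self.short.add(w)
            return
        self.prefixes.add(w[:self.k - 1])
        self.suffixes.add(w[len(w) - self.k + 1:])
        self.factors.update(self._factors_of(w))

    def accepts(self, w: str) -> bool:
        """Test membership in the current hypothesis language."""
        if len(w) < self.k:
            return w in self.short
        return (
            w[:self.k - 1] in self.prefixes
            and w[len(w) - self.k + 1:] in self.suffixes
            and all(f in self.factors for f in self._factors_of(w))
        )


# Toy example over the alphabet {a, b}; not data from the paper.
learner = KLocalLearner(k=2)
for example in ["ab", "abab"]:
    learner.update(example)
```

Because the hypothesis only grows as examples arrive, the learner converges once every prefix, suffix, and substring of the target language has been seen, which is the intuition behind identification in the limit from positive data. In this toy run the learner generalizes to "ababab" while still rejecting "abb" and "ba".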

Original language: English
Pages (from-to): 1067-1079
Number of pages: 13
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 20
Issue number: 10
Publication status: Published - 1998

Keywords

  • Deterministic automata
  • DNA sequence analysis
  • Hemoglobin α-chain
  • Local languages
  • Machine learning

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
