Abstract
With the progress of the human genome analysis project, large-scale genome sequence data are becoming available for analysis. One analysis method that exploits such data is genome sequence walking. By applying sequence walking to a database of sequence segments, a researcher can start from a gene segment (the query) and estimate the whole sequence of the gene to which that segment belongs. Because the sequence is estimated without going through biological experiments, sequence walking saves time and expense in sequence determination. Sequence walking has so far been performed with the well-known BLAST. BLAST, however, is a similarity search tool, and it is not well suited to sequence walking, which connects segments of the same gene, in terms of either efficiency or accuracy. This study shows that genome sequence walking is not a similarity search problem but a string matching problem that permits errors. A system dedicated to sequence walking is then constructed on a string matching algorithm improved to suit this task, and it has been made publicly available on the WWW. The proposed system performs sequence walking faster and more accurately than conventional BLAST-based sequence walking, thus reducing the burden on the researcher.
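The abstract frames sequence walking as string matching that permits errors rather than as similarity search. The following sketch illustrates that framing only under stated assumptions and is not the authors' improved algorithm. "Errors" are modeled as substitutions (Hamming mismatches), so insertions and deletions are not handled. The walk is greedy and always takes the longest suffix/prefix overlap. The function names `suffix_prefix_overlap` and `walk` and the parameters `min_len`, `max_mismatch`, and `max_steps` are hypothetical.

```python
from typing import Sequence


def suffix_prefix_overlap(contig: str, segment: str, min_len: int, max_mismatch: int) -> int:
    """Return the longest L >= min_len such that the last L characters of
    `contig` match the first L characters of `segment` with at most
    `max_mismatch` substitutions (Hamming distance). Return 0 if none exists.
    Assumption: only substitution errors are tolerated, not indels."""
    best = 0
    start = max(min_len, 1)  # a zero-length overlap would slice the whole contig
    for length in range(start, min(len(contig), len(segment)) + 1):
        tail = contig[-length:]
        head = segment[:length]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch:
            best = length  # keep scanning: a longer overlap is preferred
    return best


def walk(seed: str, segments: Sequence[str], min_len: int = 8,
         max_mismatch: int = 1, max_steps: int = 100) -> str:
    """Hypothetical greedy walk: repeatedly append the unused segment whose
    prefix overlaps the contig's suffix the most (within the mismatch
    budget), and stop when no segment extends the contig any further."""
    contig = seed
    used = set()  # indices of segments already joined to the contig
    for _ in range(max_steps):
        best_idx, best_len = None, 0
        for i, seg in enumerate(segments):
            if i in used:
                continue
            ov = suffix_prefix_overlap(contig, seg, min_len, max_mismatch)
            # Require the segment to reach past the overlap so it adds new bases.
            if ov > best_len and len(seg) > ov:
                best_idx, best_len = i, ov
        if best_idx is None:
            break
        used.add(best_idx)
        contig += segments[best_idx][best_len:]
    return contig


if __name__ == "__main__":
    segs = ["TTGACCATGGA", "CATGGATTCCAA"]
    print(walk("ACGTACGTTTGA", segs, min_len=4, max_mismatch=1))
    # prints ACGTACGTTTGACCATGGATTCCAA
```

The mismatch budget is what separates this from exact matching: `suffix_prefix_overlap("AAAACGT", "ACCTGGG", 3, 1)` accepts the 4-base overlap `ACGT`/`ACCT` despite one substitution, but it returns 0 when `max_mismatch` is 0. The brute-force overlap scan is quadratic per segment pair and is meant only to illustrate the idea. A practical system would rely on an index or a faster approximate-matching algorithm.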
Original language | English |
---|---|
Pages (from-to) | 64-72 |
Number of pages | 9 |
Journal | Electronics and Communications in Japan, Part II: Electronics (English translation of Denshi Tsushin Gakkai Ronbunshi) |
Volume | 86 |
Issue number | 1 |
DOIs | |
Publication status | Published - 2003 Jan 1 |
Externally published | Yes |
Keywords
- BLAST
- Genome analysis
- Genome sequence walking
- String matching algorithm
ASJC Scopus subject areas
- Physics and Astronomy(all)
- Computer Networks and Communications
- Electrical and Electronic Engineering