Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure

Huiru Zheng, Haiying Wang, Takayuki Furuzuki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Upstream regions in the DNA sequence are characterized by the presence of short regulatory motifs, which function as target binding sites for transcription factors. Finding two genes with common motifs in their regulatory regions may aid users in identifying co-regulated genes or inferring regulatory modules. By modelling pattern occurrences in the regulatory regions with Poisson statistics, this paper presents a log likelihood ratio statistics-based distance measure to calculate pair-wise similarities between sequences. To perform cluster analysis of regulatory sequences, this paper introduces two clustering algorithms on the basis of the incorporation of the log likelihood ratio statistics-based distance into hierarchical clustering and Self-Organizing Map. The proposed approach has been tested on a synthetic dataset and a real biological example. The results indicate that, in comparison to traditional distance functions, the log likelihood ratio statistics-based similarity measure offers considerable improvements in the process of regulatory sequence-based gene classification.
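
The abstract does not give the formula for the distance, so the following is only a minimal sketch of one plausible reading: a generic two-sample Poisson log likelihood ratio statistic summed over k-mer counts, plugged into average-linkage hierarchical clustering. The function names (`kmer_counts`, `llr_distance`, `average_linkage`), the choice of k-mers as the "patterns", the value `k=3`, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def _x_log_ratio(x, expected):
    # x * ln(x / expected), with the convention 0 * ln(0) = 0.
    return x * math.log(x / expected) if x > 0 else 0.0


def llr_distance(seq_a, seq_b, k=3):
    """Sum over k-mers of a two-sample Poisson log likelihood ratio.

    Null hypothesis: both sequences share one occurrence rate per k-mer.
    Alternative: each sequence has its own rate. Assumes len(seq) > k.
    """
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    n_a = len(seq_a) - k + 1  # number of k-mer positions in seq_a
    n_b = len(seq_b) - k + 1
    total = 0.0
    for word in set(counts_a) | set(counts_b):
        x_a, x_b = counts_a[word], counts_b[word]
        pooled_rate = (x_a + x_b) / (n_a + n_b)
        total += 2.0 * (_x_log_ratio(x_a, n_a * pooled_rate)
                        + _x_log_ratio(x_b, n_b * pooled_rate))
    return max(total, 0.0)  # guard against tiny negative rounding error


def average_linkage(seqs, n_clusters, k=3):
    """Agglomerative clustering (average linkage) on the LLR distance."""
    dist = {frozenset(p): llr_distance(seqs[p[0]], seqs[p[1]], k)
            for p in combinations(range(len(seqs)), 2)}
    clusters = [[i] for i in range(len(seqs))]

    def link(c1, c2):
        pairs = [dist[frozenset((i, j))] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return clusters


# Hypothetical toy sequences: two TATA-rich, two GC-rich.
seqs = ["TATATATATATA", "TATATATACATA", "GCGCGCGCGCGC", "GCGCGCGTGCGC"]
clusters = average_linkage(seqs, n_clusters=2)
print(clusters)
```

In this sketch a pair of sequences with similar k-mer usage yields a small statistic, while sequences with disjoint k-mer content yield a large one, so the statistic can be used directly as a dissimilarity. Replacing the linkage step with a Self-Organizing Map, as the abstract also describes, would require a different update rule and is not shown here.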

Original language: English
Title of host publication: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
Pages: 1220-1224
Number of pages: 5
DOIs: https://doi.org/10.1109/BIBE.2007.4375719
Publication status: Published - 2007
Event: 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE - Boston, MA
Duration: 2007 Jan 14 → 2007 Jan 17

Other

Other: 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
City: Boston, MA
Period: 07/1/14 → 07/1/17

Fingerprint

Cluster analysis
Statistics
Genes
Nucleic Acid Regulatory Sequences
Likelihood Functions
Transcription factors
DNA sequences
Binding sites
Clustering algorithms

Keywords

  • Cluster analysis
  • Log likelihood ratio
  • Poisson distribution
  • Regulatory sequence

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Bioengineering

Cite this

Zheng, H., Wang, H., & Furuzuki, T. (2007). Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure. In Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE (pp. 1220-1224). [4375719] https://doi.org/10.1109/BIBE.2007.4375719

Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure. / Zheng, Huiru; Wang, Haiying; Furuzuki, Takayuki.

Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE. 2007. p. 1220-1224 4375719.

Zheng, H, Wang, H & Furuzuki, T 2007, Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure. in Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE., 4375719, pp. 1220-1224, 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE, Boston, MA, 07/1/14. https://doi.org/10.1109/BIBE.2007.4375719
Zheng H, Wang H, Furuzuki T. Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure. In Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE. 2007. p. 1220-1224. 4375719 https://doi.org/10.1109/BIBE.2007.4375719
Zheng, Huiru ; Wang, Haiying ; Furuzuki, Takayuki. / Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure. Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE. 2007. pp. 1220-1224
@inproceedings{2eef65f2ac3348dcaafa50f709f367bc,
title = "Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure",
abstract = "Upstream regions in the DNA sequence are characterized by the presence of short regulatory motifs, which function as target binding sites for transcription factors. Finding two genes with common motifs in their regulatory regions may aid users in identifying co-regulated genes or inferring regulatory modules. By modelling pattern occurrences in the regulatory regions with Poisson statistics, this paper presents a log likelihood ratio statistics-based distance measure to calculate pair-wise similarities between sequences. To perform cluster analysis of regulatory sequences, this paper introduces two clustering algorithms on the basis of the incorporation of the log likelihood ratio statistics-based distance into hierarchical clustering and Self-Organizing Map. The proposed approach has been tested on a synthetic dataset and a real biological example. The results indicate that, in comparison to traditional distance functions, the log likelihood ratio statistics-based similarity measure offers considerable improvements in the process of regulatory sequence-based gene classification.",
keywords = "Cluster analysis, Log likelihood ratio, Poisson distribution, Regulatory sequence",
author = "Huiru Zheng and Haiying Wang and Takayuki Furuzuki",
year = "2007",
doi = "10.1109/BIBE.2007.4375719",
language = "English",
isbn = "1424415098",
pages = "1220--1224",
booktitle = "Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE",

}

TY - GEN

T1 - Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure

AU - Zheng, Huiru

AU - Wang, Haiying

AU - Furuzuki, Takayuki

PY - 2007

Y1 - 2007

AB - Upstream regions in the DNA sequence are characterized by the presence of short regulatory motifs, which function as target binding sites for transcription factors. Finding two genes with common motifs in their regulatory regions may aid users in identifying co-regulated genes or inferring regulatory modules. By modelling pattern occurrences in the regulatory regions with Poisson statistics, this paper presents a log likelihood ratio statistics-based distance measure to calculate pair-wise similarities between sequences. To perform cluster analysis of regulatory sequences, this paper introduces two clustering algorithms on the basis of the incorporation of the log likelihood ratio statistics-based distance into hierarchical clustering and Self-Organizing Map. The proposed approach has been tested on a synthetic dataset and a real biological example. The results indicate that, in comparison to traditional distance functions, the log likelihood ratio statistics-based similarity measure offers considerable improvements in the process of regulatory sequence-based gene classification.

KW - Cluster analysis

KW - Log likelihood ratio

KW - Poisson distribution

KW - Regulatory sequence

UR - http://www.scopus.com/inward/record.url?scp=47649108422&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=47649108422&partnerID=8YFLogxK

U2 - 10.1109/BIBE.2007.4375719

DO - 10.1109/BIBE.2007.4375719

M3 - Conference contribution

AN - SCOPUS:47649108422

SN - 1424415098

SN - 9781424415090

SP - 1220

EP - 1224

BT - Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE

ER -