Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods

Evaluation by large-scale real data

Shuichi Takitoh, Shogo Fujii, Yoichi Mase, Junichi Takasaki, Toshimasa Yamazaki, Yozo Ohnishi, Masao Yanagisawa, Yusuke Nakamura, Naoyuki Kamatani

    Research output: Contribution to journal › Article

    Abstract

    Motivation: The Invader assay is a fluorescence-based high-throughput genotyping technology. If the output data from the Invader assay were classified automatically, then genotypes for individuals would be determined efficiently. However, existing classification methods do not necessarily yield results with the same accuracy as can be achieved by technicians. Our clustering algorithm, Genocluster, is intended to increase the proportion of data points that need not be manually corrected by technicians.

    Results: Genocluster worked well even when the number of clusters was unknown in advance and when there were only a few points in a cluster. The use of Genocluster enabled us to achieve an acceptance rate (proportion of assay results that did not need to be corrected by expert technicians) of 84.4% and a proportion of uncorrected points of 95.8%, as determined using the data from over 31 million points.
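The two figures of merit reported above (acceptance rate and proportion of uncorrected points) can be made concrete with a short Python sketch. It assumes that an "assay result" is one SNP's set of called data points and that a point counts as corrected when the technician's final genotype differs from the automatic call. The `AssayResult` class and the function names are hypothetical illustrations, not part of Genocluster, and the toy genotype labels are invented for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AssayResult:
    """Hypothetical record for one assay result: the automatic genotype call
    for each data point and the call after technician review (same order)."""
    auto_calls: Sequence[str]
    final_calls: Sequence[str]

    def __post_init__(self) -> None:
        if len(self.auto_calls) != len(self.final_calls):
            raise ValueError("auto_calls and final_calls must have the same length")

    def corrected_points(self) -> int:
        """Number of points whose call the technician changed."""
        return sum(a != f for a, f in zip(self.auto_calls, self.final_calls))


def acceptance_rate(assays: Sequence[AssayResult]) -> float:
    """Fraction of assay results that needed no correction at all."""
    if not assays:
        raise ValueError("no assay results")
    accepted = sum(1 for r in assays if r.corrected_points() == 0)
    return accepted / len(assays)


def uncorrected_point_rate(assays: Sequence[AssayResult]) -> float:
    """Fraction of all data points whose automatic call was left unchanged."""
    total = sum(len(r.auto_calls) for r in assays)
    if total == 0:
        raise ValueError("no data points")
    corrected = sum(r.corrected_points() for r in assays)
    return (total - corrected) / total


if __name__ == "__main__":
    # Toy data: in the second assay one point is moved from AA to AB.
    assays = [
        AssayResult(["AA", "AB", "BB"], ["AA", "AB", "BB"]),
        AssayResult(["AA", "AA", "AB", "BB"], ["AA", "AB", "AB", "BB"]),
    ]
    print(f"acceptance rate:        {acceptance_rate(assays):.3f}")  # 0.500
    print(f"uncorrected point rate: {uncorrected_point_rate(assays):.3f}")  # 0.857
```

Because a single corrected point rejects a whole assay result, the acceptance rate is the stricter of the two measures under these assumptions, which is consistent with the reported 84.4% being lower than the 95.8% of uncorrected points.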

    Original language: English
    Pages (from-to): 408-413
    Number of pages: 6
    Journal: Bioinformatics
    Volume: 23
    Issue number: 4
    DOI: 10.1093/bioinformatics/btl133
    Publication status: Published 2007 Feb 15

    ASJC Scopus subject areas

    • Clinical Biochemistry
    • Computer Science Applications
    • Computational Theory and Mathematics

    Cite this

    Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods : Evaluation by large-scale real data. / Takitoh, Shuichi; Fujii, Shogo; Mase, Yoichi; Takasaki, Junichi; Yamazaki, Toshimasa; Ohnishi, Yozo; Yanagisawa, Masao; Nakamura, Yusuke; Kamatani, Naoyuki.

    In: Bioinformatics, Vol. 23, No. 4, 15.02.2007, p. 408-413.

    Takitoh, Shuichi ; Fujii, Shogo ; Mase, Yoichi ; Takasaki, Junichi ; Yamazaki, Toshimasa ; Ohnishi, Yozo ; Yanagisawa, Masao ; Nakamura, Yusuke ; Kamatani, Naoyuki. / Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods : Evaluation by large-scale real data. In: Bioinformatics. 2007 ; Vol. 23, No. 4. pp. 408-413.
    @article{aa7bd0dda998403eb9afe1094223d330,
    title = "Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods: Evaluation by large-scale real data",
    author = "Shuichi Takitoh and Shogo Fujii and Yoichi Mase and Junichi Takasaki and Toshimasa Yamazaki and Yozo Ohnishi and Masao Yanagisawa and Yusuke Nakamura and Naoyuki Kamatani",
    year = "2007",
    month = "2",
    day = "15",
    doi = "10.1093/bioinformatics/btl133",
    language = "English",
    volume = "23",
    pages = "408--413",
    journal = "Bioinformatics",
    issn = "1367-4803",
    publisher = "Oxford University Press",
    number = "4",

    }

    TY - JOUR
    T1 - Accurate automated clustering of two-dimensional data for single-nucleotide polymorphism genotyping by a combination of clustering methods
    T2 - Evaluation by large-scale real data
    AU - Takitoh, Shuichi
    AU - Fujii, Shogo
    AU - Mase, Yoichi
    AU - Takasaki, Junichi
    AU - Yamazaki, Toshimasa
    AU - Ohnishi, Yozo
    AU - Yanagisawa, Masao
    AU - Nakamura, Yusuke
    AU - Kamatani, Naoyuki
    PY - 2007/2/15
    Y1 - 2007/2/15
    UR - http://www.scopus.com/inward/record.url?scp=33847335428&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=33847335428&partnerID=8YFLogxK
    U2 - 10.1093/bioinformatics/btl133
    DO - 10.1093/bioinformatics/btl133
    M3 - Article
    VL - 23
    SP - 408
    EP - 413
    JO - Bioinformatics
    JF - Bioinformatics
    SN - 1367-4803
    IS - 4
    ER -