A semi-supervised learning approach for RNA secondary structure prediction

Haruka Yonemoto, Kiyoshi Asai, Michiaki Hamada

    Research output: Contribution to journal (Article)

    4 Citations (Scopus)

    Abstract

    RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.
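To make the distinction between annotated and unannotated training sequences concrete, the following is a minimal sketch of how such a mixed training set might be represented. It assumes structures are written in dot-bracket notation, which the abstract does not specify, and all names (`TrainingExample`, `parse_dot_bracket`, `split_training_set`) and the example sequences are hypothetical. It illustrates only the data layout, not the authors' hybrid SCFG/CRF training procedure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)          # opening partner, wait for its match
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":            # '.' marks an unpaired base
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


@dataclass
class TrainingExample:
    """One RNA sequence; structure is None when no annotation is available."""
    sequence: str
    structure: Optional[str] = None

    def __post_init__(self) -> None:
        if self.structure is not None and len(self.structure) != len(self.sequence):
            raise ValueError("sequence and structure must have the same length")

    @property
    def is_labeled(self) -> bool:
        return self.structure is not None


def split_training_set(
    examples: list[TrainingExample],
) -> tuple[list[TrainingExample], list[TrainingExample]]:
    """Separate annotated (labeled) sequences from unannotated (unlabeled) ones."""
    labeled = [e for e in examples if e.is_labeled]
    unlabeled = [e for e in examples if not e.is_labeled]
    return labeled, unlabeled


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    data = [
        TrainingExample("GGGAAAACCC", "(((....)))"),
        TrainingExample("ACGUACGUAC"),
    ]
    labeled, unlabeled = split_training_set(data)
    print(len(labeled), "labeled;", len(unlabeled), "unlabeled")
    print(parse_dot_bracket(labeled[0].structure))
```

In a semi-supervised setting, a purely supervised learner would use only the labeled list, whereas the approach described in the abstract also draws on the unlabeled sequences during parameter training.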

    Original language: English
    Pages (from-to): 72-79
    Number of pages: 8
    Journal: Computational Biology and Chemistry
    Volume: 57
    DOI: 10.1016/j.compbiolchem.2015.02.002
    Publication status: Published - 2015 Jan 2

    Fingerprint

    RNA Secondary Structure
    Semi-supervised Learning
    Structure Prediction
    Supervised learning
    RNA
    Secondary Structure
    Probabilistic Model
    Statistical Models
    Unknown
    Natural Language Processing
    Conditional Random Fields
    Context-free Grammar
    Computational Experiments
    Natural Language
    Supervised Machine Learning
    Computational Biology
    Bioinformatics
    Crystal
    Context free grammars
    Model

    Keywords

    • Parameter learning
    • RNA secondary structure
    • Semi-supervised learning

    ASJC Scopus subject areas

    • Biochemistry
    • Structural Biology
    • Organic Chemistry
    • Computational Mathematics

    Cite this

    A semi-supervised learning approach for RNA secondary structure prediction. / Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki.

    In: Computational Biology and Chemistry, Vol. 57, 02.01.2015, p. 72-79.

    @article{79b78cfd04c94db29e343934b7cc0086,
    title = "A semi-supervised learning approach for RNA secondary structure prediction",
    keywords = "Parameter learning, RNA secondary structure, Semi-supervised learning",
    author = "Haruka Yonemoto and Kiyoshi Asai and Michiaki Hamada",
    year = "2015",
    month = "1",
    day = "2",
    doi = "10.1016/j.compbiolchem.2015.02.002",
    language = "English",
    volume = "57",
    pages = "72--79",
    journal = "Computational Biology and Chemistry",
    issn = "1476-9271",
    publisher = "Elsevier Limited",

    }

    TY - JOUR

    T1 - A semi-supervised learning approach for RNA secondary structure prediction

    AU - Yonemoto, Haruka

    AU - Asai, Kiyoshi

    AU - Hamada, Michiaki

    PY - 2015/1/2

    Y1 - 2015/1/2

    KW - Parameter learning

    KW - RNA secondary structure

    KW - Semi-supervised learning

    UR - http://www.scopus.com/inward/record.url?scp=84939599043&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84939599043&partnerID=8YFLogxK

    U2 - 10.1016/j.compbiolchem.2015.02.002

    DO - 10.1016/j.compbiolchem.2015.02.002

    M3 - Article

    VL - 57

    SP - 72

    EP - 79

    JO - Computational Biology and Chemistry

    JF - Computational Biology and Chemistry

    SN - 1476-9271

    ER -