Representation learning applications in biological sequence analysis

Hitoshi Iuchi*, Taro Matsutani, Keisuke Yamada, Natsuki Iwano, Shunsuke Sumi, Shion Hosoda, Shitao Zhao, Tsukasa Fukunaga, Michiaki Hamada

*Corresponding author for this work

Research output: Review article, peer-reviewed

Abstract

Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.
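The sentence/word analogy in the abstract can be made concrete with a minimal sketch. It assumes overlapping 3-mers as the "words", a lookup table as the embedding, and mean pooling to obtain one vector per sequence. The function names, the example sequences, and the randomly initialized table are illustrative assumptions, not taken from the review; in actual representation learning the vectors would be learned from data (for example with a word2vec-style model) rather than drawn at random.

```python
import random


def kmer_tokenize(sequence, k=3, stride=1):
    """Split a biological sequence into overlapping k-mers (the "words")."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocab(tokenized_sequences):
    """Assign an integer index to every distinct k-mer, in order of first appearance."""
    vocab = {}
    for tokens in tokenized_sequences:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def init_embeddings(vocab_size, dim=4, seed=0):
    """Create a placeholder embedding table (assumption: random values, not learned)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed_sequence(tokens, vocab, table):
    """Look up each k-mer vector and mean-pool them into one fixed-length vector."""
    vectors = [table[vocab[token]] for token in tokens]
    dim = len(table[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


# Hypothetical toy DNA sequences (the "sentences").
sequences = ["ATGCGTAC", "GGATGCGT"]
tokenized = [kmer_tokenize(seq, k=3) for seq in sequences]
vocab = build_vocab(tokenized)
table = init_embeddings(len(vocab), dim=4)
sequence_vectors = [embed_sequence(tokens, vocab, table) for tokens in tokenized]
```

Each entry of `sequence_vectors` is a fixed-length vector that could, in principle, serve as input to a downstream model for function or structure estimation, which is the use the abstract describes for vectorized sequences.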

Original language: English
Pages (from-to): 3198-3208
Number of pages: 11
Journal: Computational and Structural Biotechnology Journal
Volume: 19
DOI
Publication status: Published - Jan 2021

ASJC Scopus subject areas

  • Biotechnology
  • Biophysics
  • Structural Biology
  • Biochemistry
  • Genetics
  • Computer Science Applications
