Multiclass composite N-gram language model based on connection direction

Hirofumi Yamamoto, Yoshinori Sagisaka

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

The authors propose a method to generate a compact, highly reliable language model for speech recognition based on the efficient classification of words. In this method, the connectedness with the words immediately before and after the word is taken to represent separate attributes, and individual classification is performed for each word. The resulting composite word class is created separately based on the distribution of words connected before or after. As a result, classification of classes is efficient and reliable. In a multiclass composite N-gram, which uses the proposed method for the variable-order N-gram to bring in chain words, the entry size is reduced to one-tenth, and the word recognition rate is higher than that of a conventional composite N-gram for particles or variable-length word arrays.
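The idea of giving each word two separate class attributes can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes the common class-bigram factorization P(w_i | w_{i-1}) ≈ P(Ct(w_i) | Cf(w_{i-1})) · P(w_i | Ct(w_i)), where Ct (a word's class as a successor) and Cf (its class as a predecessor) are hypothetical names introduced here. The class maps are written by hand for a toy corpus, whereas the paper derives them automatically from connection distributions. Smoothing, variable-order N-grams, and chain words are left out.

```python
from collections import Counter


def train_multiclass_bigram(sentences, to_class, from_class):
    """Estimate a toy multiclass bigram by maximum likelihood.

    to_class[w]   : class of w as a successor (connection to the preceding word)
    from_class[w] : class of w as a predecessor (connection to the following word)
    Both maps are assumptions supplied by the caller in this sketch.
    """
    pair_counts = Counter()  # (from_class(prev), to_class(cur)) transitions
    from_counts = Counter()  # from_class(prev) occurrences
    word_counts = Counter()  # word occurrences in successor position
    to_counts = Counter()    # to_class occurrences in successor position

    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            pair_counts[(from_class[prev], to_class[cur])] += 1
            from_counts[from_class[prev]] += 1
            word_counts[cur] += 1
            to_counts[to_class[cur]] += 1

    def prob(cur, prev):
        """Return P(Ct(cur) | Cf(prev)) * P(cur | Ct(cur)), unsmoothed."""
        cf, ct = from_class[prev], to_class[cur]
        if from_counts[cf] == 0 or to_counts[ct] == 0:
            return 0.0
        class_transition = pair_counts[(cf, ct)] / from_counts[cf]
        word_emission = word_counts[cur] / to_counts[ct]
        return class_transition * word_emission

    return prob


# Toy corpus: "a" and "an" follow the same kinds of words,
# but are followed by different kinds of words.
sentences = [
    ["eat", "a", "pear"],
    ["eat", "an", "apple"],
    ["buy", "a", "pear"],
    ["buy", "an", "orange"],
]

# As successors, "a" and "an" share one class.
to_class = {
    "eat": "VERB", "buy": "VERB",
    "a": "ART", "an": "ART",
    "pear": "N_CONS", "apple": "N_VOWEL", "orange": "N_VOWEL",
}

# As predecessors, "a" and "an" are kept apart.
from_class = {
    "eat": "VERB", "buy": "VERB",
    "a": "ART_CONS", "an": "ART_VOWEL",
    "pear": "NOUN", "apple": "NOUN", "orange": "NOUN",
}

prob = train_multiclass_bigram(sentences, to_class, from_class)
print(prob("an", "eat"))    # shared successor class "ART"
print(prob("apple", "a"))   # blocked by the predecessor class of "a"
```

In this toy setup a single class per word would have to either merge "a" with "an" in both directions or split them in both directions. Classifying each direction separately lets the model share statistics where the words behave alike and keep them apart where they differ, which is the kind of compactness the abstract attributes to the method.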

Original language: English
Pages (from-to): 108-114
Number of pages: 7
Journal: Systems and Computers in Japan
Volume: 34
Issue number: 7
DOI: 10.1002/scj.1210
Publication status: Published - 2003 Jun 30
Externally published: Yes

Keywords

  • Automatic class classification
  • Chain words
  • Class N-gram
  • Variable-order N-gram

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Multiclass composite N-gram language model based on connection direction. / Yamamoto, Hirofumi; Sagisaka, Yoshinori.

In: Systems and Computers in Japan, Vol. 34, No. 7, 30.06.2003, p. 108-114.

@article{c05fb801826b4a63857693bdfea6ec85,
title = "Multiclass composite N-gram language model based on connection direction",
keywords = "Automatic class classification, Chain words, Class N-gram, Variable-order N-gram",
author = "Hirofumi Yamamoto and Yoshinori Sagisaka",
year = "2003",
month = "6",
day = "30",
doi = "10.1002/scj.1210",
language = "English",
volume = "34",
pages = "108--114",
journal = "Systems and Computers in Japan",
issn = "0882-1666",
publisher = "John Wiley and Sons Inc.",
number = "7",

}

TY - JOUR

T1 - Multiclass composite N-gram language model based on connection direction

AU - Yamamoto, Hirofumi

AU - Sagisaka, Yoshinori

PY - 2003/6/30

Y1 - 2003/6/30

KW - Automatic class classification

KW - Chain words

KW - Class N-gram

KW - Variable-order N-gram

UR - http://www.scopus.com/inward/record.url?scp=0038368817&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038368817&partnerID=8YFLogxK

U2 - 10.1002/scj.1210

DO - 10.1002/scj.1210

M3 - Article

VL - 34

SP - 108

EP - 114

JO - Systems and Computers in Japan

JF - Systems and Computers in Japan

SN - 0882-1666

IS - 7

ER -