Multi-Class Composite N-gram based on connection direction

Hirofumi Yamamoto, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding › Chapter

32 Citations (Scopus)

Abstract

A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting a word's neighboring characteristics into the preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the preceding or following word distributions. This clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to the Multi-Class Composite N-gram, whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, with one thousandth smaller size than conventional word 2-grams.
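The idea of giving each word two classes, one per connection direction, can be sketched in a few lines. This is only a toy illustration, not the authors' procedure: the abstract does not state the probability formula, so the sketch assumes the usual class 2-gram factorization, P(w | prev) ≈ P(to_class(w) | from_class(prev)) · P(w | to_class(w)). The class maps `to_class` and `from_class`, the class labels, and the tiny corpus are all hypothetical and assigned by hand; the automatic merging of classes described in the abstract is not implemented.

```python
from collections import Counter

def train_multiclass_bigram(sentences, to_class, from_class):
    """Count-based estimate of an (assumed) multi-class 2-gram.

    to_class[w]   : class of w when it is the predicted word
                    (grouped by what tends to precede it)
    from_class[w] : class of w when it is the history word
                    (grouped by what tends to follow it)
    """
    class_pair = Counter()  # (from_class(prev), to_class(w)) counts
    from_total = Counter()  # from_class(prev) counts
    word_count = Counter()  # counts of w as a predicted word
    to_total = Counter()    # to_class(w) counts
    for sent in sentences:
        for prev, w in zip(sent, sent[1:]):
            class_pair[(from_class[prev], to_class[w])] += 1
            from_total[from_class[prev]] += 1
            word_count[w] += 1
            to_total[to_class[w]] += 1

    def prob(w, prev):
        cf, ct = from_class[prev], to_class[w]
        if from_total[cf] == 0 or to_total[ct] == 0:
            return 0.0
        p_class = class_pair[(cf, ct)] / from_total[cf]  # P(ct | cf)
        p_word = word_count[w] / to_total[ct]            # P(w | ct)
        return p_class * p_word

    return prob

# Hypothetical toy corpus and hand-assigned classes.
sentences = [
    ["a", "dog", "runs"],
    ["the", "dog", "runs"],
    ["a", "cat", "sleeps"],
    ["the", "cat", "runs"],
]
# "dog" and "cat" share a to-class (both follow "a"/"the") ...
to_class = {"a": "DET_t", "the": "DET_t", "dog": "N_t", "cat": "N_t",
            "runs": "V_t", "sleeps": "V_t"}
# ... but keep separate from-classes (they are followed by different words).
from_class = {"a": "DET_f", "the": "DET_f", "dog": "N1_f", "cat": "N2_f",
              "runs": "V_f", "sleeps": "V_f"}

prob = train_multiclass_bigram(sentences, to_class, from_class)
print(prob("sleeps", "dog"))  # nonzero although "dog sleeps" never occurs
```

In this toy setup the pair "dog sleeps" never appears in the corpus, so a plain word 2-gram would give it zero probability, while the class-based estimate shares counts across words in the same class. Keeping the two class maps separate is what lets a word be grouped differently depending on whether it acts as the history or as the predicted word.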

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: IEEE
Pages: 533-536
Number of pages: 4
Volume: 1
Publication status: Published - 1999
Externally published: Yes
Event: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99) - Phoenix, AZ, USA
Duration: 1999 Mar 15 to 1999 Mar 19

Other

Other: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99)
City: Phoenix, AZ, USA
Period: 99/3/15 to 99/3/19

Fingerprint

composite materials

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Yamamoto, H., & Sagisaka, Y. (1999). Multi-Class Composite N-gram based on connection direction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 1, pp. 533-536). IEEE.

Multi-Class Composite N-gram based on connection direction. / Yamamoto, Hirofumi; Sagisaka, Yoshinori.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 1 IEEE, 1999. p. 533-536.

Yamamoto, H & Sagisaka, Y 1999, Multi-Class Composite N-gram based on connection direction. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. vol. 1, IEEE, pp. 533-536, Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-99), Phoenix, AZ, USA, 99/3/15.
Yamamoto H, Sagisaka Y. Multi-Class Composite N-gram based on connection direction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 1. IEEE. 1999. p. 533-536
Yamamoto, Hirofumi ; Sagisaka, Yoshinori. / Multi-Class Composite N-gram based on connection direction. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 1 IEEE, 1999. pp. 533-536
@inbook{6ebb972343564064ae88453a404bd3db,
title = "Multi-Class Composite N-gram based on connection direction",
abstract = "A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. In each side, word classes are merged into larger clusters independently according to preceding or following word distributions. This word-clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to Multi-Class Composite N-gram that unit is Multi-Class 2-gram and joined word. Multi-Class Composite N-gram showed better performance both in perplexity and recognition rates with one thousandth smaller size than conventional word 2-grams.",
author = "Hirofumi Yamamoto and Yoshinori Sagisaka",
year = "1999",
language = "English",
volume = "1",
pages = "533--536",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "IEEE",

}

TY - CHAP

T1 - Multi-Class Composite N-gram based on connection direction

AU - Yamamoto, Hirofumi

AU - Sagisaka, Yoshinori

PY - 1999

Y1 - 1999

N2 - A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. In each side, word classes are merged into larger clusters independently according to preceding or following word distributions. This word-clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to Multi-Class Composite N-gram that unit is Multi-Class 2-gram and joined word. Multi-Class Composite N-gram showed better performance both in perplexity and recognition rates with one thousandth smaller size than conventional word 2-grams.

UR - http://www.scopus.com/inward/record.url?scp=0032626587&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032626587&partnerID=8YFLogxK

M3 - Chapter

AN - SCOPUS:0032626587

VL - 1

SP - 533

EP - 536

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

PB - IEEE

ER -