Detection of unknown words in large vocabulary speech recognition

Satoru Hayamizu, Katunobu Itou, Kazuyo Tanaka

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

This paper examines, through recognition and detection experiments, the relation between vocabulary size and unknown-word detection errors in large vocabulary speech recognition. Although the relation between vocabulary size and recognition performance has been reported, the relation between vocabulary size and detection performance has not yet been studied, particularly for vocabularies of more than 1,000 words. Experiments were conducted using the speech of speaker MAU from the ATR word speech database. The dictionary used consists of 40,000 words from the Shinmeikai Japanese Language Dictionary. It is shown that as the vocabulary size increases from 1,000 to 40,000 words, detection errors follow a trend similar to that of recognition errors. Moreover, the increase in detection errors caused by larger vocabularies is shown to be small for in-vocabulary input compared with the increase for out-of-vocabulary input. These results should be taken into account when designing large vocabulary speech recognition systems that include unknown word processing.
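
The abstract distinguishes detection errors on in-vocabulary input from those on out-of-vocabulary input. The sketch below is one way to make those two error types concrete; it is not the authors' scoring procedure. It assumes (hypothetically) that an in-vocabulary word wrongly flagged as unknown counts as an in-vocabulary detection error, and an out-of-vocabulary word that is not flagged counts as an out-of-vocabulary detection error. The names `Trial` and `detection_error_rates` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    word: str              # the spoken test word (reference transcription)
    flagged_unknown: bool  # detector output: True if judged out-of-vocabulary


def detection_error_rates(trials, vocabulary):
    """Return (in_vocab_error_rate, oov_error_rate) for a given vocabulary.

    Assumed, hypothetical error definitions:
    - in-vocabulary word flagged as unknown      -> in-vocabulary error
    - out-of-vocabulary word not flagged unknown -> out-of-vocabulary error
    """
    iv_total = iv_err = oov_total = oov_err = 0
    for t in trials:
        if t.word in vocabulary:
            iv_total += 1
            iv_err += t.flagged_unknown  # bool counts as 0 or 1
        else:
            oov_total += 1
            oov_err += not t.flagged_unknown
    # Guard against empty classes to avoid division by zero.
    iv_rate = iv_err / iv_total if iv_total else 0.0
    oov_rate = oov_err / oov_total if oov_total else 0.0
    return iv_rate, oov_rate
```

Repeating this computation with vocabularies of increasing size (for example, 1,000 up to 40,000 entries) would give the kind of error-versus-vocabulary-size curves the abstract compares, though the paper's exact experimental protocol may differ.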

Original language: English
Pages (from-to): 165-171
Number of pages: 7
Journal: Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)
Volume: 16
Issue number: 3
Publication status: Published - 1995
Externally published: Yes

Keywords

  • Speech recognition
  • Unknown word processing
  • Vocabulary size

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
