A language model adaptation using multiple varied corpora

H. Yamamoto, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

A new language model adaptation scheme is proposed to cope with multiple, varied speech recognition tasks. The proposed adaptation reflects both topic differences and differences in sentence style that result from the speaker's role. Adaptation is carried out using two different language corpora, each of which matches the target in only one respect: topic or speaker's style. New word clustering techniques are introduced to extract topic dependency and style dependency separately. In this clustering, word neighboring characteristics in the two adaptation source corpora are treated as different features. All words are classified into commonly used word classes and topic- or style-dependent classes. Furthermore, words that depend on the target topic and sentence style, along with their neighboring characteristics, are emphasized according to their frequency in the adaptation target data. In the evaluation experiment, the proposed method achieves 13% lower perplexity and a 9% lower word error rate in continuous speech recognition than the conventional adaptation method.
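The idea of treating word neighboring characteristics from the two source corpora as different features can be illustrated with a minimal sketch. This is not the authors' clustering algorithm, which the abstract does not specify. The function names (`neighbor_counts`, `combined_features`), the toy sentences, and the choice of immediate left/right neighbor counts as the "neighboring characteristics" are all assumptions made for illustration.

```python
from collections import Counter

def neighbor_counts(sentences):
    """Count left ('L') and right ('R') neighbors of each word in a corpus."""
    counts = {}
    for sent in sentences:
        # Sentence boundary markers are an illustrative assumption.
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            c = counts.setdefault(tokens[i], Counter())
            c[("L", tokens[i - 1])] += 1
            c[("R", tokens[i + 1])] += 1
    return counts

def combined_features(topic_corpus, style_corpus):
    """Keep neighbor counts from each corpus as separate feature dimensions."""
    topic = neighbor_counts(topic_corpus)
    style = neighbor_counts(style_corpus)
    features = {}
    for word in set(topic) | set(style):
        f = Counter()
        for key, n in topic.get(word, {}).items():
            f[("topic",) + key] = n  # feature observed in the topic-matched corpus
        for key, n in style.get(word, {}).items():
            f[("style",) + key] = n  # feature observed in the style-matched corpus
        features[word] = f
    return features

# Hypothetical toy corpora: one matches the topic, the other the speaker's style.
topic_corpus = ["the hotel room is booked", "the hotel is full"]
style_corpus = ["could you book the room", "could you help me"]
features = combined_features(topic_corpus, style_corpus)
```

In this sketch, a word such as "hotel" that occurs only in the topic-matched corpus has only `topic` features, while "room" has features from both corpora. A clustering step run on such vectors could, in principle, separate topic-dependent words from style-dependent and commonly used ones.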

Original language: English
Title of host publication: 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 389-392
Number of pages: 4
ISBN (Print): 078037343X, 9780780373433
DOIs: https://doi.org/10.1109/ASRU.2001.1034666
Publication status: Published - 2001
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Madonna di Campiglio, Italy
Duration: 2001 Dec 9 to 2001 Dec 13

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001
Country: Italy
City: Madonna di Campiglio
Period: 2001 Dec 9 to 2001 Dec 13

Fingerprint

Continuous speech recognition
Speech recognition
Experiments

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Yamamoto, H., & Sagisaka, Y. (2001). A language model adaptation using multiple varied corpora. In 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings (pp. 389-392). [1034666] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2001.1034666
