JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research

Katunobu Itou, Mikio Yamamoto, Kazuya Takeda, Toshiyuki Takezawa, Tatsuo Matsuoka, Tetsunori Kobayashi, Kiyohiro Shikano, Shuichi Itahashi

Research output: Contribution to journal › Article

174 Citations (Scopus)

Abstract

In this paper, we present the first public Japanese speech corpus for large vocabulary continuous speech recognition (LVCSR) technology, which we have titled JNAS (Japanese Newspaper Article Sentences). We designed it to be comparable to the corpora used in the American and European LVCSR projects. The corpus contains 60 h of speech recordings and their orthographic transcriptions for 306 speakers (153 males and 153 females) reading excerpts from newspaper articles and phonetically balanced (PB) sentences. In total, the corpus contains utterances of about 45,000 sentences, with each speaker reading about 150 sentences. JNAS is being distributed on 16 CD-ROMs.

Original language: English
Pages (from-to): 199-206
Number of pages: 8
Journal: Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)
Volume: 20
Issue number: 3
Publication status: Published - 1999
Externally published: Yes

Fingerprint

sentences
speech recognition
CD-ROM
recording

ASJC Scopus subject areas

  • Acoustics and Ultrasonics

Cite this

JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research. / Itou, Katunobu; Yamamoto, Mikio; Takeda, Kazuya; Takezawa, Toshiyuki; Matsuoka, Tatsuo; Kobayashi, Tetsunori; Shikano, Kiyohiro; Itahashi, Shuichi.

In: Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi), Vol. 20, No. 3, 1999, p. 199-206.

@article{637fb283201e4203ace3017dc2a7e611,
title = "JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research",
abstract = "In this paper we present the first public Japanese speech corpus for large vocabulary continuous speech recognition (LVCSR) technology, which we have titled JNAS (Japanese Newspaper Article Sentences). We designed it to be comparable to the corpora used in the American and European LVCSR projects. The corpus contains speech recordings (60 h) and their orthographic transcriptions for 306 speakers (153 males and 153 females) reading excerpts from the newspaper's articles and phonetically balanced (PB) sentences. This corpus contains utterances of about 45,000 sentences as a whole with each speaker reading about 150 sentences. JNAS is being distributed on 16 CD-ROMs.",
author = "Katunobu Itou and Mikio Yamamoto and Kazuya Takeda and Toshiyuki Takezawa and Tatsuo Matsuoka and Tetsunori Kobayashi and Kiyohiro Shikano and Shuichi Itahashi",
year = "1999",
language = "English",
volume = "20",
pages = "199--206",
journal = "Acoustical Science and Technology",
issn = "1346-3969",
publisher = "Acoustical Society of Japan",
number = "3"
}

TY - JOUR

T1 - JNAS

T2 - Japanese speech corpus for large vocabulary continuous speech recognition research

AU - Itou, Katunobu

AU - Yamamoto, Mikio

AU - Takeda, Kazuya

AU - Takezawa, Toshiyuki

AU - Matsuoka, Tatsuo

AU - Kobayashi, Tetsunori

AU - Shikano, Kiyohiro

AU - Itahashi, Shuichi

PY - 1999

Y1 - 1999

AB - In this paper we present the first public Japanese speech corpus for large vocabulary continuous speech recognition (LVCSR) technology, which we have titled JNAS (Japanese Newspaper Article Sentences). We designed it to be comparable to the corpora used in the American and European LVCSR projects. The corpus contains speech recordings (60 h) and their orthographic transcriptions for 306 speakers (153 males and 153 females) reading excerpts from the newspaper's articles and phonetically balanced (PB) sentences. This corpus contains utterances of about 45,000 sentences as a whole with each speaker reading about 150 sentences. JNAS is being distributed on 16 CD-ROMs.

UR - http://www.scopus.com/inward/record.url?scp=0032644224&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032644224&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0032644224

VL - 20

SP - 199

EP - 206

JO - Acoustical Science and Technology

JF - Acoustical Science and Technology

SN - 1346-3969

IS - 3

ER -