Parallel speech corpora of Japanese dialects

Koichiro Yoshino, Naoki Hirayama, Shinsuke Mori, Fumihiko Takahashi, Katsutoshi Itoyama, Hiroshi G. Okuno

    Research output: Conference contribution

    3 citations (Scopus)

    Abstract

    Clean speech data is necessary for spoken language processing; however, no public corpus of Japanese dialects has been collected for speech processing. Parallel speech corpora of dialects are also important because real dialects influence one another; however, existing data include only noisy dialect speech and its translation into the common language. In this paper, we collected parallel speech corpora of Japanese dialects, consisting of 100 read-speech utterances by 25 speakers together with their phoneme transcriptions. We recorded the speech of 5 common-language speakers and 20 dialect speakers from 4 areas (5 speakers per area). Each dialect speaker converted the same common-language texts into their own dialect and read them aloud. The speech was recorded with a close-talking microphone so that it can be used for spoken language processing tasks such as recognition, synthesis, and pronunciation estimation. In experiments, adapting automatic speech recognition (ASR) and kana-kanji conversion (KKC) systems with the data improved their accuracy.
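    The parallel design described above (every speaker reads a version of the same common-language source texts) can be pictured as a set of utterances keyed by a shared sentence index. The following is a minimal sketch under stated assumptions, not the corpus's actual file format: the `Utterance` record, its field names, the `build_roster` and `align_by_sentence` helpers, and the area labels `area_1` to `area_4` are all hypothetical; only the speaker counts (5 common-language speakers, 4 areas with 5 dialect speakers each) come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for one recorded utterance; field names are illustrative only.
@dataclass(frozen=True)
class Utterance:
    speaker_id: str
    area: str           # "common" for common-language speakers, else a placeholder area label
    sentence_id: int    # index of the shared common-language source text
    transcription: str  # phoneme transcription (placeholder strings in this sketch)


def build_roster(n_common=5, n_areas=4, per_area=5):
    """Speaker design from the abstract: 5 common-language speakers
    plus 5 dialect speakers from each of 4 areas (25 speakers in total)."""
    roster = [(f"common_{i}", "common") for i in range(n_common)]
    for a in range(n_areas):
        area = f"area_{a + 1}"  # hypothetical label; the real area names are not listed here
        roster += [(f"{area}_spk{i}", area) for i in range(per_area)]
    return roster


def align_by_sentence(utterances):
    """Group utterances by their shared source sentence, so each entry maps
    speaker_id -> utterance for one parallel set across all speakers."""
    parallel = defaultdict(dict)
    for u in utterances:
        parallel[u.sentence_id][u.speaker_id] = u
    return dict(parallel)


# Usage with toy placeholder data for two source sentences.
roster = build_roster()
print(len(roster))  # 25
toy = [
    Utterance(spk, area, sid, f"<phonemes {spk} {sid}>")
    for spk, area in roster
    for sid in range(2)
]
parallel = align_by_sentence(toy)
print(len(parallel), len(parallel[0]))  # 2 25
```

    Keying by sentence index rather than by speaker makes it direct to pair a common-language reading with each dialect rendering of the same text, which is the kind of pairing an adaptation experiment would draw on.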

    Original language: English
    Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
    Publisher: European Language Resources Association (ELRA)
    Pages: 4652-4657
    Number of pages: 6
    ISBN (Electronic): 9782951740891
    Publication status: Published - 1 Jan 2016
    Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
    Duration: 23 May 2016 to 28 May 2016

    Other

    Other: 10th International Conference on Language Resources and Evaluation, LREC 2016
    Country/Territory: Slovenia
    City: Portoroz
    Period: 16/5/23 to 16/5/28

    ASJC Scopus subject areas

    • Linguistics and Language
    • Library and Information Sciences
    • Language and Linguistics
    • Education
