Prosodic Characteristics of Japanese Conversational Speech

Nobuyoshi Kaiki, Yoshinori Sagisaka

Research output: Contribution to journal › Article

Abstract

In this paper, we quantitatively analyze speech data in seven different styles, with the aim of synthesizing natural Japanese conversational speech. Three reading styles were produced at different speeds (slow, normal and fast), and four speaking styles were produced by enacting conversations in different situations (free, hurried, angry and polite). To clarify the differences in prosodic characteristics between conversational speech and read speech, the means and standard deviations of vowel duration, vowel amplitude and fundamental frequency (F0) were analyzed. We found large variation in these prosodic parameters. To look more precisely at the differences in segmental duration and segmental amplitude between conversational speech and read speech, the control rules for prosodic parameters in the reading styles were applied to conversational speech. F0 contours of the different speaking styles were superposed after normalizing segmental duration. The differences between the estimated values and the actual values were analyzed. Large differences were found at sentence-final positions and in key (focused) phrases. Sentence-final positions showed lengthened vowel duration and increased vowel amplitude. Key phrase positions showed raised F0.

Original language: English
Pages (from-to): 1927-1933
Number of pages: 7
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Volume: E76-A
Issue number: 11
Publication status: Published - 1993 Nov
Externally published: Yes

Fingerprint

Speech Synthesis
Mean deviation
Fundamental Frequency
Standard deviation
Speech
Style

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Electrical and Electronic Engineering

Cite this

Prosodic Characteristics of Japanese Conversational Speech. / Kaiki, Nobuyoshi; Sagisaka, Yoshinori.

In: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E76-A, No. 11, Nov. 1993, pp. 1927-1933.

@article{f6b1120ab4a346eda5fada8a831e58c8,
title = "Prosodic Characteristics of Japanese Conversational Speech",
abstract = "In this paper, we quantitatively analyzed speech data in seven different styles to make natural Japanese conversational speech synthesis. Three reading styles were produced at different speeds (slow, normal and fast), and four speaking styles were produced by enacting conversation in different situations (free, hurried, angry and polite). To clarify the differences in prosodic characteristics between conversational speech and read speech, means and standard deviations of vowel duration, vowel amplitude and fundamental frequency (Fo) were analyzed. We found large variation in these prosodic parameters. To look more precisely at the segmental duration and segmental amplitude differences between conversational speech and read speech, control rules of prosodic parameters in reading styles were applied to conversational speech. Fo contours of different speaking styles are superposed by normalizing the segmental duration. The differences between estimated values and actual values were analyzed. Large differences were found at sentence final and key (focused) phrases. Sentence final positions showed lengthening of segmental vowel duration and increased segmental vowel amplitude. Key phrase positions featured raising Fo.",
author = "Nobuyoshi Kaiki and Yoshinori Sagisaka",
year = "1993",
month = "11",
language = "English",
volume = "E76-A",
pages = "1927--1933",
journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
issn = "0916-8508",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "11",
}

TY - JOUR

T1 - Prosodic Characteristics of Japanese Conversational Speech

AU - Kaiki, Nobuyoshi

AU - Sagisaka, Yoshinori

PY - 1993/11

Y1 - 1993/11

AB - In this paper, we quantitatively analyzed speech data in seven different styles to make natural Japanese conversational speech synthesis. Three reading styles were produced at different speeds (slow, normal and fast), and four speaking styles were produced by enacting conversation in different situations (free, hurried, angry and polite). To clarify the differences in prosodic characteristics between conversational speech and read speech, means and standard deviations of vowel duration, vowel amplitude and fundamental frequency (Fo) were analyzed. We found large variation in these prosodic parameters. To look more precisely at the segmental duration and segmental amplitude differences between conversational speech and read speech, control rules of prosodic parameters in reading styles were applied to conversational speech. Fo contours of different speaking styles are superposed by normalizing the segmental duration. The differences between estimated values and actual values were analyzed. Large differences were found at sentence final and key (focused) phrases. Sentence final positions showed lengthening of segmental vowel duration and increased segmental vowel amplitude. Key phrase positions featured raising Fo.

UR - http://www.scopus.com/inward/record.url?scp=0027698731&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027698731&partnerID=8YFLogxK

M3 - Article

VL - E76-A

SP - 1927

EP - 1933

JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

SN - 0916-8508

IS - 11

ER -