TY - GEN
T1 - F0 analysis for Japanese conversational speech synthesis
AU - Nakajima, Hideharu
AU - Sagisaka, Yoshinori
PY - 2009
Y1 - 2009
N2 - This paper proposes a conversational-style text-to-speech synthesis scheme based on an analysis of fundamental frequency (F0). Through the analysis, we confirm that conversational F0 can be represented by the superpositional model using three components: utterance, major phrase, and minor phrase. We compare each component of the model between conversational style and reading style to investigate the following points: where large F0 discrepancies are found, what linguistic factors are related to the discrepancies, and to what extent such discrepancies occur. This paper uses real domain data that includes a lot of linguistic context. The analysis confirms that large differences occur in global components such as single-span whole utterances and phrases, and that the differences occur at or around domain-specific expressions. The analysis also reveals that local components are almost the same in both styles. These analyses show that it is necessary to estimate the utterance and phrase components from word attributes other than grammatical clues to realize conversational synthesis in the superpositional manner.
AB - This paper proposes a conversational-style text-to-speech synthesis scheme based on an analysis of fundamental frequency (F0). Through the analysis, we confirm that conversational F0 can be represented by the superpositional model using three components: utterance, major phrase, and minor phrase. We compare each component of the model between conversational style and reading style to investigate the following points: where large F0 discrepancies are found, what linguistic factors are related to the discrepancies, and to what extent such discrepancies occur. This paper uses real domain data that includes a lot of linguistic context. The analysis confirms that large differences occur in global components such as single-span whole utterances and phrases, and that the differences occur at or around domain-specific expressions. The analysis also reveals that local components are almost the same in both styles. These analyses show that it is necessary to estimate the utterance and phrase components from word attributes other than grammatical clues to realize conversational synthesis in the superpositional manner.
UR - http://www.scopus.com/inward/record.url?scp=72449133264&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72449133264&partnerID=8YFLogxK
U2 - 10.1109/SNLP.2009.5340932
DO - 10.1109/SNLP.2009.5340932
M3 - Conference contribution
AN - SCOPUS:72449133264
SN - 9781424441389
T3 - 2009 8th International Symposium on Natural Language Processing, SNLP '09
SP - 137
EP - 142
BT - 2009 8th International Symposium on Natural Language Processing, SNLP '09
T2 - 2009 8th International Symposium on Natural Language Processing, SNLP '09
Y2 - 20 October 2009 through 22 October 2009
ER -