Computer-generated speech animations are commonly seen in video games and movies. Although high-quality facial motions can be created by the hand crafted work of skilled artists, this approach is not always suitable because of time and cost constraints. A data-driven approach [Taylor et al. 2012], such as machine learning to concatenate video portions of speech training data, has been utilized to generate natural speech animation, while a large number of target shapes are often required for synthesis. We can obtain smooth mouth motions from prepared lip shapes for typical vowels by using an interpolation of lip shapes with Gaussian mixture models (GMMs) [Yano et al. 2007]. However, the resulting animation is not directly generated from the measured lip motions of someone's actual speech.