This paper investigates a lipreading scheme that employs optical and depth modalities together with deep bottleneck features. Optical and depth data are captured by Microsoft Kinect v2, and an appearance-based feature set is computed for each modality. Each basic feature set is then converted into a deep bottleneck feature using a deep neural network with a bottleneck layer. Multi-stream hidden Markov models are used for recognition. We evaluated the method on our connected-digit corpus, comparing it with our previous method, and found that deep bottleneck features improve lipreading performance.
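As a rough illustration of the deep bottleneck feature idea described above, the sketch below trains a feed-forward network with a narrow hidden layer on a basic feature set and then reads that layer's activations out as the new feature. This is not the authors' implementation: the layer widths, the 30-dimensional appearance feature, and the 11 frame-label classes are all illustrative assumptions.

```python
# Minimal sketch of deep bottleneck feature extraction (illustrative only).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

feat_dim = 30     # assumed dimensionality of the appearance-based feature set
num_classes = 11  # assumed frame labels, e.g. digits 0-9 plus silence

inputs = tf.keras.Input(shape=(feat_dim,))
x = layers.Dense(512, activation="relu")(inputs)
x = layers.Dense(512, activation="relu")(x)
# The narrow "bottleneck" layer whose activations become the new feature.
bottleneck = layers.Dense(32, activation="relu", name="bottleneck")(x)
x = layers.Dense(512, activation="relu")(bottleneck)
outputs = layers.Dense(num_classes, activation="softmax")(x)

dnn = models.Model(inputs, outputs)
dnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# dnn.fit(train_feats, train_labels, epochs=10)  # train against frame labels

# After training, a truncated model maps basic features to bottleneck features,
# which would then feed the multi-stream HMM recognizer.
extractor = models.Model(inputs, dnn.get_layer("bottleneck").output)
dbn_feats = extractor(np.random.rand(5, feat_dim).astype("float32"))
print(dbn_feats.shape)  # (5, 32)
```

In a two-modality setup like the one described, one such network would typically be trained per modality (optical and depth), with the resulting bottleneck features forming the separate streams of the multi-stream HMM.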
Publication status: Published - 2017
Event: 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017, Stockholm, Sweden, 25-26 August 2017