This paper presents studies on multimodal interaction systems and describes our new research direction, 'Intermodal Learning'. The prototype system comprises four modal sub-systems (vision, graphical display, speech recognition, and speech synthesis) and an interaction manager. We demonstrated that it can learn a user's face and name, as well as the appearance and names of objects. To learn new words, it uses a speech recognition technique that estimates phonetic transcriptions from multiple speech samples. This learning process resembles how a baby learns about the real world by communicating with its parents.
|Journal||Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory|
|Publication status||Published - 2000|