Humans have a variety of senses and use them effectively in communication. Hearing and vision in particular play an important role in recognizing a conversation partner and understanding the conversation. In vocal communication, we can localize a sound source in 3D space, extract a particular sound from a mixture of sounds, and recognize who is talking. In addition, we can identify a particular person by recognizing body features and individual gestures. Realizing these mechanisms on a computer would enable new applications that support flexible and intuitive communication with humans. The authors are working on the identification of a particular person using microphones and a USB camera. This paper describes the development of an information fusion system and how it handles the multiple kinds of data obtained from the different sensors.