We are engaged in research on computational auditory scene analysis to achieve sophisticated human-robot (computer) interaction based on auditory awareness. The objective of our research is to understand an arbitrary sound mixture, including non-speech sounds and music as well as voiced speech, captured by the robot's ears (microphones embedded in the robot). The main issues are sound source localization, separation, and recognition at the signal-processing level, and signal-to-symbol transformation at the interface to the symbol-processing level. The latter is critical for developmental communication, and we are developing an automatic onomatopoeia recognition system. This paper overviews our activities in robot audition, in particular the active direction-pass filter (ADPF), which separates sounds originating from a specific direction by integrating sound source localization and visual processing. The ADPF has been implemented on three kinds of robots and has demonstrated separation and recognition of three simultaneous utterances with a single pair of microphones.
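The core idea behind direction-pass filtering, keeping only the time-frequency bins whose interaural phase difference (IPD) matches the phase expected for a target direction, can be sketched as follows. This is a minimal illustration under simplifying assumptions (free-field propagation, a single pair of microphones), not the ADPF implementation described in the paper; the microphone spacing `d`, sound speed `c`, and tolerance `tol` are assumed parameters.

```python
import numpy as np

def direction_pass_filter(stft_left, stft_right, freqs, theta,
                          d=0.15, c=343.0, tol=0.3):
    """Pass time-frequency bins arriving from direction `theta` (radians).

    stft_left, stft_right : complex arrays of shape (F, T), the two channels'
        short-time Fourier transforms.
    freqs : array of shape (F,), frequency of each bin in Hz.
    Returns the left-channel STFT with off-direction bins zeroed out.
    """
    # IPD expected for a plane wave from direction theta with mic spacing d.
    expected_ipd = 2.0 * np.pi * freqs * d * np.sin(theta) / c

    # Observed IPD per time-frequency bin.
    observed_ipd = np.angle(stft_left * np.conj(stft_right))

    # Wrap the IPD mismatch into [-pi, pi] before thresholding.
    delta = np.angle(np.exp(1j * (observed_ipd - expected_ipd[:, None])))

    # Binary mask: keep bins whose IPD is close enough to the target direction.
    mask = np.abs(delta) < tol
    return stft_left * mask
```

The separated STFT can then be inverted back to a waveform and passed to a speech recognizer; in practice phase ambiguity above the spatial-aliasing frequency and room reverberation make the masking decision harder than this sketch suggests.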