This paper reports our recent efforts to develop robust speech recognition in cars. Speech recognition is expected to control many in-car devices. However, in-car environments contain many kinds of acoustic noise, e.g. engine noise and car stereo sound, which degrade speech recognition performance. To overcome this degradation, we develop a high-performance audio-visual speech recognition method. Lip images are extracted from captured face images using our face detection scheme. Basic visual features are computed and then converted into visual features for speech recognition using a deep neural network. Audio features are obtained as well. The audio and visual features are subsequently concatenated into audio-visual features. As a recognition model, we employ a multi-stream hidden Markov model, which can adjust the contributions of the audio and visual modalities. We evaluated the proposed method using the audio-visual corpus CENSREC-1-AV. To simulate driving conditions, we prepared driving and music noises. Experimental results show that our method significantly improves recognition performance in the in-car condition.
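The core of a multi-stream HMM is that each state scores the audio and visual observation streams separately and combines their log-likelihoods with modality weights, so the audio contribution can be turned down under heavy in-car noise. The following is a minimal sketch of that weighted combination, assuming diagonal-covariance Gaussian output distributions per stream; all function and variable names here are illustrative, not from the paper.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    # Log-likelihood of an observation vector under a
    # diagonal-covariance Gaussian output distribution.
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def multistream_log_likelihood(audio_obs, visual_obs,
                               audio_model, visual_model, lambda_a):
    # Stream-weighted state score: lambda_a weights the audio stream,
    # (1 - lambda_a) the visual stream. Lowering lambda_a shifts
    # reliance toward lip features when the acoustic channel is noisy.
    lambda_v = 1.0 - lambda_a
    ll_audio = gaussian_log_likelihood(audio_obs, *audio_model)
    ll_visual = gaussian_log_likelihood(visual_obs, *visual_model)
    return lambda_a * ll_audio + lambda_v * ll_visual

# Example: one HMM state with toy 2-dimensional audio and visual models.
audio_model = ([0.0, 0.0], [1.0, 1.0])   # (mean, variance) per dimension
visual_model = ([1.0, 1.0], [0.5, 0.5])
score = multistream_log_likelihood([0.1, -0.2], [0.9, 1.1],
                                   audio_model, visual_model,
                                   lambda_a=0.7)
```

With `lambda_a = 1.0` the score reduces to the audio-only likelihood, and with `lambda_a = 0.0` to the visual-only one; in practice the weight is tuned per noise condition.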