Human gesture recognition systems are a natural means of achieving intelligent Human Computer Interaction (HCI). Such systems should recognize the specific user and enable the user to gesture naturally without wearing special devices. Extracting the different components of visual actions from human gestures, such as the shape and motion of the hands, facial expression, and the torso, is the key task in gesture recognition. So far, most previous work in the field of gesture recognition has focused only on hand motion features and required the user to wear special devices. In this paper, we present an appearance-based multimodal gesture recognition framework that combines different modalities of features, such as face identity, facial expression, and hand motion, extracted from image frames captured directly by a web camera. We consider 12 classes of human gestures with facial expressions carrying neutral, negative (e.g., "angry"), and positive (e.g., "excited") meanings, drawn from American Sign Language. A condensation-based algorithm is adopted for classification. We collected a data set over three recording sessions and conducted experiments with different combination techniques. Experimental results show that the performance of hand gesture recognition is improved by adding facial analysis.