In this research, we focus on a new method of human-robot interaction in industrial environments. A vision-based dynamic hand gesture recognition system is proposed for a robot-arm picking task. Eight dynamic hand gestures are captured for this task with a 100 fps high-speed camera. Following the LRCN architecture, we combine MobileNetV2 with a Long Short-Term Memory (LSTM) network: MobileNetV2 extracts features from each frame, and the LSTM interprets those features across time steps to recognize the gesture. Around 100 samples per gesture are collected for training and then expanded to 200 samples per gesture by data augmentation. Results show that the model learns gestures varying in duration and complexity; in experiments on our hand gesture dataset, a gesture is recognized in 88 ms with 90.62% accuracy.