Robots with flexible joints have recently attracted attention because they can adapt passively to environmental changes and perform dynamic motions that exploit inertia. Previous research proposed acquiring a body model through deep learning and achieved dynamic motion learning. However, training a robot with the end-effector position as the visual feedback signal limits its knowledge to the relation between the task and itself, and excludes the relation between the environment and itself. In this research, we propose using images as the feedback signal so that the robot can perceive the overall situation within the task environment. Motion learning is performed through deep learning on raw image data. In an experiment, we had a robot perform the task motions once to acquire motor and image data. We then used a convolutional auto-encoder to extract image features from the raw images, and used the extracted features together with the motor data to train a recurrent neural network. As a result, motion learning from image data allowed the robot to acquire environmental information and to carry out tasks that require consideration of environmental changes, while making use of its advantage of passive adaptation.
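
The step of combining image features with motor data can be illustrated with a minimal sketch. It assumes the image features have already been extracted by the auto-encoder and that the recurrent network is trained to predict the next time step from the current one; the abstract does not state the training objective, so this pairing is an assumption. The function name `build_rnn_training_pairs` and the toy values are hypothetical and are not taken from the original work.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def build_rnn_training_pairs(
    image_features: Sequence[Sequence[float]],
    motor_data: Sequence[Sequence[float]],
) -> List[Tuple[Vector, Vector]]:
    """Fuse image features with motor data and pair each step with the next.

    Assumption: next-step prediction is the training target. The source
    abstract does not specify the objective of the recurrent network.
    """
    if len(image_features) != len(motor_data):
        raise ValueError("image and motor sequences must have the same length")
    # One fused vector per time step: image features followed by motor values.
    fused = [list(f) + list(m) for f, m in zip(image_features, motor_data)]
    # Input is the fused vector at step t, target is the fused vector at t + 1.
    return [(fused[t], fused[t + 1]) for t in range(len(fused) - 1)]


# Hypothetical toy recording: 3 time steps, 2 image features, 1 joint angle.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
motors = [[1.0], [1.1], [1.2]]
pairs = build_rnn_training_pairs(features, motors)
```

Each element of `pairs` is an (input, target) tuple that a recurrent network could consume in order. Only the sequence preparation is shown; the auto-encoder and the network itself are outside this sketch.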