For robots to have a wide range of applications, they must be able to execute numerous tasks. However, recent studies into robot manipulation using deep neural networks (DNN) have primarily focused on single tasks. Therefore, we investigate a robot manipulation model that uses DNNs and can execute long sequential dynamic tasks by performing multiple short sequential tasks at appropriate times. To generate compound tasks, we propose a model comprising two DNNs: a convolutional autoencoder that extracts image features and a multiple timescale recurrent neural network (MTRNN) to generate motion. The internal state of the MTRNN is constrained to have similar values at the initial and final motion steps; thus, motions can be differentiated based on the initial image input. As an example compound task, we demonstrate that the robot can generate a 'Put-In-Box' task that is divided into three subtasks: open the box, grasp the object and put it into the box, and close the box. The subtasks were trained as discrete tasks, and the connections between each subtask were not trained. With the proposed model, the robot could perform the Put-In-Box task by switching among subtasks and could skip or repeat subtasks depending on the situation.