The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on enabling speech dialogue agents to generate back-channel responses in the same way humans do. To do so, the system needs to predict the appropriate timing of a back-channel response on the basis of the human's speech. For the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained on an actual corpus of a poster session from the IMADE project, using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before the back-channel response was generated. Compared with the actual back-channel timing in the corpus, the system achieved a recall of 37.1%, a precision of 31.7%, and an F-measure of 34.2%. These results suggest that the model can predict the timing of back-channel responses and generate them.
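
As a quick consistency check on the reported scores, the sketch below recomputes the F-measure from the stated recall and precision. It assumes the F-measure is the balanced harmonic mean of precision and recall (often called F1); the text does not state the weighting, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (F1).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Scores reported in the abstract, expressed as fractions.
recall = 0.371
precision = 0.317

f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.1%}")
```

Under this assumption, the recomputed value rounds to the 34.2% reported above.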