The ability to correctly perceive time and extract accurate timing information is crucial during social interaction. Several activities in social interaction, such as giving appropriate feedback, turn-taking, coordinating with peers, and even displaying empathy and engagement, depend directly on it. One of the cognitive impairments observed in children with Autistic Spectrum Disorders is a deficit in time perception. Learning to pay attention to timing and to assess it correctly is thus a critical first step toward improving the social skills of children with autism. In this paper, we present a novel sensing system and algorithm for estimating a subject's rhythmic motion timing from visual information using a Recurrent Neural Network (RNN) coupled with a Fast Fourier Transform (FFT). This system will enable a robot saxophonist to estimate the rhythmic period from a child's motion during a robot-based music therapy session. The FFT is widely applied in rhythmic body movement detection because of advantages such as low computational cost and easy integration. However, its long transient time delay is a critical limitation that degrades motion timing estimation during period transitions. The system presented in this paper is shown to significantly reduce this transient time delay. The results of both a simulation and an evaluation experiment show that, compared with FFT processing alone, the proposed algorithm performs better, with a smaller average offset error and a shorter transient time delay, allowing a more precise assessment of the child's synchronization response.
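To make the FFT-only baseline concrete, the following is a minimal sketch of how a rhythmic period might be read off a window of motion samples by picking the strongest spectral peak. It is an illustration under stated assumptions, not the authors' implementation: the function name `dominant_period`, the 20 Hz sampling rate, and the 4 s window are hypothetical, the RNN component is not shown, and a naive discrete Fourier transform is used in place of an FFT routine so that the example needs only the standard library (an FFT computes the same spectrum faster).

```python
import cmath
import math


def dominant_period(samples, sample_rate_hz):
    """Return the period (seconds) of the strongest non-DC frequency.

    Hypothetical helper for illustration: naive DFT over one window.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset

    best_bin, best_power = 1, -1.0
    # Scan the positive-frequency bins, skipping bin 0 (DC).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power

    freq_hz = best_bin * sample_rate_hz / n  # bin index -> frequency
    return 1.0 / freq_hz


# Assumed setup: 4 s window at 20 Hz, motion repeating every 0.8 s.
SAMPLE_RATE_HZ = 20.0
signal = [math.sin(2 * math.pi * i / (0.8 * SAMPLE_RATE_HZ)) for i in range(80)]
estimated_period = dominant_period(signal, SAMPLE_RATE_HZ)
print(estimated_period)
```

Because the estimate is computed over a whole window, a change in the subject's rhythm is only reflected once the new rhythm dominates that window. This is one way to picture the transient time delay described above; the frequency resolution is also limited to multiples of the sampling rate divided by the window length.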