Affordance theory suggests that humans perceive the environment through invariants, features of the environment that offer behavioral information to the observer. Two types of invariants exist: structural invariants and transformational invariants. In our previous paper, we developed a method that self-organizes transformational invariants, or motion features, from camera images based on a robot's experiences. That model used a bi-directional training technique that combined a recurrent neural network for dynamics learning, namely the Recurrent Neural Network with Parametric Bias (RNNPB), with a hierarchical neural network for feature extraction. The bi-directional training method was effective in clustering object motions, but the analysis showed that the self-organized features (transformational invariants) were not well segregated among different motion types. In this paper, we present a refined model that integrates dynamics learning and feature extraction in a single model. The refined model consists of a Multiple Timescales Recurrent Neural Network (MTRNN), which possesses better learning capability than RNNPB. Self-organization results for four types of motion demonstrate the model's capability to create clusters of object motions. The analysis showed that the model extracted feature sequences with distinct characteristics for each of the four object motion types.
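
The abstract names the MTRNN without stating its update rule. As a minimal sketch, assuming the commonly used leaky-integrator form in which each unit has its own time constant, one step could look like the code below. The function name `mtrnn_step`, the tanh activation, the weights, and the time constants are illustrative assumptions and are not taken from this paper.

```python
import math

def mtrnn_step(u, x_in, w_rec, w_in, tau):
    """One leaky-integrator update (assumed form, not the paper's code).

    u     : internal states of the context units
    x_in  : external input vector (e.g. image features)
    w_rec : recurrent weights, w_rec[i][j] from unit j to unit i
    w_in  : input weights, w_in[i][k] from input k to unit i
    tau   : per-unit time constants; a larger tau gives slower dynamics
    """
    y = [math.tanh(v) for v in u]  # unit activations
    u_next = []
    for i in range(len(u)):
        drive = sum(w_rec[i][j] * y[j] for j in range(len(y)))
        drive += sum(w_in[i][k] * x_in[k] for k in range(len(x_in)))
        # Keep a (1 - 1/tau) share of the old state, add a 1/tau share of the drive.
        u_next.append((1.0 - 1.0 / tau[i]) * u[i] + drive / tau[i])
    return u_next

# Two fast units (tau = 2) and one slow unit (tau = 50); values are illustrative only.
tau = [2.0, 2.0, 50.0]
w_rec = [[0.0] * 3 for _ in range(3)]  # zero weights isolate the leak term
w_in = [[0.0] for _ in range(3)]
u = mtrnn_step([1.0, 1.0, 1.0], [0.0], w_rec, w_in, tau)
```

With all weights set to zero, the fast units lose half of their state in one step while the slow unit keeps almost all of it. This separation of timescales is what distinguishes the MTRNN from a single-timescale recurrent network.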