The predictive processing (PP) hypothesizes that the predictive inference of our sensorimotor system is encoded implicitly in the regularities between perception and action. We propose a neural architecture in which such regularities of active inference are encoded hierarchically. We further suggest that this encoding emerges during the embodied learning process when the appropriate action is selected to minimize the prediction error in perception. Therefore, this predictive stream in the sensorimotor loop is generated in a top-down manner. Specifically, it is constantly modulated by the motor actions and is updated by the bottom-up prediction error signals. In this way, the top-down prediction originally comes from the prior experience from both perception and action representing the higher levels of this hierarchical cognition. In our proposed embodied model, we extend the PredNet Network, a hierarchical predictive coding network, with the motor action units implemented by a multi-layer perceptron network (MLP) to modulate the network top-down prediction. Two experiments, a minimalistic world experiment, and a mobile robot experiment are conducted to evaluate the proposed model in a qualitative way. In the neural representation, it can be observed that the causal inference of predictive percept from motor actions can be also observed while the agent is interacting with the environment.