A new graph-based evolutionary algorithm called Genetic Network Programming (GNP) has been proposed. Solutions in GNP are represented as graph structures, which improves their expressive ability and performance. In addition, GNP with Reinforcement Learning (GNP-RL) has been proposed to search for solutions more efficiently. GNP-RL can use information obtained during task execution to change its programs while the task is running, i.e., it learns online. It therefore has an advantage over purely evolution-based algorithms when much information can be obtained during task execution. GNP-RL also has a special state-action space, which reduces the size of the Q-table and makes learning more efficient. The proposed method is applied to the controller of a Khepera simulator, and its performance is evaluated.
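
To illustrate why a restricted state-action space can keep the Q-table small, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that a state is the index of the current graph node and that an action is the choice among a few alternatives stored in that node (called sub-nodes here); the class name `NodeQTable`, the SARSA-style update, and all parameter values are hypothetical choices made for illustration.

```python
import random


class NodeQTable:
    """Q-values indexed by (node, sub_node) instead of raw sensor readings.

    Assumption: state = current node index, action = sub-node index.
    """

    def __init__(self, num_nodes, num_sub_nodes,
                 alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.num_sub_nodes = num_sub_nodes
        self.alpha = alpha      # learning rate (illustrative value)
        self.gamma = gamma      # discount factor (illustrative value)
        self.epsilon = epsilon  # exploration probability (illustrative value)
        self.rng = random.Random(seed)
        # One row per node, one column per sub-node.
        self.q = [[0.0] * num_sub_nodes for _ in range(num_nodes)]

    def size(self):
        """Number of Q-table entries: num_nodes * num_sub_nodes."""
        return len(self.q) * self.num_sub_nodes

    def select(self, node):
        """Epsilon-greedy choice of a sub-node at the given node."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_sub_nodes)
        row = self.q[node]
        return max(range(self.num_sub_nodes), key=lambda a: row[a])

    def update(self, node, sub, reward, next_node, next_sub):
        """On-policy (SARSA-style) temporal-difference update."""
        target = reward + self.gamma * self.q[next_node][next_sub]
        self.q[node][sub] += self.alpha * (target - self.q[node][sub])


if __name__ == "__main__":
    table = NodeQTable(num_nodes=4, num_sub_nodes=2, epsilon=0.0)
    table.update(node=0, sub=1, reward=1.0, next_node=2, next_sub=0)
    print(table.size(), table.select(0))
```

Under these assumptions the table grows with the number of nodes times the number of sub-nodes per node, rather than with the number of distinct sensor-input combinations the controller can encounter.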