We propose a novel framework for testing configurable cyber-physical systems against a given specification expressed as a metric temporal logic formula. Given a system model with configurable properties and a specification, our approach first trains an agent, using reinforcement learning, to falsify the model under a variety of configurations. After the training phase, the experienced falsification agent is expected to quickly find an input signal whose corresponding output violates the specification, even when the specific configuration is not known to the agent. The agent can therefore be reused whenever different configurations are investigated, whether for a product family or during trial-and-error configuration design. We performed a preliminary experiment to validate our hypothesis that reinforcement learning can be applied to falsification problems.
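To make the falsification setting concrete, the following is a minimal sketch, not the framework itself. It assumes a hypothetical first-order plant whose single configurable property is an input gain, a specification of the form "always y <= limit" scored by its quantitative robustness, and plain random search standing in for the reinforcement learning agent. All names (`simulate`, `robustness_always_below`, `falsify`) and all parameter values are illustrative assumptions.

```python
import random


def simulate(gain, inputs, decay=0.9):
    """Toy configurable plant (hypothetical): y[k+1] = decay * y[k] + gain * u[k]."""
    y, trace = 0.0, []
    for u in inputs:
        y = decay * y + gain * u
        trace.append(y)
    return trace


def robustness_always_below(trace, limit):
    """Robustness of 'always (y <= limit)' on a finite trace: min over k of (limit - y[k]).

    A negative value means the trace violates the specification.
    """
    return min(limit - y for y in trace)


def falsify(gain, limit, horizon=20, trials=200, seed=0):
    """Random search (a stand-in for the RL agent) for a falsifying input signal.

    Samples input signals in [-1, 1]^horizon and keeps the one with the lowest
    robustness, stopping early once the robustness becomes negative.
    """
    rng = random.Random(seed)
    best_inputs, best_rob = None, float("inf")
    for _ in range(trials):
        inputs = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        rob = robustness_always_below(simulate(gain, inputs), limit)
        if rob < best_rob:
            best_inputs, best_rob = inputs, rob
        if best_rob < 0:
            break  # specification violated: falsifying input found
    return best_inputs, best_rob


if __name__ == "__main__":
    # The same search procedure is reused across configurations (gains).
    for gain in (0.1, 10.0):
        _, rob = falsify(gain, limit=5.0)
        print(f"gain={gain}: best robustness={rob:.3f}, falsified={rob < 0}")
```

In this sketch the search restarts from scratch for every configuration. The approach described above instead aims to amortize that cost by training a single agent across configurations, so that it can be reused without being told the configuration.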