Long short-term memory (LSTM) recurrent neural network based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation-evolution strategy (CMA-ES), which has demonstrated robustness in other black box hyper-parameter optimization problems. By flexibly allowing optimization of various meta-parameters including layer wise unit types, our method automatically finds a configuration that gives improved recognition performance. Further, by using a Pareto based multi-objective CMA-ES, both WER and computational time were reduced jointly: after 10 generations, relative WER and computational time reductions for decoding were 4.1% and 22.7% respectively, compared to an initial baseline system whose WER was 8.7%.