The performance of deep neural networks (DNNs) in speech processing applications depends strongly on highly complicated configurations, including the network structure and several tuning parameters (e.g., the number of hidden states and the learning rate in each layer). Reaching optimal performance in such systems requires deep understanding of and expertise in DNNs, which limits the development of DNN systems to skilled experts. To overcome this problem, this paper proposes an efficient strategy for optimizing the DNN structure and parameters using evolutionary algorithms. The proposed approach parametrizes the DNN structure as a directed acyclic graph and represents that structure by a simple binary vector. A genetic algorithm and the covariance matrix adaptation evolution strategy (CMA-ES) then efficiently optimize performance jointly with respect to this binary vector and the other tuning parameters. Experiments on phoneme recognition and spoken digit detection tasks show the effectiveness of the proposed approach, which discovers an appropriate DNN structure automatically.
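To make the encoding concrete, the sketch below illustrates one plausible way to represent a layered DAG as a binary vector (one bit per candidate edge between ordered layers) and to search over such vectors with a simple genetic algorithm. This is a minimal illustration, not the paper's actual implementation: the toy `fitness` function rewards similarity to a hand-picked target topology, whereas the real objective would be the validation performance of the trained DNN, and the continuous tuning parameters handled by CMA-ES are omitted.

```python
import random

def dag_edges(n_layers):
    # For n ordered layers, each bit of the structure vector decides
    # whether the edge (i -> j), with i < j, is present. The ordering
    # guarantees the resulting graph is acyclic.
    return [(i, j) for i in range(n_layers) for j in range(i + 1, n_layers)]

def random_vector(length, rng):
    return [rng.randint(0, 1) for _ in range(length)]

def crossover(a, b, rng):
    # Single-point crossover between two parent bit vectors.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(v, rate, rng):
    # Flip each bit independently with a small probability.
    return [bit ^ (rng.random() < rate) for bit in v]

def genetic_search(fitness, length, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_vector(length, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with offspring.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), 0.05, rng))
        pop = parents + children
    return max(pop, key=fitness)

# Toy objective: reward vectors close to a hand-picked target topology.
# (In the paper's setting, fitness would come from training and
# validating a DNN built from the decoded structure.)
edges = dag_edges(5)  # 10 candidate edges among 5 ordered layers
target = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

def fitness(v):
    return sum(1 for x, t in zip(v, target) if x == t)

best = genetic_search(fitness, len(edges))
active = [e for e, bit in zip(edges, best) if bit]
```

Decoding `best` back into the edge list `active` yields the discovered DAG; restricting edges to pairs `i < j` is one simple way to keep every candidate structure acyclic by construction.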