We previously proposed an acoustic scene classification method based on a deep neural network-Gaussian mixture model (DNN-GMM) and frame-concatenated acoustic features. The method was submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge, where it ranked eighth among 49 algorithms. In that method, acoustic features from temporally distant frames were concatenated to capture their temporal relationship, and the experimental results indicated that classification accuracy improves as the number of concatenated frames increases. Another important parameter is the frame concatenation interval, that is, the time interval between the frames selected for concatenation, which was fixed to 100 ms in our previous method. In this paper, we optimize both the number of concatenated frames and the frame concatenation interval for the previously proposed method. As a result, the classification accuracy of the method improved by 2.61% compared with the result submitted to the DCASE 2016 Challenge.
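To make the two parameters concrete, frame concatenation can be illustrated with a minimal sketch. It is not the implementation used in the submitted system: the function name `concatenate_frames`, the 10 ms frame shift, the toy feature values, and the choice to stack each frame with the frames that follow it (rather than preceding or surrounding frames) are all assumptions made for the example.

```python
def concatenate_frames(features, num_frames, interval):
    """Stack `num_frames` feature vectors spaced `interval` frames apart.

    features:   list of per-frame feature vectors (each a list of floats).
    num_frames: number of frames to concatenate (assumed >= 1).
    interval:   spacing between the selected frames, in frames (assumed >= 1).
    Returns one concatenated vector per valid start frame.
    """
    if num_frames < 1 or interval < 1:
        raise ValueError("num_frames and interval must be positive")
    span = (num_frames - 1) * interval   # distance from first to last selected frame
    n_out = len(features) - span         # start frames that have full context
    if n_out <= 0:
        raise ValueError("not enough frames for the requested context")
    stacked = []
    for t in range(n_out):
        vec = []
        for k in range(num_frames):
            # Frame t, then t + interval, t + 2 * interval, ...
            vec.extend(features[t + k * interval])
        stacked.append(vec)
    return stacked


# Hypothetical setup: with a 10 ms frame shift, a 100 ms interval is 10 frames.
frame_shift_ms = 10
interval = 100 // frame_shift_ms

# 50 toy frames of 3-dimensional features: frame i is [3i, 3i + 1, 3i + 2].
feats = [[float(3 * i + d) for d in range(3)] for i in range(50)]

stacked = concatenate_frames(feats, num_frames=3, interval=interval)
print(len(stacked), len(stacked[0]))
```

In this sketch, increasing `num_frames` lengthens each concatenated vector, while increasing `interval` widens the time span it covers without changing its dimensionality. These are the two quantities the optimization varies.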