This paper proposes a temporally scalable decoding process with a frame rate conversion method for surveillance video. The method reduces the computational complexity of decoding while preserving video quality, and it makes single-layer bitstream sources much more flexible for a variety of terminal devices. It is built on the frame-skipping concept and relies on the proposed reference frame index decision algorithm, motion vector composition algorithm, and block-partition mode decision algorithm. Compared with frame rate conversion in the transcoding process, it has much lower complexity and is more flexible. The experimental results show that the reduction in computational complexity (decoding time) depends on the number of skipped frames: the more frames are skipped, the greater the reduction. The PSNR loss is very small (about 0.1 to 0.2 dB) for B-frame skipping. For 2/3 P-frame skipping, the PSNR loss is about 0.7 to 2 dB (the SSIM loss is only 0.002 to 0.007), and the computational complexity is reduced by about 60%.
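To make the idea of motion vector composition under frame skipping concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes the simplest form of composition: when the reference frame of a block is skipped, the block's motion vector is added to the motion vector of the block it points to in the skipped frame, so that the result points into the frame before the skipped one. The function name `compose_mv`, the dictionary representation of the motion field, and the 16-pixel block size are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

MV = Tuple[int, int]      # motion vector (dx, dy) in pixels
Block = Tuple[int, int]   # block indices (block_x, block_y)


def compose_mv(
    mv_cur: MV,
    skipped_field: Dict[Block, MV],
    block: Block,
    block_size: int = 16,  # assumed block size, for illustration only
) -> MV:
    """Compose a motion vector across one skipped frame by vector addition.

    mv_cur points from `block` in the current frame into the skipped frame.
    skipped_field maps block indices of the skipped frame to the motion
    vectors that point into the frame before it.
    """
    bx, by = block
    # Top-left pixel position that the current block references in the skipped frame.
    px = bx * block_size + mv_cur[0]
    py = by * block_size + mv_cur[1]
    # Simplification: pick the nearest block on the grid of the skipped frame.
    half = block_size // 2
    ref_block = ((px + half) // block_size, (py + half) // block_size)
    # Blocks without a stored vector (e.g. intra blocks) are treated as zero motion.
    mv_skipped = skipped_field.get(ref_block, (0, 0))
    return (mv_cur[0] + mv_skipped[0], mv_cur[1] + mv_skipped[1])
```

Nearest-block selection is chosen here only because it is short; a practical decoder would have to handle sub-pixel vectors, overlapping blocks, and variable partition sizes, which this sketch deliberately ignores.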