Audio-Oriented Video Interpolation Using Key Pose

Takayuki Nakatsuka*, Yukitaka Tsuchiya, Masatoshi Hamanaka, Shigeo Morishima

*この研究の対応する著者

研究成果: Article査読

抄録

This paper describes a deep learning-based method for long-term video interpolation that generates intermediate frames between two music performance videos of a person playing a specific instrument. Recent advances in deep learning techniques have successfully generated realistic images with high-fidelity and high-resolution in short-term video interpolation. However, there is still room for improvement in long-term video interpolation due to lack of resolution and temporal consistency of the generated video. Particularly in music performance videos, the music and human performance motion need to be synchronized. We solved these problems by using human poses and music features essential for music performance in long-term video interpolation. By closely matching human poses with music and videos, it is possible to generate intermediate frames that synchronize with the music. Specifically, we obtain the human poses of the last frame of the first video and the first frame of the second video in the performance videos to be interpolated as key poses. Then, our encoder-decoder network estimates the human poses in the intermediate frames from the obtained key poses, with the music features as the condition. In order to construct an end-to-end network, we utilize a differentiable network that transforms the estimated human poses in vector form into the human pose in image form, such as human stick figures. Finally, a video-to-video synthesis network uses the stick figures to generate intermediate frames between two music performance videos. We found that the generated performance videos were of higher quality than the baseline method through quantitative experiments.

本文言語English
論文番号2160016
ジャーナルInternational Journal of Pattern Recognition and Artificial Intelligence
35
16
DOI
出版ステータスPublished - 2021 12月 30

ASJC Scopus subject areas

  • ソフトウェア
  • コンピュータ ビジョンおよびパターン認識
  • 人工知能

フィンガープリント

「Audio-Oriented Video Interpolation Using Key Pose」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル