TY - JOUR
T1 - MirrorNet
T2 - A deep reflective approach to 2D pose estimation for single-person images
AU - Nakatsuka, Takayuki
AU - Yoshii, Kazuyoshi
AU - Koyama, Yuki
AU - Fukayama, Satoru
AU - Goto, Masataka
AU - Morishima, Shigeo
N1 - Funding Information:
We are thankful for AI Bridging Cloud Infrastructure (ABCI) of National Institute of Advanced Industrial Science and Technology (AIST), which we used extensively for our experiments. This work was partly supported by the Program for Leading Graduate Schools, “Graduate Program for Embodiment Informatics” of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, JST ACCEL No.JPMJAC1602, JSPS KAKENHI No.19H04137, and JST-Mirai Program No.JPMJMI19B2.
Publisher Copyright:
© 2021 Information Processing Society of Japan.
PY - 2021/5
Y1 - 2021/5
AB - This paper proposes a statistical approach to 2D pose estimation from human images. The main problems with the standard supervised approach, which is based on a deep recognition (image-to-pose) model, are that it often yields anatomically implausible poses, and its performance is limited by the amount of paired data. To solve these problems, we propose a semi-supervised method that can make effective use of images with and without pose annotations. Specifically, we formulate a hierarchical generative model of poses and images by integrating a deep generative model of poses from pose features with that of images from poses and image features. We then introduce a deep recognition model that infers poses from images. Given images as observed data, these models can be trained jointly in a hierarchical variational autoencoding (image-to-pose-to-feature-to-pose-to-image) manner. The results of experiments show that the proposed reflective architecture makes estimated poses anatomically plausible, and the pose estimation performance is improved by integrating the recognition and generative models and also by feeding non-annotated images.
KW - 2D pose estimation
KW - Amortized variational inference
KW - Mirror system
KW - Variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85107181535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107181535&partnerID=8YFLogxK
U2 - 10.2197/IPSJJIP.29.406
DO - 10.2197/IPSJJIP.29.406
M3 - Article
AN - SCOPUS:85107181535
VL - 29
SP - 406
EP - 423
JO - Journal of Information Processing
JF - Journal of Information Processing
SN - 0387-5806
ER -