3D human pose estimation has many important applications in human-computer interaction and human action recognition. Estimating poses from a single RGB image while simultaneously achieving real-time speed, handling a varying number of people, and maintaining high accuracy is a challenging problem. To this end, this paper proposes a multi-task, multi-level neural network structure with a physical constraint. The network estimates 3D human poses from a single RGB image in an end-to-end manner and achieves both high accuracy and high speed. Experimental results show that the proposed system achieves 21 fps on an RTX 2080 GPU with an accuracy loss of only 33 mm compared with conventional works. The mechanism of the network is also analyzed through network visualization. This work shows that 3D human poses can be estimated from a single monocular RGB camera at real-time speed.
ASJC Scopus subject areas
- Computer Networks and Communications