In computer vision, pose estimation systems are widely used to recover the configuration of the human body. However, it is hard to achieve three goals at once: stable real-time speed, robustness to a varying number of people, and high accuracy. This paper proposes an end-to-end pose estimation approach. It first introduces a neural-network-friendly representation of human pose, and then builds a corresponding real-time, end-to-end pose estimation network on a feature pyramid network (FPN) structure with attention-based detection modules. The network detects multiple people at more than 60 fps with 384 × 384 input resolution on a GTX 1070, with acceptable accuracy. This work indicates that this network structure has the potential to be both faster and more accurate than state-of-the-art methods.