High dynamic range (HDR) image reconstruction of dynamic scenes from several low dynamic range (LDR) images captured with different exposure times is a challenging problem. Although several methods based on optical flow or patch matching have been proposed, they are not robust enough, and their results still suffer from ghosting artifacts in challenging scenes with large foreground motion. To address this, this paper proposes CAHDRNet, a multi-scale contextual-attention-guided alignment network, and evaluates it quantitatively and qualitatively. In contrast to optical-flow-based methods, the paper shows that HDR reconstruction can be formulated as an image inpainting task; unlike previous patch-based reconstruction methods, CAHDRNet performs patch replacement on deep feature maps. The contextual attention module, originally proposed in an image inpainting work, is extended in CAHDRNet with multi-scale attention rebalancing, which helps the model handle different scenes flexibly and reduces patch replacement errors. Experiments on a public dataset indicate that CAHDRNet produces ghost-free results in which detail in ill-exposed areas is well recovered. The proposed method achieves an average PSNR of 40.97 on the test sequences, whereas the best PSNR among non-flow-based methods is 38.60. The flow-based method achieves a PSNR of 40.95 but scores 5 points higher than the proposed method on the HDR-VDP-2 metric. According to the quantitative and qualitative evaluations, the proposed method outperforms all non-flow-based methods and has both advantages and disadvantages relative to the flow-based method.
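
To make the idea of attention-guided patch replacement on feature maps concrete, the following is a minimal single-scale sketch in NumPy. It is an illustration under stated assumptions, not the CAHDRNet implementation: the function name `contextual_attention_align`, the 1x1 patch size, the cosine-similarity matching, and the `scale` temperature are all hypothetical choices, and the multi-scale attention rebalancing is omitted.

```python
import numpy as np


def contextual_attention_align(ref_feat, src_feat, scale=10.0):
    """Align source features to the reference by soft patch replacement.

    ref_feat, src_feat: arrays of shape (C, H, W).
    Assumption: each spatial location is treated as a 1x1 patch for brevity.
    Returns the aligned features (C, H, W) and the attention map (N, M).
    """
    c, h, w = ref_feat.shape
    ref = ref_feat.reshape(c, -1).T  # (N, C): one row per reference location
    src = src_feat.reshape(c, -1).T  # (M, C): one row per source location

    # Cosine similarity between every reference patch and every source patch.
    eps = 1e-8
    ref_n = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + eps)
    src_n = src / (np.linalg.norm(src, axis=1, keepdims=True) + eps)
    sim = ref_n @ src_n.T  # (N, M)

    # Numerically stable softmax over source locations.
    logits = scale * sim
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)

    # Each reference location is rebuilt as a weighted mix of source patches.
    aligned = attn @ src  # (N, C)
    return aligned.T.reshape(c, h, w), attn
```

In this sketch a larger `scale` sharpens the attention toward the single best-matching source patch, so a source feature map that is merely a spatially shifted copy of the reference is mapped back onto the reference layout.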