A hand tracking system that combines a high frame rate and ultra-low delay with good accuracy provides a seamless and intuitive interface for Human-Computer Interaction (HCI). Tracking both hands of multiple people from a monocular RGB camera is challenging because the image features of hands vary widely. Although many CNN-based trackers have been proposed for general-purpose hardware, they cannot meet this challenge at ultra-high speed. This paper proposes: (A) Hetero complementary networks for ultra-high-speed dual-hand tracking, in which the quick primary result from an FPGA network is intermittently combined with the delayed but more accurate result from a GPU network. (B) Hard-wired condensing binarization for ultra-high-speed network implementation on FPGA. Because complex computation is condensed into binary layers, the network can be mapped directly onto hardware resources. The proposed method achieves 69.8% accuracy on test sequences, only 4.7% lower than the general method. Meanwhile, the estimated FPGA resource utilization is reduced to 54.7% on the target platform. This work shows the potential to track the dual hands of multiple people at millisecond-level speed.
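To illustrate why binary layers can map directly onto hardware, the following is a minimal sketch of the XNOR-and-popcount arithmetic commonly used in binarized networks. It is not the hard-wired condensing binarization proposed here, whose details the abstract does not give. The function names `binarize`, `pack_bits`, and `binary_dot` are hypothetical and chosen only for illustration. The sketch assumes activations and weights are constrained to {-1, +1}, so that a dot product reduces to bitwise logic plus a bit count.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} by sign (zero maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def pack_bits(signs):
    """Encode a {-1, +1} vector as an integer: bit i is 1 where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def binary_dot(a_word, w_word, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the two signs agree; each agreement
    contributes +1 and each disagreement -1, so the dot product is
    (matches) - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_word ^ w_word) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n

# Usage: the bitwise result agrees with the ordinary integer dot product.
a = binarize(np.array([0.3, -1.2, 0.0, 2.5]))
w = binarize(np.array([1.0, 0.4, -0.7, 0.1]))
assert binary_dot(pack_bits(a), pack_bits(w), 4) == int(np.dot(a.astype(int), w.astype(int)))
```

In this style of arithmetic the multiply-accumulate units of a conventional layer are replaced by XNOR gates and a bit counter, which is one reason binarized layers are attractive for FPGA implementation.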