This paper presents a fusion of monocular camera-based metric localization, IMU, and odometry measurements for localization in dynamic public road environments. We build multiple vision-based maps and use them simultaneously in the localization phase. In the mapping phase, visual maps are built with ORB-SLAM, aided by accurate metric positions from LiDAR-based NDT scan matching; this external positioning corrects the scale drift inherent in monocular vision-based SLAM methods. In the localization phase, these embedded metric positions are used to estimate the vehicle pose in global metric coordinates from a monocular camera alone. To further increase robustness, we propose using multiple maps and fusing the vision-based pose with odometry and IMU measurements in a particle filter. Experiments were conducted on public roads over distances of up to 170 km at different times of day to evaluate and compare the localization results of vision-only, GNSS, and sensor-fusion methods. The results show that the sensor-fusion method achieves lower average errors than GNSS and better coverage than the vision-only approach.
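The particle-filter fusion described above can be illustrated with a minimal 2D sketch: particles are propagated with an odometry motion model (IMU/odometry prediction step), reweighted by a Gaussian likelihood of the camera-based metric pose fix, and resampled. All function names, noise parameters, and the planar `[x, y, theta]` state are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(particles, v, omega, dt, noise=(0.05, 0.01)):
    """Propagate each particle [x, y, theta] with a noisy odometry motion model."""
    n = len(particles)
    v_n = v + rng.normal(0.0, noise[0], n)       # perturbed linear velocity
    w_n = omega + rng.normal(0.0, noise[1], n)   # perturbed angular velocity
    particles[:, 0] += v_n * dt * np.cos(particles[:, 2])
    particles[:, 1] += v_n * dt * np.sin(particles[:, 2])
    particles[:, 2] += w_n * dt
    return particles

def update(particles, weights, z, sigma=0.5):
    """Reweight particles by the Gaussian likelihood of a camera pose fix z = [x, y]."""
    d2 = np.sum((particles[:, :2] - z) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / sigma**2)
    weights += 1e-300                            # avoid all-zero weights
    return weights / weights.sum()

def resample(particles, weights):
    """Systematic resampling: draw n particles proportional to their weights."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx].copy(), np.full(n, 1.0 / n)

# Demo: a vehicle driving straight at 1 m/s, tracked for 50 steps of 0.1 s.
n = 500
particles = np.zeros((n, 3))
particles[:, :2] += rng.normal(0.0, 0.1, (n, 2))  # initial position uncertainty
weights = np.full(n, 1.0 / n)
true_pose = np.zeros(3)
dt = 0.1
for _ in range(50):
    true_pose[0] += 1.0 * dt                      # ground-truth motion
    particles = predict(particles, 1.0, 0.0, dt)  # IMU/odometry prediction
    weights = update(particles, weights, true_pose[:2])  # camera-map pose fix
    particles, weights = resample(particles, weights)
estimate = weights @ particles                    # weighted mean pose
```

After 50 steps the estimate should lie close to the true position of (5.0, 0.0); in the real system the measurement step would instead use the metric pose recovered from the ORB-SLAM map.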