This paper presents VRMixer, a system that mixes real world and a video clip letting a user enter the video clip and realize a virtual co-starring role with people appearing in the clip. Our system constructs a simple virtual space by allocating video frames and the people appearing in the clip within the user's 3D space. By measuring the user's 3D depth in real time, the time space of the video clip and the user's 3D space become mixed. VRMixer automatically extracts human images from a video clip by using a video segmentation technique based on 3D graph cut segmentation that employs face detection to detach the human area from the background. A virtual 3D space (i.e., 2.5D space) is constructed by positioning the background in the back and the people in the front. In the video clip, the user can stand in front of or behind the people by using a depth camera. Real objects that are closer than the distance of the clip's background will become part of the constructed virtual 3D space. This synthesis creates a new image in which the user appears to be a part of the video clip, or in which people in the clip appear to enter the real world. We aim to realize "video reality," i.e., a mixture of reality and video clips using VRMixer.