Musicians often have the following problem: they have a music score that requires 2 or more players, but they have no one with whom to practice. So far, score-playing music robots exist, but they lack adaptive abilities to synchronize with fellow players' tempo variations. In other words, if the human speeds up their play, the robot should also increase its speed. However, computer accompaniment systems allow exactly this kind of adaptive ability. We present a first step towards giving these accompaniment abilities to a music robot. We introduce a new paradigm of beat tracking using 2 types of sensory input - visual and audio - using our own visual cue recognition system and state-of-the-art acoustic onset detection techniques. Preliminary experiments suggest that by coupling these two modalities, a robot accompanist can start and stop a performance in synchrony with a flutist, and detect tempo changes within half a second.